ChestnutWYN / ACL2021-Novel-Slot-Detection

Novel Slot Detection: A Benchmark for Discovering Unknown Slot Types in the Task-Oriented Dialogue System

This repository is the official implementation of Novel Slot Detection: A Benchmark for Discovering Unknown Slot Types in the Task-Oriented Dialogue System (ACL 2021) by Yanan Wu, Zhiyuan Zeng, Keqing He, Hong Xu, Yuanmeng Yan, Huixing Jiang, and Weiran Xu.

Introduction

This repository provides the benchmark for discovering unknown slot types in task-oriented dialogue systems.

An example of Novel Slot Detection in the task-oriented dialogue system:

The architecture of the proposed model:

Dependencies

We use Anaconda to create the Python environment:

conda create --name <env_name> python=3.6

Install all required libraries:

pip install -r requirements.txt

How to run

1. Train (only):

python --mode train --dataset SnipsNSD5% --threshold 8.0 --output_dir ./output --batch_size 256 --cuda 1 

2. Predict (only):

python --mode test --dataset SnipsNSD5% --threshold 8.0 --output_dir ./output --batch_size 256 --cuda 1 

3. Train and predict (both):

python --mode both --dataset SnipsNSD5% --threshold 8.0 --output_dir ./output --batch_size 256 --cuda 1 

Results

  1. IND and NSD results on Snips-NSD when different proportions (5%, 15%, and 30%) of classes are treated as unknown slots. * indicates a significant improvement over all baselines (p < 0.05).

| Detection method | Objective | Distance strategy | 5% IND Span F1 | 5% NSD Span F1 | 5% NSD Token F1 | 15% IND Span F1 | 15% NSD Span F1 | 15% NSD Token F1 | 30% IND Span F1 | 30% NSD Span F1 | 30% NSD Token F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MSP | binary | - | 87.21 | 12.34 | 25.16 | 71.44 | 12.31 | 39.50 | 58.88 | 8.73 | 40.38 |
| MSP | multiple | - | 88.05 | 14.04 | 30.50 | 79.71 | 20.97 | 40.02 | 78.52 | 25.26 | 46.91 |
| MSP | binary+multiple | - | 89.59 | 23.58 | 37.55 | 83.72 | 24.70 | 45.32 | 79.08 | 30.66 | 52.10 |
| GDA | binary | difference | 87.95 | 23.83 | 35.83 | 83.65 | 22.06 | 43.99 | 78.72 | 32.50 | 44.13 |
| GDA | binary | minimum | 61.29 | 10.36 | 17.08 | 49.11 | 16.91 | 31.10 | 48.07 | 15.56 | 33.78 |
| GDA | multiple | difference | 93.14 | 29.73 | 45.99 | 90.07 | 31.96 | 53.02 | 85.56 | 36.16 | 54.55 |
| GDA | multiple | minimum | 93.10 | 31.67* | 46.97* | 90.18 | 32.19 | 53.75* | 86.26* | 38.64* | 55.24* |

  2. IND and NSD results on ATIS-NSD when different proportions (5%, 15%, and 30%) of classes are treated as unknown slots. * indicates a significant improvement over all baselines (p < 0.05).

| Detection method | Objective | Distance strategy | 5% IND Span F1 | 5% NSD Span F1 | 5% NSD Token F1 | 15% IND Span F1 | 15% NSD Span F1 | 15% NSD Token F1 | 30% IND Span F1 | 30% NSD Span F1 | 30% NSD Token F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| MSP | binary | - | 92.04 | 19.73 | 29.63 | 91.74 | 23.40 | 33.89 | 80.49 | 21.88 | 39.17 |
| MSP | multiple | - | 94.33 | 27.15 | 31.16 | 92.54 | 39.88 | 42.29 | 87.63 | 40.42 | 47.64 |
| MSP | binary+multiple | - | 94.41 | 32.49 | 43.48 | 93.29 | 41.23 | 43.13 | 90.14 | 41.76 | 51.87 |
| GDA | binary | difference | 93.69 | 27.02 | 34.21 | 92.13 | 30.51 | 36.30 | 88.73 | 30.91 | 45.64 |
| GDA | binary | minimum | 93.57 | 15.90 | 20.96 | 90.98 | 24.53 | 27.26 | 88.21 | 26.40 | 39.83 |
| GDA | multiple | difference | 95.20 | 47.78* | 51.54* | 93.92 | 50.92* | 52.24* | 92.02 | 51.26* | 56.59* |
| GDA | multiple | minimum | 95.31* | 41.74 | 45.91 | 93.88 | 43.78 | 46.18 | 91.67 | 45.44 | 52.37 |
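
The MSP rows in the tables above refer to detection by maximum softmax probability. As a rough illustration of the general idea (not the code in this repository), a token can be flagged as a novel slot when the classifier's highest softmax probability falls below a threshold. The function names, the `NS` label string, the example slot labels, and the threshold of 0.6 are all assumptions made for this sketch.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_detect(token_logits, labels, threshold, novel_label="NS"):
    """Return one label per token.

    A token keeps its argmax in-domain label when the maximum softmax
    probability is at least `threshold`; otherwise it is marked as novel.
    `novel_label` and `threshold` are illustrative assumptions.
    """
    predictions = []
    for logits in token_logits:
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            predictions.append(labels[best])
        else:
            predictions.append(novel_label)
    return predictions


# Hypothetical label set and logits for a three-token utterance.
labels = ["O", "B-artist", "I-artist"]
token_logits = [
    [4.0, 0.1, 0.2],    # confident "O"
    [0.3, 0.2, 0.25],   # nearly uniform, so low confidence
    [0.1, 3.5, 0.3],    # confident "B-artist"
]
print(msp_detect(token_logits, labels, threshold=0.6))
# → ['O', 'NS', 'B-artist']
```

The threshold trades off the two metric families reported above: a higher value marks more tokens as novel, which can raise NSD recall while lowering IND scores.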

Citation

@article{Wu2021NovelSD,
  title={Novel Slot Detection: A Benchmark for Discovering Unknown Slot Types in the Task-Oriented Dialogue System},
  author={Yanan Wu and Zhiyuan Zeng and Keqing He and Hong Xu and Yuanmeng Yan and Huixing Jiang and Weiran Xu},
  journal={ArXiv},
  year={2021},
  volume={abs/2105.14313}
}

Issue

Q: Section 4.1 mentions two training objectives: the multiple classifier and the binary classifier. If we use the binary classifier, how do we obtain the IND category? And how are the results for MSP + binary and GDA + binary obtained?

A: As we mention in Section 4.1: "In the test stage, for in-domain prediction, we both use the multiple classifier. While, for novel slot detection, we use the multiple classifier or the binary classifier, or both of them". This means the binary classifier is not used to obtain the fine-grained in-domain labels; it is used to detect whether a token belongs to a novel slot, and if it does, we override the fine-grained in-domain label predicted by the multiple classifier.
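
The override described in the answer can be sketched as follows. This is a minimal illustration rather than the repository's implementation: it assumes the multiple classifier has already produced one in-domain label per token and the binary classifier has already produced one novel/not-novel decision per token. The function name `merge_predictions`, the `NS` label string, and the example labels are hypothetical.

```python
def merge_predictions(multi_labels, binary_is_novel, novel_label="NS"):
    """Combine per-token outputs of the two classifiers.

    multi_labels:    in-domain labels from the multiple classifier.
    binary_is_novel: booleans from the binary classifier (True = novel slot).
    Tokens flagged as novel have their in-domain label overridden.
    """
    if len(multi_labels) != len(binary_is_novel):
        raise ValueError("both inputs must have one entry per token")
    return [
        novel_label if is_novel else label
        for label, is_novel in zip(multi_labels, binary_is_novel)
    ]


# Hypothetical outputs for a four-token utterance.
multi_labels = ["O", "B-artist", "I-artist", "O"]
binary_is_novel = [False, False, True, True]
print(merge_predictions(multi_labels, binary_is_novel))
# → ['O', 'B-artist', 'NS', 'NS']
```

Tokens that the binary classifier does not flag keep the label from the multiple classifier, which is why in-domain results remain available in the binary settings.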