BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling

This repository includes the dataset and baselines of the paper:

BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling (accepted at the NeurIPS 2021 Track on Datasets and Benchmarks) [PDF].

Authors: Zhaojiang Lin, Andrea Madotto, Genta Indra Winata, Peng Xu, Feijun Jiang, Yuxiang Hu, Chen Shi, Pascale Fung

Abstract:

Task-oriented dialogue (ToD) benchmarks provide an important avenue to measure progress and develop better conversational agents. However, existing datasets for end-to-end ToD modelling are limited to a single language, hindering the development of robust end-to-end ToD systems for multilingual countries and regions. Here we introduce BiToD, the first bilingual multi-domain dataset for end-to-end task-oriented dialogue modeling. BiToD contains over 7k multi-domain dialogues (144k utterances) with a large and realistic parallel knowledge base. It serves as an effective benchmark for evaluating bilingual ToD systems and cross-lingual transfer learning approaches. We provide state-of-the-art baselines under three evaluation settings (monolingual, bilingual and cross-lingual). The analysis of our baselines in different settings highlights 1) the effectiveness of training a bilingual ToD system compared to two independent monolingual ToD systems, and 2) the potential of leveraging a bilingual knowledge base and cross-lingual transfer learning to improve the system performance in the low-resource condition.

Leaderboard

Metrics: Task Success Rate (TSR), Dialogue Success Rate (DSR), API call accuracy (API_ACC), BLEU, and Joint Goal Accuracy (JGA).

Monolingual

| Model | EN TSR | EN DSR | EN API_ACC | EN BLEU | EN JGA | ZH TSR | ZH DSR | ZH API_ACC | ZH BLEU | ZH JGA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MinTL(mBART) | 56 | 33.71 | 57.03 | 35.34 | 67.36 | 56.82 | 29.35 | 71.89 | 20.06 | 72.18 |
| MinTL(mT5) | 69.13 | 47.51 | 67.92 | 38.48 | 69.19 | 53.77 | 31.09 | 63.25 | 19.03 | 67.35 |

Bilingual

| Model | EN TSR | EN DSR | EN API_ACC | EN BLEU | EN JGA | ZH TSR | ZH DSR | ZH API_ACC | ZH BLEU | ZH JGA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MinTL(mBART) | 42.45 | 17.87 | 65.35 | 28.76 | 69.37 | 40.39 | 16.96 | 65.37 | 5.23 | 69.5 |
| MinTL(mT5) | 71.18 | 51.13 | 71.87 | 40.71 | 72.16 | 57.24 | 34.78 | 65.54 | 22.45 | 68.7 |

Cross-lingual (10% target-language training data)

| Model | ZH→EN TSR | ZH→EN DSR | ZH→EN API_ACC | ZH→EN BLEU | ZH→EN JGA | EN→ZH TSR | EN→ZH DSR | EN→ZH API_ACC | EN→ZH BLEU | EN→ZH JGA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MinTL(mBART) | 1.11 | 0.23 | 0.6 | 3.17 | 4.64 | 0 | 0 | 0 | 0.01 | 2.14 |
| +CPT | 36.19 | 16.06 | 41.51 | 22.5 | 42.84 | 24.64 | 11.96 | 29.04 | 8.29 | 28.57 |
| +MLT | 33.62 | 11.99 | 41.08 | 20.01 | 55.39 | 44.71 | 21.96 | 54.87 | 14.19 | 60.71 |
| MinTL(mT5) | 6.78 | 1.36 | 17.75 | 10.35 | 19.86 | 4.16 | 2.2 | 6.67 | 3.3 | 12.63 |
| +CPT | 44.94 | 24.66 | 47.6 | 29.53 | 48.77 | 43.27 | 23.7 | 49.7 | 13.89 | 51.4 |
| +MLT | 56.78 | 33.71 | 56.78 | 32.43 | 58.31 | 49.2 | 27.17 | 50.55 | 14.44 | 55.05 |

Dataset

Training, validation and test data are available in the data folder. We also provide the data split for the cross-lingual few-shot setting. Each dialogue file follows the format shown below; a short loading sketch follows the example.

{
    dialogue_id:{
        "Scenario": {
            "WizardCapabilities": [
            ],
            "User_Goal": {
            }
        },
        "Events": [
            {
                "Agent": "User",
                "Actions": [
                    {
                        "act": "inform_intent",
                        "slot": "intent",
                        "relation": "equal_to",
                        "value": [
                        "restaurants_en_US_search"
                        ]
                    }
                ],
                "active_intent": "restaurants_en_US_search",
                "state": {
                "restaurants_en_US_search": {}
                },
                "Text": "Hi, I'd like to find a restaurant to eat",
            },
            {
                "Agent": "Wizard",
                "Actions": [
                    {
                        "act": "request",
                        "slot": "price_level",
                        "relation": "",
                        "value": []
                    }
                ],
                "Text": "Hi there. Would you like a cheap or expensive restaurant?",
                "PrimaryItem": null,
                "SecondaryItem": null,
            },
            ...
        ]
    }
}
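
As a quick sanity check on the format above, the sketch below loads one dialogue file and walks through the events of a single dialogue. The path data/en_train.json is an assumption; point it at whichever split you are using from the data folder.

import json

# NOTE: the file name is an assumption; use the actual split file in the data folder.
DATA_PATH = "data/en_train.json"

with open(DATA_PATH, encoding="utf-8") as f:
    dialogues = json.load(f)  # dict: dialogue_id -> dialogue (format shown above)

# Inspect the first dialogue.
dial_id, dial = next(iter(dialogues.items()))
print("Dialogue:", dial_id)
print("User_Goal keys:", list(dial["Scenario"]["User_Goal"].keys()))

# Walk through the turns: each event has an Agent ("User" or "Wizard"),
# a list of dialogue acts, and the surface utterance in "Text".
for event in dial["Events"]:
    speaker = event["Agent"]
    acts = [a["act"] for a in event.get("Actions", [])]
    print(f"{speaker:6s} | acts={acts} | {event.get('Text', '')}")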

Experimental Setup

Dependencies

Check the required packages, or simply run the following command:

❱❱❱ pip install -r requirements.txt

Setup MongoDB server

Install the MongoDB server; see the official MongoDB documentation for instructions.

Then, restore the DB dump by running the following command:

❱❱❱ cd ./db && bash restore.sh
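
To verify that the restore succeeded, a minimal pymongo sketch (assuming pymongo is installed, e.g. via pip install pymongo) can list the restored databases and their collections; no database name is hard-coded here because it depends on the dump.

from pymongo import MongoClient

# Connect to the local MongoDB server on the default host/port.
client = MongoClient("mongodb://localhost:27017/")

# Print every non-internal database restored from the dump and its collections.
for db_name in client.list_database_names():
    if db_name in ("admin", "config", "local"):
        continue
    print(db_name, "->", client[db_name].list_collection_names())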

Preprocessing

❱❱❱ python preprocess.py --setting zh

Baselines

Here we show one example of training and evaluation. Check run.sh to run all the baselines.

mT5(zh)

Train

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch --nproc_per_node=2 train.py \
--model_name_or_path google/mt5-small \
--do_train \
--do_eval \
--train_file data/preprocessed/zh_train.json \
--validation_file data/preprocessed/zh_valid.json \
--learning_rate 5e-4  \
--num_train_epochs 8 \
--source_lang en_XX \
--target_lang en_XX \
--logging_steps 100 \
--save_steps 2000 \
--output_dir save/zh_mt5_5e-4 \
--per_device_train_batch_size=8 \
--per_device_eval_batch_size=8 \
--gradient_accumulation_steps 8 \
--overwrite_output_dir \
--predict_with_generate \
--fp16 \
--sharded_ddp zero_dp_3
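
Note that with 2 GPUs, a per-device batch size of 8, and 8 gradient-accumulation steps, the effective batch size is 2 × 8 × 8 = 128 examples per optimizer update.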

Evaluate Model

❱❱❱ CUDA_VISIBLE_DEVICES=0 python evaluate.py --model_path save/zh_mt5_5e-4 --setting zh --reference_file_path data/zh_test.json --save_prefix t5_

Evaluate File

We also support evaluating a prediction file directly:

❱❱❱ python evaluate.py --eval_mode eval_file --prediction_file_path result/zh_end2end_predictions.json --setting zh --reference_file_path data/zh_test.json

Citation:

The BibTeX entry is listed below:

@article{lin2021bitod,
  title={BiToD: A Bilingual Multi-Domain Dataset For Task-Oriented Dialogue Modeling},
  author={Lin, Zhaojiang and Madotto, Andrea and Winata, Genta Indra and Xu, Peng and Jiang, Feijun and Hu, Yuxiang and Shi, Chen and Fung, Pascale},
  journal={arXiv preprint arXiv:2106.02787},
  year={2021}
}