
Multi-turn dialogue baselines written in PyTorch
MIT License

Multi-turn Dialog Zoo

A batch of ready-to-use multi-turn or single-turn dialogue baselines.

PRs and issues are welcome.

TODO

Dataset

The preprocessing scripts for these datasets can be found in the data/data_process folder.

  1. DailyDialog dataset
  2. Ubuntu corpus
  3. EmpChat
  4. DSTC7-AVSD
  5. PersonaChat

Metric

  1. PPL: test perplexity
  2. BLEU-1/2/3/4: via nlg-eval, multi-bleu.perl, or NLTK
  3. ROUGE-2
  4. Embedding-based metrics: Average, Extrema, Greedy (slow and optional)
  5. Distinct-1/2
  6. BERTScore
  7. BERT-RUBER
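
Of these, Distinct-1/2 is simple enough to sketch in a few lines. The function below is a minimal reference implementation for illustration, not the evaluation code shipped with this repo; it computes the ratio of unique n-grams to total n-grams over a corpus of tokenized sentences:

```python
def distinct_n(sentences, n):
    """Distinct-n: unique n-grams / total n-grams over a corpus.

    `sentences` is a list of token lists (already tokenized)."""
    ngrams = set()
    total = 0
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            ngrams.add(tuple(tokens[i:i + n]))
            total += 1
    return len(ngrams) / total if total else 0.0
```

For example, `distinct_n([["i", "am", "fine"], ["i", "am", "ok"]], 1)` yields 4/6 ≈ 0.667, since the six unigram tokens contain only four distinct types.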

Requirements

  1. PyTorch 1.2+ (Transformer support & pack_padded update)
  2. Python 3.6.1+
  3. tqdm
  4. numpy
  5. nltk 3.4+
  6. scipy
  7. sklearn (optional)
  8. rouge
  9. GoogleNews word2vec or glove 300 word2vec (optional)
  10. pytorch_geometric (PyG 1.2) (optional)
  11. cuda 9.2 (match with PyG) (optional)
  12. tensorboard (for PyTorch 1.2+)
  13. perl (for running the multi-bleu.perl script)

Dataset format

Three multi-turn open-domain dialogue datasets (DailyDialog, DSTC7_AVSD, PersonaChat) can be obtained from this link

Each dataset contains 6 files

In all of these files, each line contains exactly one dialogue context (src) or one dialogue response (tgt). In order to create the graph, each utterance must begin with one of the special tokens <user0> and <user1>, which denote the speaker. The __eou__ token separates the utterances in the conversation context. More details can be found in the example files and in the small data case.
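
A src line can therefore be parsed with ordinary string operations. The helper below (a hypothetical name, shown only to illustrate the format) splits one context line into (speaker, utterance) pairs:

```python
def split_context(src_line):
    """Split one src line into (speaker, utterance) pairs.

    Utterances are separated by __eou__, and each one begins with a
    speaker token, <user0> or <user1>."""
    turns = []
    for utt in src_line.strip().split("__eou__"):
        utt = utt.strip()
        if not utt:
            continue
        speaker, text = utt.split(maxsplit=1)
        turns.append((speaker, text))
    return turns
```

For example, `split_context("<user0> hi there __eou__ <user1> hello")` returns `[("<user0>", "hi there"), ("<user1>", "hello")]`.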

How to use

0. Ready

Before running the following commands, make sure the essential folders are created:

mkdir -p processed/$DATASET
mkdir -p data/$DATASET
mkdir -p tblogs/$DATASET
mkdir -p ckpt/$DATASET

The variable DATASET holds the name of the dataset that you want to process.

1. Generate the vocab of the dataset

# default 25000 words
./run.sh vocab <dataset>
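
Under the hood, vocabulary generation amounts to counting tokens and keeping the most frequent ones up to the size limit. A minimal sketch follows; the special-token names are assumptions for illustration, not necessarily those used by the repo's script:

```python
from collections import Counter

def build_vocab(lines, max_words=25000):
    """Map the most frequent whitespace-separated tokens to ids,
    reserving the first slots for (assumed) special tokens."""
    specials = ["<pad>", "<unk>", "<sos>", "<eos>", "<user0>", "<user1>"]
    counter = Counter(tok for line in lines for tok in line.split())
    for tok in specials:
        counter.pop(tok, None)   # specials get fixed ids below
    keep = [w for w, _ in counter.most_common(max_words - len(specials))]
    return {w: i for i, w in enumerate(specials + keep)}
```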

2. Generate the graph of the dataset (optional)

# only MTGCN and GatedGCN need to create the graph
# zh or en
./run.sh graph <dataset> <zh/en> <cuda>

3. Check the information about the preprocessed dataset

Show the length of the utterances, turns of the multi-turn setting and so on.

./run.sh stat <dataset>

4. Train N-gram LM (deprecated)

Train the N-gram language model with NLTK (Lidstone smoothing with gamma 0.5; the default n-gram order is 3):

# train the N-gram Language model by NLTK
./run.sh lm <dataset>
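
Lidstone smoothing adds a constant gamma to every n-gram count, i.e. P(w | h) = (c(h, w) + gamma) / (c(h) + gamma * |V|). The following pure-Python trigram sketch shows the estimator itself; the repo's script relies on NLTK's implementation instead:

```python
from collections import Counter

def lidstone_trigram_lm(sentences, gamma=0.5):
    """Return a Lidstone-smoothed trigram probability function
    P(w | h) = (c(h, w) + gamma) / (c(h) + gamma * |V|)."""
    tri, bi = Counter(), Counter()
    vocab = set()
    for sent in sentences:
        toks = ["<s>", "<s>"] + sent + ["</s>"]
        vocab.update(toks)
        for i in range(len(toks) - 2):
            tri[tuple(toks[i:i + 3])] += 1
            bi[tuple(toks[i:i + 2])] += 1
    V = len(vocab)

    def prob(w, h):
        # h is a (w1, w2) context tuple
        return (tri[h + (w,)] + gamma) / (bi[h] + gamma * V)

    return prob
```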

5. Train the model on corresponding dataset

./run.sh train <dataset> <model> <cuda>

6. Translate the test dataset

# translate mode, e.g. dataset dailydialog, model HRED, on the 4th GPU
./run.sh translate <dataset> <model> <cuda>

Translate a batch of models

# rewrite the models and datasets you want to translate
./run_batch_translate.sh <cuda>

7. Evaluate the result of the translated utterances

# compute BLEU and Distinct for the generated sentences on the 4th GPU (BERTScore needs the GPU)
./run.sh eval <dataset> <model> <cuda>
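
For intuition, sentence-level BLEU-1 is just unigram precision clipped against the reference, times a brevity penalty. The toy function below illustrates the idea only; the scores reported by `eval` come from nlg-eval, multi-bleu.perl, or NLTK:

```python
import math
from collections import Counter

def bleu_1(hypothesis, reference):
    """Toy sentence-level BLEU-1: clipped unigram precision
    multiplied by the brevity penalty exp(1 - |ref| / |hyp|)."""
    hyp, ref = hypothesis.split(), reference.split()
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    precision = overlap / len(hyp) if hyp else 0.0
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * precision
```

For example, a hypothesis identical to the reference scores 1.0, while a hypothesis that is too short is penalized by the brevity penalty.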

Evaluate a batch of models

# the results are redirected into the file `./processed/<dataset>/<model>/final_result.txt`
./run_batch_eval.sh <cuda>

8. Get the curve of all the training checkpoints (deprecated; TensorBoard is all you need)

# draw the performance curve; in practice you can get all of this information from TensorBoard
./run.sh curve <dataset> <model> <cuda>

9. Perturb the source test dataset

Refer to the paper: Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study (Sankar et al., ACL 2019)

# 10 perturbation modes
./run.sh perturbation <dataset> <zh/en>
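
One representative perturbation from that paper is shuffling the order of the context utterances. The sketch below (the function name is hypothetical; the repo's script implements 10 such modes) applies it to a src line in the format described above:

```python
import random

def shuffle_utterances(src_line, seed=0):
    """Randomly reorder the utterances of a dialogue context,
    one of the history perturbations studied by Sankar et al. (2019)."""
    utts = [u.strip() for u in src_line.strip().split("__eou__") if u.strip()]
    rng = random.Random(seed)   # fixed seed for reproducible test sets
    rng.shuffle(utts)
    return " __eou__ ".join(utts)
```

The perturbed context keeps the same utterances, so any drop in model performance can be attributed to the lost ordering information.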

Ready-to-use Models

FAQ