alexa / dialoglue

DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue
https://evalai.cloudcv.org/web/challenges/challenge-page/708/overview
Apache License 2.0

hyper parameters for MultiWOZ #13

Open YinpeiDai opened 3 years ago

YinpeiDai commented 3 years ago

Hi! For all datasets in the DialoGLUE benchmark, I can reproduce similar results except for MultiWOZ. For ConvBERT-DG, your joint goal accuracy is around 58, but I can only get 56, which is the same as what the original TripPy reported. Did you use different hyper-parameters for TripPy? If so, could you share them?

Thank you!

The original hyper-parameters for TripPy are as follows:

--do_lower_case \
--learning_rate=1e-4 \
--num_train_epochs=10 \
--max_seq_length=180 \
--per_gpu_train_batch_size=48 \
--per_gpu_eval_batch_size=1 \
--output_dir=${OUT_DIR} \
--save_epochs=2 \
--logging_steps=10 \
--warmup_proportion=0.1 \
--eval_all_checkpoints \
--adam_epsilon=1e-6 \
--label_value_repetitions \
--swap_utterances \
--append_history \
--use_history_labels \
--delexicalize_sys_utts \
--class_aux_feats_inform \
--class_aux_feats_ds \

YinpeiDai commented 3 years ago

Are the hyper-parameters you used the ones in dump_outputs.py and dump_outputs_fewshot.py?

nlpist commented 3 years ago

Hey @YinpeiDai, you mentioned that you succeeded in reproducing results for all tasks except for MultiWOZ.

I am trying to reproduce the results for the slot tasks with the default script from the repository, but without success. Is your script for the slot tasks different from the one in the repository?

YinpeiDai commented 3 years ago

@zabh0z No, I use the same script.

ggaemo commented 3 years ago

How much JGA have you achieved?

Shikib commented 3 years ago

Apologies for the long delay in addressing this issue. Our hyperparameters are in this script: https://github.com/alexa/dialoglue/blob/master/trippy/DO.example.advanced

Our result of 58 is only achieved with both --mlm_pre and --mlm_during enabled.
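
For anyone comparing settings, here is a rough sketch of how the flags from this thread could be collected in one place. It simply takes the TripPy flags quoted earlier and appends --mlm_pre and --mlm_during. This is an assumption about how the pieces fit together, not a copy of DO.example.advanced, which may use different values for the other settings. The OUT_DIR default is hypothetical, and the script only prints the flags rather than launching training.

```shell
#!/bin/sh
# Sketch only: gathers the flags quoted in this thread into one argument list.
# The OUT_DIR default is a hypothetical placeholder; check DO.example.advanced
# for the values actually used by the DialoGLUE authors.
OUT_DIR="${OUT_DIR:-results}"

# Store the flags as positional parameters ("$@") so they stay correctly quoted.
set -- \
  --do_lower_case \
  --learning_rate=1e-4 \
  --num_train_epochs=10 \
  --max_seq_length=180 \
  --per_gpu_train_batch_size=48 \
  --per_gpu_eval_batch_size=1 \
  --output_dir="${OUT_DIR}" \
  --save_epochs=2 \
  --logging_steps=10 \
  --warmup_proportion=0.1 \
  --eval_all_checkpoints \
  --adam_epsilon=1e-6 \
  --label_value_repetitions \
  --swap_utterances \
  --append_history \
  --use_history_labels \
  --delexicalize_sys_utts \
  --class_aux_feats_inform \
  --class_aux_feats_ds \
  --mlm_pre \
  --mlm_during

# Print one flag per line for comparison against the example script
# before passing "$@" to the training command.
printf '%s\n' "$@"
```

Diffing this output against the flags in DO.example.advanced should show whether anything other than the two MLM flags differs from the original TripPy setup.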