Chia-Hsuan-Lee / DST-as-Prompting

Source code for Dialogue State Tracking with a Language Model using Schema-Driven Prompting

On reproducing the experiment results in paper #9

Open nxpeng9235 opened 1 year ago

nxpeng9235 commented 1 year ago

Hi,

Congrats on this concise and solid work being accepted to EMNLP 2021! I am currently following your research and trying to reproduce the experimental results in the original paper using your code. However, I have had trouble matching the reported JGA scores.

My experiments were all on MultiWOZ v2.2, with domain and slot descriptions. Here are my hyperparameter settings and corresponding results.

I am wondering if there are any other tricks needed to achieve better results. If so, would you mind sharing them? Much appreciated! Looking forward to your reply :-D
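For context, here is a minimal sketch of how I understand joint goal accuracy (JGA) to be computed: a turn counts as correct only if the entire predicted dialogue state matches the gold state exactly. The function name and the dict-based state representation are my own simplification; the actual evaluation script may differ.

```python
def joint_goal_accuracy(predicted_states, gold_states):
    """Fraction of turns whose full predicted slot-value state matches gold exactly.

    Each state is a dict mapping slot names (e.g. "hotel-area") to values.
    A turn is correct only if every slot-value pair matches; one wrong or
    missing slot makes the whole turn incorrect.
    """
    assert len(predicted_states) == len(gold_states), "one state per turn"
    correct = sum(
        1 for pred, gold in zip(predicted_states, gold_states) if pred == gold
    )
    return correct / len(gold_states)
```

This all-or-nothing criterion is why small per-slot differences can produce noticeably lower JGA than slot-level accuracy would suggest.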

Best

Chia-Hsuan-Lee commented 1 year ago

Hi, thanks for your interest! My best guess is that this is an optimization difference between training on multiple machines and accumulating gradients on a single machine. For T5-base, we used multiple GPUs, and I honestly can't remember the exact configs we used.
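To illustrate the point about matching the two setups: one common way to approximate multi-GPU training on a single GPU is to keep the effective batch size equal via gradient accumulation. The numbers below are hypothetical (the original configs are not stated in this thread), and even with matched effective batch sizes, runs can still diverge due to data ordering, random seeds, and learning-rate schedules.

```python
# Hypothetical multi-GPU setup: 8 GPUs, each with a per-device batch of 4.
num_gpus = 8
per_device_batch = 4
effective_batch_multi = num_gpus * per_device_batch  # 32 examples per optimizer step

# Single-GPU equivalent: accumulate gradients over 8 micro-batches of 4
# before each optimizer step, so the gradient averages over the same
# number of examples.
accumulation_steps = 8
effective_batch_single = per_device_batch * accumulation_steps  # also 32

assert effective_batch_multi == effective_batch_single
```

In HuggingFace-style trainers this typically corresponds to setting `gradient_accumulation_steps` so that `per_device_train_batch_size * gradient_accumulation_steps * num_gpus` matches the original effective batch size.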