alexa / dialoglue

DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue
https://evalai.cloudcv.org/web/challenges/challenge-page/708/overview
Apache License 2.0

Reproducing few-shot experiments on Multiwoz2.1 #14

Open jimpei8989 opened 3 years ago

jimpei8989 commented 3 years ago

Hi,

I am working on few-shot experiments on MultiWOZ 2.1. However, I faced the same problem as in #7.

BERT + pre + multi trained on the few-shot dataset achieved ~0.49 JGA on the test set (with random seed 42).
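For reference, joint goal accuracy (JGA) counts a turn as correct only if every predicted slot-value pair matches the gold belief state exactly. A minimal sketch of the metric is below; the function and variable names are illustrative only, not taken from the dialoglue evaluation code:

```python
def joint_goal_accuracy(gold_states, pred_states):
    """Fraction of turns whose predicted belief state exactly matches the gold state.

    Each state is assumed to be a dict mapping (domain, slot) -> value;
    a turn counts as correct only if all slot-value pairs match.
    """
    assert len(gold_states) == len(pred_states), "mismatched number of turns"
    if not gold_states:
        return 0.0
    correct = sum(1 for gold, pred in zip(gold_states, pred_states) if gold == pred)
    return correct / len(gold_states)
```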

I modified a small part of your code, and the diff is listed here (GitHub comparing changes). I ran the experiment directly with DO.example.advanced.

Environment

I wonder whether my training/evaluation process was wrong, and whether that is why I got such high performance even in the few-shot setting.

Thanks in advance for your reply!

Shikib commented 3 years ago

Apologies for the delay in addressing this issue. I don't fully understand your issue: are you saying that you're achieving higher performance than a JGA of 0.49 using our few-shot setup?

I don't see any problems in your diff. One way to verify that there are no errors is to ensure that there is no data leakage in the MLM-pre and MLM-multi steps (i.e., that MLM in the few-shot case is only done on the few-shot data).
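One way to check this is a small script that scans the MLM pre-training corpus and fails if any dialogue falls outside the few-shot split. The sketch below assumes a JSON-lines corpus with a `dialogue_id` field per record; the file layout and field names are assumptions, not the actual dialoglue data format:

```python
import json

def assert_no_leakage(mlm_corpus_path, fewshot_dialogue_ids):
    """Raise if the MLM corpus references any dialogue outside the few-shot split."""
    fewshot_dialogue_ids = set(fewshot_dialogue_ids)
    with open(mlm_corpus_path) as f:
        for line in f:
            record = json.loads(line)
            if record["dialogue_id"] not in fewshot_dialogue_ids:
                raise ValueError(f"Leaked dialogue: {record['dialogue_id']}")

# Example usage (paths are placeholders):
# assert_no_leakage("mlm_pre_corpus.jsonl", fewshot_ids)
```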