alexa / dialoglue

DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue
https://evalai.cloudcv.org/web/challenges/challenge-page/708/overview
Apache License 2.0

Reproducing few-shot experiments on Multiwoz2.1 #14

Open jimpei8989 opened 3 years ago

jimpei8989 commented 3 years ago

Hi,

I am working on few-shot experiments on MultiWOZ 2.1. However, I faced the same problem as in #7.

BERT + pre + multi trained on the few-shot dataset achieved ~0.49 JGA on the test set (with random seed 42).
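For reference, joint goal accuracy (JGA) counts a turn as correct only if every predicted slot-value pair matches the gold belief state exactly. A minimal sketch of the metric is below; the function and variable names are illustrative only, not taken from the dialoglue evaluation code:

```python
def joint_goal_accuracy(gold_states, pred_states):
    """Fraction of turns whose predicted belief state exactly matches the gold state.

    Each state is assumed to be a dict mapping (domain, slot) -> value;
    a turn counts as correct only if all slot-value pairs match.
    """
    assert len(gold_states) == len(pred_states), "mismatched number of turns"
    if not gold_states:
        return 0.0
    correct = sum(1 for gold, pred in zip(gold_states, pred_states) if gold == pred)
    return correct / len(gold_states)
```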

I modified a small part of your code, and the diff is listed here (GitHub comparing changes). I ran the experiment directly with DO.example.advanced.

Environment

I wonder whether my training/evaluation process was wrong, and whether that is why I got such high performance even in the few-shot setting.

Thanks in advance for your reply!

Shikib commented 3 years ago

Apologies for the delay in addressing this issue. I don't fully understand your issue: are you saying that you're achieving higher performance than a JGA of 0.49 using our few-shot setup?

I don't see any problems in your diff. One way to verify that there are no errors is to ensure that there is no data leakage in the MLM-pre and MLM-multi steps (i.e., that MLM in the few-shot case is only done on the few-shot data).
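One way to check this is a small script that scans the MLM pre-training corpus and fails if any dialogue falls outside the few-shot split. The sketch below assumes a JSON-lines corpus with a `dialogue_id` field per record; the file layout and field names are assumptions, not the actual dialoglue data format:

```python
import json

def assert_no_leakage(mlm_corpus_path, fewshot_dialogue_ids):
    """Raise if the MLM corpus references any dialogue outside the few-shot split."""
    fewshot_dialogue_ids = set(fewshot_dialogue_ids)
    with open(mlm_corpus_path) as f:
        for line in f:
            record = json.loads(line)
            if record["dialogue_id"] not in fewshot_dialogue_ids:
                raise ValueError(f"Leaked dialogue: {record['dialogue_id']}")

# Example usage (paths are placeholders):
# assert_no_leakage("mlm_pre_corpus.jsonl", fewshot_ids)
```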