HWU64 odd number of samples

alexa / dialoglue

DialoGLUE: A Natural Language Understanding Benchmark for Task-Oriented Dialogue

https://evalai.cloudcv.org/web/challenges/challenge-page/708/overview

Apache License 2.0

HWU64 odd number of samples #17

Open jnehring opened 2 years ago

jnehring commented 2 years ago

The HWU64 dataset contains 25k samples according to the original paper. The DialoGLUE paper states the same number of samples.

However, the Readme states 11k samples.

If I count the samples that are actually in the HWU64 part of DialoGLUE, I get 12,112 samples (12k).

My questions:

Is there a reason for the difference in numbers in the original HWU64 and in the DialoGLUE HWU64? Or is it a bug?
Did you compute the performance of the intent prediction models on 25k, 12k or 11k samples?

Thank you for your answers :)

Shikib commented 2 years ago

Thank you for bringing this discrepancy to our attention. We use the data downloading scripts provided by [https://arxiv.org/pdf/2009.13570.pdf] to get all of the intent prediction datasets, including HWU, in order to maintain consistency with prior work on intent prediction. The performance is reported on the data that is obtained by the data downloading scripts (i.e., the ~11k data points). The error here is the 25k number cited in the paper, which I will attempt to get corrected in the near future.

I'm not sure how you're counting 12,112 samples. I just ran the data downloading scripts and see:

1077 test.csv
9961 train.csv

jnehring commented 2 years ago

Thank you for the quick answer :) I counted 12,112 by adding:

9960 train.csv
1076 test.csv
1076 val.csv

So I also counted the val file. There is still a discrepancy of 1 sample, but that's OK for me.
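
For anyone who wants to reproduce the per-split counts discussed in this thread, here is a minimal sketch in Python. It is not taken from the DialoGLUE repository. The directory path is hypothetical, the helper names `count_samples` and `count_splits` are made up for illustration, and the script assumes each CSV file starts with one header row. If that assumption holds, a raw line count would come out one higher per file than the number of samples.

```python
import csv
from pathlib import Path


def count_samples(csv_path, has_header=True):
    """Count data rows in a CSV file, optionally skipping one header row."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        # csv.reader handles quoted fields that contain line breaks,
        # which a plain line count would count more than once.
        rows = sum(1 for _ in csv.reader(f))
    return rows - 1 if has_header and rows > 0 else rows


def count_splits(data_dir, splits=("train", "test", "val")):
    """Return {split: sample count} for every split file that exists."""
    counts = {}
    for split in splits:
        path = Path(data_dir) / f"{split}.csv"
        if path.exists():
            counts[split] = count_samples(path)
    return counts


if __name__ == "__main__":
    # Hypothetical location of the downloaded HWU64 files; adjust as needed.
    counts = count_splits("path/to/hwu")
    for split, n in counts.items():
        print(f"{split}: {n}")
    print(f"total: {sum(counts.values())}")
```

With the per-file numbers reported above (9960 train, 1076 test, 1076 val), the total is 12,112; leaving out the val file gives 11,036, which matches the ~11k figure.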