declare-lab / dialogue-understanding

This repository contains PyTorch implementation for the baseline models from the paper Utterance-level Dialogue Understanding: An Empirical Study

Performance for DailyDialog #6

Closed so-hyeun closed 3 years ago

so-hyeun commented 3 years ago

Hi. I have a question about reproducing the performance on DailyDialog.

[screenshot: performance_dailydialog]

1) In the screenshot, are the Best Valid F1 values the test F1 values at the epoch where the validation F1 is highest?

2) I trained the model with batch size = 1 due to limited computing power. Could this be the cause of the difference between the paper's performance (59.50) and mine (57.5)?

deepanwayx commented 3 years ago
  1. Yes, the scores are the test F1 values at the epoch with the highest validation F1.

  2. Yes, exactly. If you have a look at Section 5.1 and Figure 12 in our paper, you will notice that the performance depends heavily on the batch size. On DailyDialog, a smaller batch size results in poorer performance. I would also suggest running each experiment several times and averaging the results to obtain numbers closer to ours (a minimal sketch of this selection-and-averaging procedure follows below).
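For anyone else reproducing these numbers, here is a minimal sketch (not the repository's actual training code) of the procedure described above: keep the per-epoch validation and test F1 scores, report the test F1 at the epoch with the highest validation F1, and average that value over several runs. The history format and scores below are hypothetical.

```python
# Minimal sketch (hypothetical data, not the repo's training loop):
# report test F1 at the best-validation epoch, averaged over runs.
import statistics

def best_valid_test_f1(history):
    """history: list of (valid_f1, test_f1) tuples, one per epoch."""
    # Pick the epoch with the highest validation F1 and return its test F1.
    best_valid, test_at_best = max(history, key=lambda pair: pair[0])
    return test_at_best

# Hypothetical per-epoch scores from runs with different random seeds.
runs = [
    [(0.52, 0.54), (0.58, 0.57), (0.56, 0.55)],  # run 1: best valid at epoch 2 -> test 0.57
    [(0.55, 0.56), (0.57, 0.58), (0.59, 0.56)],  # run 2: best valid at epoch 3 -> test 0.56
]

scores = [best_valid_test_f1(run) for run in runs]
print("Test F1 at best valid, averaged over runs:", statistics.mean(scores))
```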

so-hyeun commented 3 years ago

Thanks for the kind and quick reply.