awslabs / pptod

Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System (ACL 2022)
https://arxiv.org/abs/2109.14739
Apache License 2.0

About results of the E2E-TOD #1

Closed: xiami2019 closed this issue 3 years ago

xiami2019 commented 3 years ago

Hi, what a nice piece of work! I have a small question about the results of the E2E-TOD fine-tuning. I noticed that the released best result is obtained at epoch 6. However, I trained for 15 epochs and only got a 92.06 combined score (at epoch 10). My batch size is 128 (number_of_gpu 4, batch_size_per_gpu 2, gradient_accumulation_steps 16), which is the same as in the released code. I wonder whether there are any other settings for E2E-TOD fine-tuning that are needed to reach the best result within a few epochs.
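
As a quick sanity check, the three flags quoted above do multiply out to the stated batch size. A minimal sketch follows; the variable names simply mirror the flag names mentioned in this thread and are not repo code:

```python
# Effective batch size implied by the flags quoted above.
number_of_gpu = 4
batch_size_per_gpu = 2
gradient_accumulation_steps = 16

effective_batch_size = number_of_gpu * batch_size_per_gpu * gradient_accumulation_steps
assert effective_batch_size == 128  # the batch size of 128 mentioned above
```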

Looking forward to your reply.

yxuansu commented 3 years ago

Hello, thank you for your question!

May I ask whether you are using PPTOD-small or PPTOD-base?

Here are a few suggestions:

  1. First, use PPTOD-small to validate the results. (PPTOD-small is smaller and easier to train.)
  2. You can also change line 203 of learn.py from 'dev' to 'test' to directly observe the results on the test set. Alternatively, you could run inference.sh to see the model's performance on the test set. (But for validating the results, changing 'dev' to 'test' should be the easiest.)
  3. We observe some randomness in the best epoch number. I suggest you train PPTOD-small for at least 30 epochs and check the test-set results; they should be fairly easy to replicate. (See the sketch after this list.)
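
Here is a minimal, self-contained sketch of what suggestions 2 and 3 amount to, assuming the standard MultiWOZ combined score of (Inform + Success) / 2 + BLEU. The evaluate_epoch callable and its split argument are placeholders for illustration, not the actual learn.py API:

```python
from typing import Callable, Tuple

def combined_score(inform: float, success: float, bleu: float) -> float:
    # Assumed standard MultiWOZ combined score: (Inform + Success) / 2 + BLEU.
    return (inform + success) / 2.0 + bleu

def track_best_epoch(evaluate_epoch: Callable[[int, str], Tuple[float, float, float]],
                     num_epochs: int = 30,
                     split: str = 'test') -> Tuple[int, float]:
    # Evaluate every epoch on `split` and return (best_epoch, best_combined_score).
    # split='test' mirrors suggestion 2 (changing 'dev' to 'test'); num_epochs=30
    # mirrors suggestion 3. `evaluate_epoch` is a placeholder that returns
    # (inform, success, bleu) for a given epoch and split.
    best_epoch, best_score = -1, float('-inf')
    for epoch in range(num_epochs):
        inform, success, bleu = evaluate_epoch(epoch, split)
        score = combined_score(inform, success, bleu)
        if score > best_score:
            best_epoch, best_score = epoch, score
    return best_epoch, best_score
```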

Please let me know whether you can replicate the results. Looking forward to your updates!

xiami2019 commented 3 years ago

Thanks for your response. I initialized the model with PPTOD-base, and I think I only recorded results from the dev set. I will train for more epochs and update my results.