Closed ddehun closed 3 years ago
In my experience, the RUBER and ADEM models are somewhat unstable to train.
1) Make sure you have initialized the model with an HRED model pretrained on the response generation task.
2) I would suggest trying different random seeds (e.g. with the argument "--seed 42"), as different seeds gave me quite different results.
3) BTW, I have been using the floor (speaker) encoder all along (with the argument "--floor_encoder rel"), so I don't know whether that is a factor.
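For what it's worth, a minimal seeding helper along these lines can make runs comparable; the numpy/torch calls are only illustrative of what a `--seed` flag typically does, not the repo's exact implementation:

```python
import random

def set_seed(seed: int) -> None:
    """Seed the common RNGs so repeated runs are comparable.

    The numpy/torch branches are optional: they only run if the
    library is installed, so this sketch stays self-contained.
    """
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass
```

Calling `set_seed(42)` before model construction and data loading should make a single run reproducible, though it does not remove the run-to-run variance across different seeds.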
Hi, @ZHAOTING!
I tried to reproduce the performance of the RUBER-unref model from your ACL 2020 paper using your dataset, but I failed.
More specifically, I am trying to reproduce the RUBER-unref setting, which excludes the ground-truth reference, in the right part of Table 1, where it reports about .43 Pearson and .39 Spearman correlation.
I tried both my own implementation and this repository, but neither reaches a similar performance.
When I train RUBER with this code, the best performance is about 0.21 Pearson and 0.25 Spearman correlation, and by epoch 8 the learning rate has decayed to 1e-7 (the stopping condition).
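For reference, here is how I compute the two correlations against the human ratings, as a small pure-Python helper (this simple Spearman assigns arbitrary ranks to ties; scipy's `spearmanr` averages them):

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman correlation = Pearson on the rank-transformed data.

    Ties get arbitrary (not averaged) ranks, so this is only a
    sanity-check approximation of scipy.stats.spearmanr.
    """
    def ranks(vs):
        order = sorted(range(len(vs)), key=vs.__getitem__)
        r = [0.0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(xs), ranks(ys))
```

On a perfectly monotone but nonlinear pair, Spearman is 1.0 while Pearson is slightly below 1.0, which is why the paper reports both.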
I used the command line below for training. The hyperparameters are the same as the values in the Appendix of the paper.
Could you give me some tips to improve the performance of the RUBER-unref model? Even when I replace the [word embedding + GRU] encoder with [frozen BERT + mean pooling], following this paper, the best correlation is only about 0.2.
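To be concrete about the BERT-freeze variant, this is the mean-pooling step I have in mind, as a minimal pure-Python sketch over per-token vectors (the actual unref scorer then feeds the pooled context and reply vectors into an MLP, which is omitted here):

```python
def mean_pool(token_vecs):
    """Average a list of token embedding vectors into one sentence vector."""
    dim = len(token_vecs[0])
    n = len(token_vecs)
    return [sum(v[i] for v in token_vecs) / n for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two sentence vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)
```

In my runs the token vectors come from the frozen BERT encoder; padding tokens should be excluded from the average before pooling.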
Thanks!