Reproduce the performance of RUBER-unref in paper

Hi, @ZHAOTING !

I tried to reproduce the performance of RUBER-unref model in your ACL 2020 paper using your dataset, but I failed.

More specifically, I hope to reproduce the RUBER-ref model by excluding the ground-truth in the right part of TABLE 1, which shows about .43 Pearson and .39 Spearman correlation score.

I struggled with either of my custom implementations and using this repository, but none of them shows a similar performance.

When I train RUBER in this code, the best performance is about 0.21 Pearson and 0.25 spearman correlation. And when the epoch is 8, the learning rate converged into 1e-7 (stop condition).

I used the below command line for training. The hyperparameters are the same as the value in the Appendix of the paper.

python -m tasks.response_eval.train_unsupervised --model ruber --corpus dd --tokenizer ws --enable_log True --save_model True --batch_size 30 --init_lr 0.0001 --n_epochs 30

Could you give me some tips to improve the performance of RUBER-unref model? Even when I replace the [word embedding + GRU] into [BERT-freeze and mean pooling], followed by this paper, the best correlation is only about 0.2.

Thanks!

ZHAOTING / dialog-processing

Reproduce the performance of RUBER-unref in paper #4