Issues of reproducing Table 1 results on Commonsense Conversation Dataset (CCD)

Yuanhy1997 / SeqDiffuSeq

Text Diffusion Model with Encoder-Decoder Transformers for Sequence-to-Sequence Generation [NAACL 2024]

82 stars 13 forks source link

Hi, I try to use your script (ccd.sh) to reproduce the Table 1 results on Commonsense Conversation Dataset, but it turns out that my reproduced results (BLEU: 0.154, Rouge-L: 6.38) are far below your reported values (BLEU: 1.02, Rouge-L: 8.59). Could you check whether the hyperparameters in ccd.sh are the optimal ones that you use? It would be better if you could also provide the evaluation scripts for producing BLEU and Rouge-L (currently the inference_scripts only save the testing outputs but no metrics evaluation results if I run it right)? Besides, are there any model checkpoints and testing outputs available?

Yuanhy1997 / SeqDiffuSeq

Issues of reproducing Table 1 results on Commonsense Conversation Dataset (CCD) #13