Hi!
I am trying to replicate the DiffuSeq model on the paraphrase task with the QQP dataset. I kept the default training config, and for MBR I decoded with 20 different random seeds during evaluation, but I still can't match the performance reported in Table 1 of the paper.
For reference, this is the content of training_args.json file:
and this is the output of running the evaluation:
python eval_seq2seq.py --folder ../{your-path-to-outputs} --mbr