High Self-BLEU scores on the anlg dataset

Hello,

I ran the KGMixtureOfExpertShen model on the anlg dataset. My self-BLEU-3 and self-BLEU-4 scores are noticeably higher than those reported in the paper, while the other metrics (distinct-2, entropy-4, top-k BLEU-4, and top-k ROUGE-L) are close to the paper's values. For the Shen model, I added --weight_decay 0.01 --warmup_steps 10000 to reproduce the reported performance. Is there anything else I need to change in the hyperparameters?
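For context, self-BLEU measures how similar the several outputs generated for the same input are to each other. Each output is scored with BLEU against the remaining outputs as references, and the scores are averaged, so a higher value means less diverse generations. The sketch below is a minimal, hypothetical implementation of that idea. It assumes whitespace tokenization, uniform n-gram weights, and a tiny epsilon in place of zero precisions. MoKGE's own evaluation script may tokenize, smooth, or average differently, and differences like these can shift self-BLEU on their own. The names `sentence_bleu` and `self_bleu` are illustrative and are not taken from the repository.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count the n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp: List[str], refs: List[List[str]], max_n: int, eps: float = 1e-9) -> float:
    """BLEU of one hypothesis against several references, with uniform weights 1/max_n.

    Assumption: zero precisions are replaced by `eps` (a simple smoothing choice),
    and a hypothesis shorter than max_n tokens scores 0.
    """
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        total = sum(hyp_counts.values())
        if total == 0:
            return 0.0
        # Clip each hypothesis n-gram count by its maximum count in any single reference.
        max_ref: Counter = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
        log_precision_sum += math.log(max(clipped / total, eps))
    # Brevity penalty against the reference whose length is closest to the hypothesis.
    ref_len = min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
    bp = 1.0 if len(hyp) > ref_len else math.exp(1 - ref_len / len(hyp))
    return bp * math.exp(log_precision_sum / max_n)


def self_bleu(hypotheses: List[str], max_n: int) -> float:
    """Average BLEU of each hypothesis against all the other hypotheses for the same input."""
    if len(hypotheses) < 2:
        raise ValueError("self-BLEU needs at least two hypotheses per input")
    tokenized = [h.split() for h in hypotheses]
    scores = []
    for i, hyp in enumerate(tokenized):
        refs = tokenized[:i] + tokenized[i + 1:]  # every other output acts as a reference
        scores.append(sentence_bleu(hyp, refs, max_n))
    return sum(scores) / len(scores)
```

For instance, three identical outputs give a self-BLEU of 1.0, while outputs that share no words score close to 0. Setting max_n to 3 or 4 presumably corresponds to self_bleu_3 and self_bleu_4, though how the repository aggregates the per-input scores over the test set is not confirmed here.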

The output_pred_metric result is as follows:

    {
      "epoch": "test_metric",
      "topk_bleu_4": 0.14420743386330465,
      "topk_rouge_l": 0.3857134336425845,
      "self_bleu_3": 0.330773404929831,
      "self_bleu_4": 0.2793998961185776,
      "entropy_4": 10.783610820453294,
      "distinct_2": 0.38942323619377806
    }
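The distinct_2 and entropy_4 values above are the corpus-level diversity metrics that do match the paper. A common way to define them: distinct-n is the number of unique n-grams divided by the total number of n-grams, and entropy-n is the Shannon entropy of the empirical n-gram frequency distribution. The sketch below follows those definitions under stated assumptions: n-grams are pooled over all generated sentences, tokenization is by whitespace, and entropy uses the natural log. Whether MoKGE pools over the whole test set or per input, and which log base it uses, are not confirmed here. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List


def ngram_counts(sentences: Iterable[str], n: int) -> Counter:
    """Pool n-gram counts over all sentences (whitespace tokenization is an assumption)."""
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


def distinct_n(sentences: List[str], n: int) -> float:
    """Ratio of unique n-grams to total n-grams; 0.0 if there are no n-grams."""
    counts = ngram_counts(sentences, n)
    total = sum(counts.values())
    return len(counts) / total if total else 0.0


def entropy_n(sentences: List[str], n: int) -> float:
    """Shannon entropy (natural log, an assumption) of the pooled n-gram distribution."""
    counts = ngram_counts(sentences, n)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())
```

For example, distinct_n(["a b a b"], 2) counts the bigrams (a, b), (b, a), (a, b) and returns 2/3. Because these metrics pool n-grams across outputs while self-BLEU compares outputs for the same input against each other, the two families can disagree about diversity.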

DM2-ND / MoKGE, issue #9