ELMO-QG baseline got 13.9 BLEU-4

jemmryx commented 4 years ago

Hi, I followed your steps exactly and trained the ELMO-QG baseline model, but I only got a BLEU-4 score of 13.9. It seems that something went wrong somewhere.

ZhangShiyue commented 4 years ago

Oops, I'm not sure what the issue is here. I'll help fix this. Did you test the pre-trained model? If not, please try it first; if you did, does it give a good BLEU score? I'll recheck the process myself.

jemmryx commented 4 years ago

Directly using the pre-trained model gives good results, but training from scratch does not.

ZhangShiyue commented 4 years ago

Hi, I followed the README again to train a model from scratch and found a small bug that might cause your problem. The 'best_bleu' value was not updated along with the checkpoint; I mistakenly removed that update when cleaning up the code. As a result, the checkpoint recorded in "best.json" is not the best one on the dev set but the final one. I have updated the code; see this commit: https://github.com/ZhangShiyue/QGforQA/commit/81beed7122abbf9a62745af8ab7d6d4d4bf52c73

Besides, you can also check TensorBoard and manually choose the best checkpoint on the dev set to test. The best BLEU on the dev set is about 17.5+ (beam_size=1), and I got similar results across different runs. Also, please set beam_size=10 when testing.
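For anyone hitting the same symptom, here is a rough sketch of the kind of bookkeeping involved. It is not the repository's actual code: the function name `select_best_checkpoint`, the `(step, bleu)` input format, and the layout of the JSON file are all assumptions made for illustration, and the scores in the usage lines are made-up values. The point it shows is that the "best" record is only meaningful if `best_bleu` is updated every time a better dev score is seen.

```python
import json


def select_best_checkpoint(dev_scores, best_path=None):
    """Return (step, bleu) of the best dev checkpoint.

    dev_scores: iterable of (step, bleu) pairs, one per evaluated checkpoint.
    best_path:  optional file to record the best checkpoint in (hypothetical
                layout, not necessarily what the repo writes to best.json).
    """
    best_bleu = float("-inf")
    best_step = None
    for step, bleu in dev_scores:
        if bleu > best_bleu:
            # This update is the crucial part. If best_bleu is never updated,
            # every later checkpoint still passes the comparison above, so the
            # record ends up pointing at the last checkpoint, not the best one.
            best_bleu = bleu
            best_step = step
            if best_path is not None:
                with open(best_path, "w") as f:
                    json.dump({"step": best_step, "bleu": best_bleu}, f)
    return best_step, best_bleu


# Illustrative dev scores only (not real results from any run).
scores = [(1000, 15.2), (2000, 17.5), (3000, 16.1)]
print(select_best_checkpoint(scores))
```

The same function can be used to pick a checkpoint by hand from dev scores read off TensorBoard, which is the manual workaround mentioned above.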