Reproducing existing results on NarrativeQA

I'm trying to reproduce the NarrativeQA results by running the command directly with the provided .yml configuration files. The scores below are ROUGE-L-Max. For PPO with supervision, I got 0.581 at epoch 0 and 0.588 at epoch 99. For NLPO with supervision, I got 0.217 at epoch 0 and 0.213 at epoch 99.
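To make clear what I mean by ROUGE-L-Max: as I understand it, it is the ROUGE-L score of a prediction against each reference answer, keeping the best one. Below is a minimal sketch of that idea, not the repository's implementation. The function names (`lcs_length`, `rouge_l_f1`, `rouge_l_max`), the whitespace tokenization, and the use of the plain F1 form are all my assumptions, so exact values may differ from what the library reports.

```python
from typing import List, Sequence


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """ROUGE-L as an F1 score over LCS-based precision and recall.

    Assumption: lowercase whitespace tokenization, no stemming.
    """
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return 0.0
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def rouge_l_max(prediction: str, references: List[str]) -> float:
    """Best ROUGE-L F1 of the prediction over all reference answers."""
    return max((rouge_l_f1(prediction, r) for r in references), default=0.0)
```

NarrativeQA provides more than one reference answer per question, which is why taking the maximum over references matters here.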

I'm wondering why my NLPO result doesn't match the result reported in the paper.

I also tried taking the PPO config and changing only the RL algorithm to NLPO, and I got the same result as above.

Please let me know if I'm missing something or if it's some other issue. Thanks!
