I'm trying to reproduce the results for NarrativeQA by directly running the command with the provided .yml configuration files. Below are the scores, measured with ROUGE-L-Max.
For PPO with supervision, I got 0.581 and 0.588 for epochs 0 and 99, respectively.
For NLPO with supervision, I got 0.217 and 0.213 for epochs 0 and 99, respectively.
I'm wondering why the result for NLPO doesn't match the reported result in the paper.
I also tried using the PPO config with only the RL algorithm changed to NLPO, and got the same result as above.
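For reference, the change I made was roughly the following (the key names here are illustrative, based on the general shape of the provided .yml files, not copied from the actual config):

```yaml
# Sketch of the only change made to the PPO config.
# Key names are assumptions for illustration; everything else was left untouched.
alg:
  id: nlpo   # was: ppo
```

Happy to share the full diff of my config if that helps with debugging.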
Please let me know if I'm missing something or if it's some other issue. Thanks!