allenai / RL4LMs

A modular RL library to fine-tune language models to human preferences
https://rl4lms.apps.allenai.org/
Apache License 2.0
2.13k stars 191 forks

Reproducing existing results on NarrativeQA #62

Open yxk23 opened 1 year ago

yxk23 commented 1 year ago

I'm trying to reproduce the NarrativeQA results by running the command directly with the provided .yml configuration files. The scores below are ROUGE-L-Max. For PPO with supervision, I got 0.581 at epoch 0 and 0.588 at epoch 99. For NLPO with supervision, I got 0.217 at epoch 0 and 0.213 at epoch 99.

I'm wondering why the NLPO result doesn't match the result reported in the paper.

I also tried taking the PPO config and changing only the RL algorithm to NLPO, and I got the same result as above.

Please let me know if I'm missing something or if it's some other issue. Thanks!
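
For context on the metric, here is a minimal sketch of what a max-over-references ROUGE-L score could look like. It is not the RL4LMs implementation: the function names (`lcs_length`, `rouge_l_f1`, `rouge_l_max`) are hypothetical, and it assumes lowercased whitespace tokenization with no stemming, so its numbers may differ from what the library reports.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction: str, reference: str) -> float:
    """ROUGE-L F1 for one prediction/reference pair.

    Assumption: lowercased whitespace tokens, no stemming.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def rouge_l_max(prediction: str, references: List[str]) -> float:
    """Best ROUGE-L F1 over all references (0.0 if there are none)."""
    return max((rouge_l_f1(prediction, ref) for ref in references), default=0.0)
```

With a metric like this, a prediction that exactly matches any one of the reference answers scores 1.0, which is why it is a natural fit for datasets with several reference answers per question.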