GanjinZero / RRHF

[NIPS2023] RRHF & Wombat

PPO implementation #19

Open yuzc19 opened 1 year ago

yuzc19 commented 1 year ago

Could you provide the PPO codebase that reproduces the results in the paper? I could not find it in this repo. Thank you!

GanjinZero commented 1 year ago

We are still organizing our PPO codebase. We will post some details here.

Yuanhy1997 commented 1 year ago

I will add releasing the PPO codebase to the TODO list. For the experiments reported in our paper, our PPO code is based on trlX's implementation.