RLHFlow / Online-RLHF

A recipe for online RLHF and online iterative DPO.
https://rlhflow.github.io/

About the results of vanilla DPO #28

Open lucasliunju opened 1 week ago

lucasliunju commented 1 week ago

Hi, thanks for your great work.

I would like to run vanilla DPO (offline DPO) as a baseline to compare its performance with online DPO. May I ask whether I can use this codebase to run that experiment, and what the running command is? Thank you very much in advance.

WeiXiongUST commented 1 week ago

Hi, if you want to generate the data yourself, all you need to do is change line 80 of https://github.com/RLHFlow/Online-RLHF/blob/main/run_loop2.sh so that it runs for only 1 iteration.
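As a minimal sketch of that change, assuming line 80 of run_loop2.sh drives a loop over iteration indices (the actual variable names in the script may differ; check the file itself):

```shell
# Hypothetical sketch: restrict the online loop to a single pass so the run
# reduces to one round of data generation + offline (vanilla) DPO training.
for i in $(seq 1 1); do   # originally something like $(seq 1 3) for three online iterations
    echo "Running DPO iteration $i"
done
```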

If you already have the dataset, you can run:

```
conda activate rlhflow
accelerate launch --config_file ./configs/zero2.yaml dpo_iteration/run_dpo.py ./configs/training.yaml
```

But you may want to update the `prepare_data` function in run_dpo.py to use your own data.
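A minimal sketch of what a custom `prepare_data` might produce, assuming your raw records carry a prompt plus several scored responses (the record keys `prompt`, `responses`, `text`, and `reward` here are hypothetical; the real function in run_dpo.py loads a Hugging Face dataset and its field names may differ). DPO training ultimately needs (prompt, chosen, rejected) triples, so one common choice is pairing the best- and worst-rewarded responses:

```python
# Hypothetical sketch of a prepare_data replacement for run_dpo.py.
# Shows the target record shape a DPO trainer expects:
# one prompt with a chosen and a rejected completion.

def prepare_data(raw_records):
    """Convert raw preference records into DPO-style triples.

    Each record is assumed to hold a prompt and a list of responses with
    scalar rewards; the highest- and lowest-reward responses become the
    chosen/rejected pair.
    """
    pairs = []
    for rec in raw_records:
        ranked = sorted(rec["responses"], key=lambda r: r["reward"])
        if len(ranked) < 2:
            continue  # need at least two responses to form a pair
        pairs.append({
            "prompt": rec["prompt"],
            "chosen": ranked[-1]["text"],
            "rejected": ranked[0]["text"],
        })
    return pairs

demo = [{
    "prompt": "What is DPO?",
    "responses": [
        {"text": "Direct Preference Optimization.", "reward": 0.9},
        {"text": "I don't know.", "reward": 0.1},
    ],
}]
print(prepare_data(demo))
# → [{'prompt': 'What is DPO?', 'chosen': 'Direct Preference Optimization.', 'rejected': "I don't know."}]
```

Whatever shape you choose, keep the output column names consistent with what the training config and the DPO trainer in this repo expect.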