OpenLLMAI / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
https://openrlhf.readthedocs.io/
Apache License 2.0

Citation or comparison to trlX and NeMo-align. #221


LouisCastricato commented 4 months ago

Hi

I notice you cite "70B+ Full Tuning with 16 A100s"; however, trlX also supports this (something we worked very hard to add ;) ) via its NeMo integration. NVIDIA has likewise implemented this in its NeMo suite of tools.

Similarly, all of the libraries you compare against have implemented "PPO implementation tricks." trlX first did this in the fall of 2022, and I discussed it with the Hugging Face TRL team shortly after (I believe they had it merged by January or February 2023).

That aside, these tricks often don't actually benefit RLHF, as some of my friends (Costa Huang and others) and I have discussed. If you have noticed contradicting evidence, I would be incredibly interested in learning more.

hijkzzz commented 4 months ago

We will compare OpenRLHF with the existing frameworks in a formal technical report.

The biggest weaknesses of NeMo-based solutions are:

trlX and TRL use a merged actor-critic to train the models, which conflicts with the standard RLHF process, where the actor and critic are separate models and the critic is typically initialized from the reward model. This is one of our motivations for developing OpenRLHF; the difference is sketched below.
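
For illustration, a minimal PyTorch sketch of the two designs. The class names are hypothetical, not OpenRLHF's or TRL's actual API, and the backbone is assumed to return hidden states:

```python
import torch.nn as nn

class MergedActorCritic(nn.Module):
    """TRL/trlX-style: one shared backbone with a scalar value head,
    so policy and value updates flow through the same weights."""
    def __init__(self, backbone: nn.Module, hidden_size: int, vocab_size: int):
        super().__init__()
        self.backbone = backbone                            # shared transformer trunk
        self.lm_head = nn.Linear(hidden_size, vocab_size)   # policy logits
        self.value_head = nn.Linear(hidden_size, 1)         # critic value

    def forward(self, input_ids):
        h = self.backbone(input_ids)   # assumed to return (batch, seq, hidden)
        return self.lm_head(h), self.value_head(h).squeeze(-1)

class SeparateActorCritic:
    """Standard-RLHF style: actor and critic are independent models, so the
    critic can be initialized from the reward model instead of the policy."""
    def __init__(self, actor: nn.Module, critic: nn.Module):
        self.actor = actor     # policy LM, optimized with the PPO policy loss
        self.critic = critic   # value model, optimized with the value loss only
```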

As for PPO implementation tricks, these are common practice in RL and have been around for a long time. We only used the necessary tricks, and indeed most of them were already implemented in trlX; two typical examples are sketched below.
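
A minimal sketch of two widely cited PPO tricks, advantage whitening and a clipped value loss. This is illustrative only, not OpenRLHF's actual loss code:

```python
import torch

def whiten(advantages: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Normalize advantages to zero mean / unit variance per batch.
    return (advantages - advantages.mean()) / (advantages.std() + eps)

def clipped_value_loss(values, old_values, returns, clip: float = 0.2):
    # Clip the value update around the old prediction, mirroring PPO's
    # policy-ratio clipping, to keep critic updates conservative.
    values_clipped = old_values + (values - old_values).clamp(-clip, clip)
    loss_unclipped = (values - returns) ** 2
    loss_clipped = (values_clipped - returns) ** 2
    return 0.5 * torch.max(loss_unclipped, loss_clipped).mean()
```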

THINK2TRY commented 4 months ago

Hi, have you ever compared the speed of OpenRLHF and NeMo-Aligner?

hijkzzz commented 4 months ago

@THINK2TRY OpenRLHF with vLLM is currently more than 3x faster than NeMo-Aligner.
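
The speedup mainly comes from rollout generation, which dominates PPO wall-clock time and which vLLM's paged-attention engine serves much faster than plain `generate()`. A rough sketch of vLLM-based rollout; the model name and sampling settings are placeholders, not OpenRLHF's defaults:

```python
from vllm import LLM, SamplingParams

# Load the actor weights into vLLM for fast batched generation.
llm = LLM(model="meta-llama/Llama-2-70b-hf", tensor_parallel_size=8)
params = SamplingParams(temperature=1.0, top_p=1.0, max_tokens=512)

prompts = ["### Question: ...\n### Answer:"]
outputs = llm.generate(prompts, params)   # batched PPO rollout samples
for out in outputs:
    print(out.outputs[0].text)
```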