OpenLLMAI / OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & Mixtral)
https://openrlhf.readthedocs.io/
Apache License 2.0

Citation or comparison to trlX and NeMo-align. #221


LouisCastricato commented 4 months ago

Hi

I notice you cite "70B+ Full Tuning with 16 A100s"; however, trlX also supports this (something we worked very hard to add ;) ) via its NeMo integration. NVIDIA has likewise implemented this in its NeMo suite of tools.

Similarly, all of the libraries you compare against have implemented "PPO implementation tricks." trlX first did this in the fall of 2022, and I discussed it with the Hugging Face TRL team shortly after (I believe they had it merged by January or February 2023).

That aside, these tricks often don't actually benefit RLHF, as some of my friends (Costa Huang and others) and I have discussed. If you have noticed contradicting evidence, I would be incredibly interested in learning more.

hijkzzz commented 4 months ago

We will compare OpenRLHF with the existing frameworks in a formal technical report.

The biggest weaknesses of NeMo-based solutions are:

trlX and TRL use a merged actor-critic to train the models, which conflicts with the standard RLHF process, where the actor and critic are separate models and the critic is typically initialized from the reward model. This is one of our motivations for developing OpenRLHF; the difference is sketched below.
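
For illustration, a minimal PyTorch sketch of the two designs. The class names are hypothetical, not OpenRLHF's or TRL's actual API, and the backbone is assumed to return hidden states:

```python
import torch.nn as nn

class MergedActorCritic(nn.Module):
    """TRL/trlX-style: one shared backbone with a scalar value head,
    so policy and value updates flow through the same weights."""
    def __init__(self, backbone: nn.Module, hidden_size: int, vocab_size: int):
        super().__init__()
        self.backbone = backbone                            # shared transformer trunk
        self.lm_head = nn.Linear(hidden_size, vocab_size)   # policy logits
        self.value_head = nn.Linear(hidden_size, 1)         # critic value

    def forward(self, input_ids):
        h = self.backbone(input_ids)   # assumed to return (batch, seq, hidden)
        return self.lm_head(h), self.value_head(h).squeeze(-1)

class SeparateActorCritic:
    """Standard-RLHF style: actor and critic are independent models, so the
    critic can be initialized from the reward model instead of the policy."""
    def __init__(self, actor: nn.Module, critic: nn.Module):
        self.actor = actor     # policy LM, optimized with the PPO policy loss
        self.critic = critic   # value model, optimized with the value loss only
```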

As for PPO implementation tricks, these are common practice in RL and have been around for a long time. We only used the necessary tricks, and indeed most of them were already implemented in trlX; two typical examples are sketched below.
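
A minimal sketch of two widely cited PPO tricks, advantage whitening and a clipped value loss. This is illustrative only, not OpenRLHF's actual loss code:

```python
import torch

def whiten(advantages: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Normalize advantages to zero mean / unit variance per batch.
    return (advantages - advantages.mean()) / (advantages.std() + eps)

def clipped_value_loss(values, old_values, returns, clip: float = 0.2):
    # Clip the value update around the old prediction, mirroring PPO's
    # policy-ratio clipping, to keep critic updates conservative.
    values_clipped = old_values + (values - old_values).clamp(-clip, clip)
    loss_unclipped = (values - returns) ** 2
    loss_clipped = (values_clipped - returns) ** 2
    return 0.5 * torch.max(loss_unclipped, loss_clipped).mean()
```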

THINK2TRY commented 4 months ago

Hi, have you ever compared the speed of OpenRLHF and NeMo-Aligner?

hijkzzz commented 4 months ago

@THINK2TRY OpenRLHF with vLLM is currently more than 3x faster than NeMo-Aligner.
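
The speedup mainly comes from rollout generation, which dominates PPO wall-clock time and which vLLM's paged-attention engine serves much faster than plain `generate()`. A rough sketch of vLLM-based rollout; the model name and sampling settings are placeholders, not OpenRLHF's defaults:

```python
from vllm import LLM, SamplingParams

# Load the actor weights into vLLM for fast batched generation.
llm = LLM(model="meta-llama/Llama-2-70b-hf", tensor_parallel_size=8)
params = SamplingParams(temperature=1.0, top_p=1.0, max_tokens=512)

prompts = ["### Question: ...\n### Answer:"]
outputs = llm.generate(prompts, params)   # batched PPO rollout samples
for out in outputs:
    print(out.outputs[0].text)
```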