RLHFlow / RLHF-Reward-Modeling

Recipes to train reward model for RLHF.
https://rlhflow.github.io/
Apache License 2.0

Armo-rm env set-up and data processing #35


MaxwellJryao commented 1 week ago

Hi,

I plan to reproduce the ArmoRM results but haven't found the environment requirements. Are they the same as for the BT model? I am currently using the BT model environment, but the data processing is quite slow and is estimated to take about 13 hours to finish. Any suggestions?

Thanks.

WeiXiongUST commented 1 week ago

The environment for the BT reward model should be sufficient, with the addition of the sklearn package (installed as scikit-learn).

We need to run inference over the ~600K samples with the BT reward model, so it may take a few hours.
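For reference, scoring a large dataset is usually dominated by per-sample overhead, so batching the forward passes is the main lever for speed. Below is a minimal sketch of batched scoring with a pluggable `score_fn`; the function name and the dummy scorer are illustrative stand-ins, not code from this repo. In practice `score_fn` would wrap the BT reward model's tokenizer and forward pass.

```python
from typing import Callable, List

def score_in_batches(texts: List[str],
                     score_fn: Callable[[List[str]], List[float]],
                     batch_size: int = 64) -> List[float]:
    """Score all texts by calling score_fn on fixed-size batches.

    score_fn stands in for a reward-model forward pass (tokenize a
    batch, run the model, return one scalar reward per text);
    batching amortizes the per-call overhead across many samples.
    """
    scores: List[float] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        scores.extend(score_fn(batch))
    return scores

# Demo with a dummy scorer (text length as the "reward"):
dummy = score_in_batches(["a", "bb", "ccc"],
                         lambda batch: [float(len(t)) for t in batch],
                         batch_size=2)
# dummy == [1.0, 2.0, 3.0]
```

Increasing `batch_size` until the GPU is saturated (and truncating inputs to the model's maximum length) typically cuts multi-hour runs down substantially.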