OpenBMB / UltraFeedback

A large-scale, fine-grained, diverse preference dataset (and models).
MIT License

Questions about training code for UltraRM/UltraCM #4

Open halfrot opened 10 months ago

halfrot commented 10 months ago

Great Work! And thanks for the contribution. May I ask you if you have plans to release the training code for UltraRM/UltraCM?

Rosenberg37 commented 10 months ago

+1. Any plans for this?

lifan-yuan commented 9 months ago

Thanks for your interest!

For reward modeling, we use the code in this repo: https://github.com/Dahoas/reward-modeling

For critique modeling, we use the code in our sister repo: https://github.com/thunlp/UltraChat

I also recommend HuggingFace TRL for easy implementation: https://huggingface.co/docs/trl/index
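For anyone exploring those codebases, the core training objective in pairwise reward modeling (the Bradley-Terry style loss used by TRL's `RewardTrainer`) is -log(sigmoid(r_chosen - r_rejected)), where the two rewards are the model's scalar scores for the preferred and rejected responses. A minimal dependency-free sketch of that loss, with illustrative reward values:

```python
import math

def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise loss: -log(sigmoid(r_chosen - r_rejected)).

    Computed as log(1 + exp(-margin)) via log1p for numerical stability
    when the margin is non-negative.
    """
    margin = r_chosen - r_rejected
    return math.log1p(math.exp(-margin))

# The loss shrinks as the reward gap favors the chosen response,
# and grows when the rejected response is scored higher.
print(pairwise_reward_loss(2.0, 0.0))   # small loss: chosen is preferred
print(pairwise_reward_loss(0.0, 2.0))   # large loss: ranking is inverted
print(pairwise_reward_loss(0.0, 0.0))   # log(2): model is indifferent
```

In the actual trainers the rewards come from a scalar head on the language model and the loss is averaged over a batch of preference pairs; this sketch only shows the per-pair objective.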

halfrot commented 9 months ago

Thank you! Do you have plans to open-source out-of-the-box training code? I'm interested in continuing to fine-tune UltraRM on a domain-specific dataset, so the detailed training code would help with that.