[Question]: Will training methods for PPO and reward models be developed in the future?

PaddlePaddle / PaddleNLP

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.

https://paddlenlp.readthedocs.io

Apache License 2.0


[Question]: Will training methods for PPO and reward models be developed in the future? #6337

Open liuzhipengchd opened 12 months ago

liuzhipengchd commented 12 months ago

Please ask your question

Will training methods for PPO and reward models be developed in the future?

w5688414 commented 1 month ago

This is now supported. Feel free to try it out.

https://github.com/PaddlePaddle/PaddleNLP/tree/develop/examples/RLHF