Direct Preference Optimization

CarperAI / trlx

A repo for distributed training of language models with Reinforcement Learning via Human Feedback (RLHF)

MIT License

4.5k stars 473 forks

Open Reichenbachian opened 1 year ago

Reichenbachian commented 1 year ago

Hey all! Appreciate the work.

Is there any word on whether DPO (Direct Preference Optimization) will be integrated into the trlx library soon?

Forbu commented 1 year ago

CSerxy commented 1 year ago

I wonder if there are any updates on implementing DPO features in trlx. Many thanks!

maxreciprocate commented 1 year ago

There haven't been any updates on that. AFAIK nobody is currently working on it, so feel free to pick it up if you want!

sandeepchittilla commented 1 year ago

Hi, is this something that is still open to work on? I would like to pick it up if that is okay :)

@CSerxy I've just forked the repo and begun work on this feature. Let me know if this conflicts with your work.