EleutherAI / gpt-neox

An implementation of model parallel autoregressive transformers on GPUs, based on the Megatron and DeepSpeed libraries
https://www.eleuther.ai/
Apache License 2.0

Add DPO training #1242

Closed dmahan93 closed 2 months ago

dmahan93 commented 5 months ago

Still a bit of a WIP (docs, adding precomputation of reference logprobs), but I figured it'd be good to get it up here now, since it's a fairly big change and may need some discussion.
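
For context, DPO (Direct Preference Optimization, Rafailov et al., 2023) trains the policy to prefer chosen over rejected completions relative to a frozen reference model. Below is a minimal PyTorch sketch of the objective, not this PR's actual code: the function name and signature are illustrative, and it assumes the per-sequence summed log-probabilities have already been computed (precomputing the reference ones is the WIP item mentioned above).

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Illustrative DPO loss over a batch of preference pairs.

    Each argument is a 1-D tensor of summed log-probabilities of the
    chosen/rejected completion under the policy or the frozen reference.
    """
    # Implicit reward: scaled log-ratio of policy vs. reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

One reason precomputing the reference logprobs is attractive: since the reference model is frozen, its logprobs can be generated in a single preprocessing pass over the dataset, so the reference model never has to be resident in GPU memory during training.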

dmahan93 commented 5 months ago

Also builds off https://github.com/EleutherAI/gpt-neox/pull/1240 and https://github.com/EleutherAI/gpt-neox/pull/1239, since the packing implementation and chat-templating work there are much more useful with DPO.