axolotl-ai-cloud / axolotl

Go ahead and axolotl questions
https://axolotl-ai-cloud.github.io/axolotl/
Apache License 2.0
7.48k stars 808 forks source link

Add a `chat_template` prompt strategy for DPO #1725

Closed fozziethebeat closed 1 month ago

fozziethebeat commented 2 months ago

Description

Replicates the chat_template support from SFT datasets but for DPO training. Users can now specify a dataset with a list of conversation messages along with rejected and chosen columns having a single conversation message. Further, all fields can be customized.

Motivation and Context

This change provides a more configurable set of datasets for DPO training. Fixes #1708

How has this been tested?

Screenshots (if appropriate)

Types of changes

Social Handles (Optional)

@fozziethebeat

winglian commented 2 months ago

@fozziethebeat but for DPO training, since trl handles the tokenization, do we need this piece?

fozziethebeat commented 2 months ago

@fozziethebeat but for DPO training, since trl handles the tokenization, do we need this piece?

Was this in reference to the change in the debugging output? If so, it's not required but I think anyone manually inspecting tokenization output (like i did) would be very surprised to see the bos token duplicated in numerous scenarios. So it's more to give confidence that we constructed the strings correctly.

fozziethebeat commented 1 month ago

Any other changes to add before updating the branch and approving for merging?