fozziethebeat commented 2 months ago

Description

Replicates the chat_template support from SFT datasets but for DPO training. Users can now specify a dataset with a list of conversation messages along with rejected and chosen columns having a single conversation message. Further, all fields can be customized.

Motivation and Context

This change provides a more configurable set of datasets for DPO training. Fixes #1708

How has this been tested?

Unittest added for the new strategy
Manual preprocessing run over a sample dataset
Full training completed on a real dataset

Screenshots (if appropriate)

Types of changes

[x] Code changes to prompt strategies
[x] Unittests

Social Handles (Optional)

@fozziethebeat

winglian commented 2 months ago

@fozziethebeat but for DPO training, since trl handles the tokenization, do we need this piece?

fozziethebeat commented 2 months ago

@fozziethebeat but for DPO training, since trl handles the tokenization, do we need this piece?

Was this in reference to the change in the debugging output? If so, it's not required but I think anyone manually inspecting tokenization output (like i did) would be very surprised to see the bos token duplicated in numerous scenarios. So it's more to give confidence that we constructed the strings correctly.

fozziethebeat commented 1 month ago

Any other changes to add before updating the branch and approving for merging?

axolotl-ai-cloud / axolotl

Add a `chat_template` prompt strategy for DPO #1725