Closed fozziethebeat closed 1 month ago
@fozziethebeat but for DPO training, since trl handles the tokenization, do we need this piece?
@fozziethebeat but for DPO training, since trl handles the tokenization, do we need this piece?
Was this in reference to the change in the debugging output? If so, it's not required but I think anyone manually inspecting tokenization output (like i did) would be very surprised to see the bos token duplicated in numerous scenarios. So it's more to give confidence that we constructed the strings correctly.
Any other changes to add before updating the branch and approving for merging?
Description
Replicates the
chat_template
support from SFT datasets but for DPO training. Users can now specify a dataset with a list of conversation messages along with rejected and chosen columns having a single conversation message. Further, all fields can be customized.Motivation and Context
This change provides a more configurable set of datasets for DPO training. Fixes #1708
How has this been tested?
Screenshots (if appropriate)
Types of changes
Social Handles (Optional)
@fozziethebeat