Open arendu opened 1 week ago
This PR makes the DPO dataset use the chat format tokens from the model's config YAML instead of hardcoding chat/special tokens in the JSONL data file.
Currently, each datapoint in a DPO JSONL data file looks like this:
{ "prompt": "<extra_id_0>System\n\n<extra_id_1>User\nbacillus subtilus\n<extra_id_1>Assistant\n", "chosen_response": "Bacillus ... and industry alike.\n<extra_id_1>", "rejected_response": "The Bacillus ... fields of study.\n<extra_id_1>", "rejected_reward": 3, "chosen_reward": 4 }
With this PR, each datapoint should instead look like this (OpenAI list-of-messages format, with no chat/formatting tokens):
{ "prompt": [ { "role": "system", "content": "" }, { "role": "user", "content": "bacillus subtilus" } ], "chosen_response": { "role": "assistant", "content": "Bacillus ... and industry alike." }, "rejected_response": { "role": "assistant", "content": "The Bacillus ... fields of study." }, "chosen_reward": 4, "rejected_reward": 3 }
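To illustrate the intent, here is a minimal sketch of how a loader could re-apply chat tokens to a new-format datapoint at load time. This is not the code in this PR: `render_prompt`, `render_response`, and the parameter names `system_turn_start` and `turn_start` are hypothetical, and the default token values are simply the ones visible in the old-format example above. In practice these values would come from the model's config YAML.

```python
# Hypothetical sketch: rebuild the old-style strings from the new
# list-of-messages format, with chat tokens passed in rather than
# stored in the data file.

def render_prompt(messages, system_turn_start="<extra_id_0>", turn_start="<extra_id_1>"):
    """Render a list of {"role", "content"} messages into a prompt string."""
    parts = []
    for message in messages:
        # Assumption: the system turn uses its own start token; all other turns share one.
        start = system_turn_start if message["role"] == "system" else turn_start
        parts.append(f"{start}{message['role'].capitalize()}\n{message['content']}\n")
    # Open the assistant turn so the response can be appended directly.
    parts.append(f"{turn_start}Assistant\n")
    return "".join(parts)


def render_response(message, turn_start="<extra_id_1>"):
    """Render an assistant message, terminated by the turn token."""
    return f"{message['content']}\n{turn_start}"


prompt = [
    {"role": "system", "content": ""},
    {"role": "user", "content": "bacillus subtilus"},
]
print(repr(render_prompt(prompt)))
```

With the default tokens, the rendered prompt matches the `prompt` string of the old-format example, so switching a model to different chat tokens would only require changing the config values, not the data.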
Additionally, a script is added to convert old data files into the new format:
python nemo_aligner/data/nlp/scripts/undo_special_tokens.py <path_to_old_format_dpo_jsonl_file>
A new file will be written in the same location as the old format file.
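For readers who want to see what such a conversion involves, the following is a rough sketch of the per-record transformation. It is not the contents of `undo_special_tokens.py`: the function `convert_record`, the regular expression, and the assumption that the only tokens present are `<extra_id_0>` and `<extra_id_1>` are all illustrative, inferred from the example datapoints above.

```python
import re

# Assumption: turns look like "<extra_id_N>Role\ncontent\n", as in the example above.
TURN_START = "<extra_id_1>"
TURN_RE = re.compile(
    r"<extra_id_[01]>(System|User|Assistant)\n(.*?)(?=<extra_id_[01]>|\Z)",
    re.DOTALL,
)


def _strip_suffix(text, suffix):
    """Remove suffix from text if present (works on Python < 3.9 too)."""
    return text[: -len(suffix)] if suffix and text.endswith(suffix) else text


def convert_record(old):
    """Convert one old-format DPO record (dict) into the new messages format."""
    turns = [
        {"role": role.lower(), "content": _strip_suffix(content, "\n")}
        for role, content in TURN_RE.findall(old["prompt"])
    ]
    # The old prompt ends with an empty "Assistant" header; it carries no content.
    if turns and turns[-1] == {"role": "assistant", "content": ""}:
        turns.pop()

    def to_message(text):
        # Old responses end with a newline followed by the turn token.
        return {"role": "assistant", "content": _strip_suffix(text, "\n" + TURN_START)}

    return {
        "prompt": turns,
        "chosen_response": to_message(old["chosen_response"]),
        "rejected_response": to_message(old["rejected_response"]),
        "chosen_reward": old["chosen_reward"],
        "rejected_reward": old["rejected_reward"],
    }
```

A full converter would apply this to each line of the JSONL file with `json.loads` and `json.dumps`; the output file naming used by the real script is not shown here.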
closing in favor of #403