NVIDIA / NeMo-Aligner

Scalable toolkit for efficient model alignment

feat: support new DPO data format #405

Open arendu opened 1 week ago

arendu commented 1 week ago

What does this PR do ?

This PR makes the DPO dataset use chat-format tokens from the model's config YAML instead of hardcoding chat/special tokens in the JSONL data file.

Currently, each datapoint inside a DPO JSONL data file looks like this:

{
  "prompt": "<extra_id_0>System\n\n<extra_id_1>User\nbacillus subtilus\n<extra_id_1>Assistant\n",
  "chosen_response": "Bacillus ... and industry alike.\n<extra_id_1>",
  "rejected_response": "The Bacillus ... fields of study.\n<extra_id_1>",
  "rejected_reward": 3,
  "chosen_reward": 4
}

With this PR it should look like this (OpenAI list-of-messages format, with no chat/formatting tokens):

{
  "prompt": [
    {
      "role": "system",
      "content": ""
    },
    {
      "role": "user",
      "content": "bacillus subtilus"
    }
  ],
  "chosen_response": {
    "role": "assistant",
    "content": "Bacillus ... and industry alike."
  },
  "rejected_response": {
    "role": "assistant",
    "content": "The Bacillus ... fields of study."
  },
  "chosen_reward": 4,
  "rejected_reward": 3
}
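For reference, here is a minimal sketch of how such messages could be rendered back into the old-style prompt string at load time. The token strings are copied from the example above; in this PR they come from the model's config YAML, and the helper below is purely illustrative, not code added by the PR:

# Illustrative only: token strings are taken from the example above; in this PR
# they would be read from the model's config YAML rather than hardcoded here.
ROLE_HEADERS = {
    "system": "<extra_id_0>System",
    "user": "<extra_id_1>User",
    "assistant": "<extra_id_1>Assistant",
}

def render_prompt(messages: list[dict]) -> str:
    """Join role-tagged messages and leave the prompt open for the assistant turn."""
    out = ""
    for msg in messages:
        out += f"{ROLE_HEADERS[msg['role']]}\n{msg['content']}\n"
    return out + f"{ROLE_HEADERS['assistant']}\n"

prompt = render_prompt([
    {"role": "system", "content": ""},
    {"role": "user", "content": "bacillus subtilus"},
])
# -> "<extra_id_0>System\n\n<extra_id_1>User\nbacillus subtilus\n<extra_id_1>Assistant\n"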

Additionally, a script is included to convert old-format data files into the new format:

python nemo_aligner/data/nlp/scripts/undo_special_tokens.py <path_to_old_format_dpo_jsonl_file>

A new file in the updated format will be written to the same location as the old-format file.
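Roughly, the conversion strips the role-header and end-of-turn tokens and rebuilds each record as a list of messages. Below is a minimal sketch of that idea, assuming the token patterns from the example above; it is an illustration, not the actual undo_special_tokens.py script:

"""Illustrative sketch of converting old-format DPO records to the new
OpenAI-style messages format (not the actual undo_special_tokens.py)."""
import json
import re
import sys

# Turn markers used by the old hardcoded format (taken from the example record).
TURN_RE = re.compile(r"<extra_id_\d+>(System|User|Assistant)\n")

def prompt_to_messages(prompt: str) -> list[dict]:
    """Split an old-format prompt string into role/content messages."""
    parts = TURN_RE.split(prompt)
    # parts looks like ["", "System", "\n", "User", "bacillus subtilus\n", "Assistant", ""]
    messages = [
        {"role": role.lower(), "content": content.strip()}
        for role, content in zip(parts[1::2], parts[2::2])
    ]
    # The trailing empty assistant turn only marks where the response starts.
    if messages and messages[-1]["role"] == "assistant" and not messages[-1]["content"]:
        messages.pop()
    return messages

def response_to_message(response: str) -> dict:
    """Drop the trailing end-of-turn token from a response string."""
    content = re.sub(r"\n<extra_id_\d+>$", "", response)
    return {"role": "assistant", "content": content.strip()}

def convert(record: dict) -> dict:
    return {
        "prompt": prompt_to_messages(record["prompt"]),
        "chosen_response": response_to_message(record["chosen_response"]),
        "rejected_response": response_to_message(record["rejected_response"]),
        "chosen_reward": record["chosen_reward"],
        "rejected_reward": record["rejected_reward"],
    }

if __name__ == "__main__":
    in_path = sys.argv[1]
    out_path = in_path + ".messages.jsonl"  # output name is illustrative
    with open(in_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            fout.write(json.dumps(convert(json.loads(line))) + "\n")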

Changelog

Usage

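As a hedged illustration (the file names and the validation below are assumptions, not part of this PR), after running the conversion script you can sanity-check the converted records before pointing DPO training at them:

# Illustrative usage sketch; file names and checks are assumptions, not part of the PR.
import json

converted_path = "/data/dpo_train_converted.jsonl"  # hypothetical converter output
with open(converted_path) as f:
    for line in f:
        rec = json.loads(line)
        # Prompt is now an OpenAI-style list of role/content messages.
        assert all(m["role"] in {"system", "user", "assistant"} for m in rec["prompt"])
        for key in ("chosen_response", "rejected_response"):
            assert rec[key]["role"] == "assistant"
            # No hardcoded chat/special tokens should remain in the content.
            assert "<extra_id_" not in rec[key]["content"]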

Before your PR is "Ready for review"

Pre checks:

Checklist when contributing a new algorithm

Additional Information

terrykong commented 6 days ago

closing in favor of #403