axolotl-ai-cloud / axolotl

Go ahead and axolotl questions
https://axolotl-ai-cloud.github.io/axolotl/
Apache License 2.0

DPO Trainer Incorrectly Inserts BoS Before Chosen and Rejected Prompts for Llama3 #1616

Open Catgat opened 5 months ago

Catgat commented 5 months ago

Please check that this issue hasn't been reported before.

Expected Behavior

The BoS should only appear at the start of the prompt.

Current behaviour

The BoS token is inserted at the start of the prompt and also at the start of the Chosen and Rejected prompts.

[2024-05-13 19:18:27,809] [INFO] [axolotl.check_rl_example_labels:91] [PID:718] [RANK:0] INPUT PROMPT: <|begin_of_text|>(128000)

[2024-05-13 19:18:27,809] [INFO] [axolotl.check_rl_example_labels:92] [PID:718] [RANK:0] CHOSEN RESPONSE: <|begin_of_text|>(128000)

[2024-05-13 19:18:27,809] [INFO] [axolotl.check_rl_example_labels:93] [PID:718] [RANK:0] REJECTED RESPONSE: <|begin_of_text|>(128000)
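The logs show token id 128000 (`<|begin_of_text|>`, the Llama 3 BOS) at the start of all three segments. As a minimal sketch of the invariant being violated (the helper name is hypothetical, not axolotl code), a correctly tokenized DPO example would pass a check like this:

```python
# Hypothetical helper: for a correctly tokenized DPO example, BOS should
# appear only at the start of the prompt, never on chosen/rejected.
LLAMA3_BOS_ID = 128000  # <|begin_of_text|> in the Llama 3 tokenizer

def bos_placement_ok(prompt_ids, chosen_ids, rejected_ids, bos_id=LLAMA3_BOS_ID):
    """Return True if BOS appears only at the start of the prompt."""
    return (
        prompt_ids[:1] == [bos_id]
        and chosen_ids[:1] != [bos_id]
        and rejected_ids[:1] != [bos_id]
    )

# The failing case from the logs above: BOS on all three segments.
print(bos_placement_ok([128000, 882], [128000, 5618], [128000, 63179]))  # False
```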

Steps to reproduce

Run a DPO tune using the `chatml.intel` type. Preprocess the dataset with the `--debug` flag and you'll see that the BoS token is emitted at the start of the chosen and rejected sequences as well.

Config yaml

rl: dpo
datasets:
  - ds_type: json
    data_files: 
      - combinedDPO.json
    split: train
    type: chatml.intel

Possible solution

No response
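No fix was proposed in the issue. As an illustrative sketch only (not axolotl's API, all names hypothetical), the kind of post-processing that would restore the expected layout is dropping the leading BOS from the response segments before they are appended to the prompt:

```python
def strip_duplicate_bos(token_ids, bos_id=128000):
    """Drop a single leading BOS from a response segment so that only the
    prompt carries <|begin_of_text|>. Purely illustrative; not axolotl code."""
    if token_ids and token_ids[0] == bos_id:
        return token_ids[1:]
    return token_ids

print(strip_duplicate_bos([128000, 882, 198]))  # [882, 198]
print(strip_duplicate_bos([882, 198]))          # unchanged: [882, 198]
```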

Which Operating Systems are you using?

Python Version

Whatever version the latest docker uses.

axolotl branch-commit

The latest commit that the docker is using.

Acknowledgements

kubernetes-bad commented 5 months ago

I can confirm it actually sends it to the trainer too. I opened the tokenized cache from the preprocessed dataset folder:

from datasets import Dataset
ds = Dataset.from_file("./cache-4c137b002286c55e.arrow")
sample = ds.take(1)
print(sample["chosen_input_ids"])

# [[128000, 128254, 882, 198, 5618, 63179, ...
#   ^ this is <|begin_of_text|>
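The same check can be run over the whole cache rather than a single row. A small sketch (the column name follows the cache shown above; the function works on any iterable of row dicts, which is what iterating a `datasets.Dataset` yields):

```python
# Count how many rows have a BOS-prefixed chosen sequence. Iterating a
# datasets.Dataset yields one dict per row, so this also works directly
# on the loaded cache: count_bos_prefixed(ds).
def count_bos_prefixed(rows, column="chosen_input_ids", bos_id=128000):
    return sum(1 for row in rows if row[column][:1] == [bos_id])

rows = [
    {"chosen_input_ids": [128000, 128254, 882]},  # buggy row
    {"chosen_input_ids": [882, 198]},             # clean row
]
print(count_bos_prefixed(rows))  # 1
```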
xzuyn commented 4 months ago

Still an issue. I'm also seeing the input having a double BOS, and the chosen/rejected lacking an EOS. This is with ORPO though, not DPO. (screenshot attached, 2024-06-29)

rl: orpo
orpo_alpha: 0.1
remove_unused_columns: false
chat_template: llama3
datasets:
  - path: argilla/ultrafeedback-binarized-preferences-cleaned
    type: chat_template.argilla
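The ORPO symptoms reported above (doubled BOS on the input, missing EOS on the responses) can be expressed as a small validator. A sketch, assuming Llama 3 token ids (128000 is `<|begin_of_text|>`; 128009 is `<|eot_id|>`, which the llama3 chat template uses to end turns):

```python
# Illustrative validator for the ORPO symptoms described above; the
# function name and return format are hypothetical, not axolotl API.
def check_orpo_example(input_ids, chosen_ids, rejected_ids,
                       bos_id=128000, eos_id=128009):
    """Return a list of problems matching the reported symptoms."""
    problems = []
    if input_ids[:2] == [bos_id, bos_id]:
        problems.append("double BOS on input")
    if chosen_ids[-1:] != [eos_id]:
        problems.append("chosen missing EOS")
    if rejected_ids[-1:] != [eos_id]:
        problems.append("rejected missing EOS")
    return problems

# A well-formed example produces no problems:
print(check_orpo_example([128000, 882], [5618, 128009], [63179, 128009]))  # []
```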
Catgat commented 4 months ago

Still broken! :)

maziyarpanahi commented 2 months ago

Still broken! :)

There is a PR for this; have you tested it to see whether it works?