Open Catgat opened 5 months ago
I can confirm that the extra BOS token actually reaches the trainer too. I opened the tokenized cache from the preprocessed dataset folder:
from datasets import Dataset
ds = Dataset.from_file("./cache-4c137b002286c55e.arrow")
sample = ds.take(1)
print(sample["chosen_input_ids"])
# [[128000, 128254, 882, 198, 5618, 63179, ...
# ^ this is <|begin_of_text|>
Still an issue. I'm also seeing the input having double BOS, and the chosen/rejected lacking an EOS. This is with ORPO though, not DPO.
rl: orpo
orpo_alpha: 0.1
remove_unused_columns: false
chat_template: llama3
datasets:
  - path: argilla/ultrafeedback-binarized-preferences-cleaned
    type: chat_template.argilla
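For anyone who wants to check their own preprocessed data for the two symptoms described above (a repeated BOS at the start and a missing EOS at the end), here is a minimal sketch in plain Python. The BOS id 128000 comes from the debug log in this issue; the EOS ids are an assumption based on the stock Llama 3 tokenizer and may differ for your tokenizer. The helper names are hypothetical and not part of axolotl.

```python
from typing import Dict, Sequence, Union

BOS_ID = 128000  # <|begin_of_text|>, as printed in the debug log in this issue
# Assumption: <|end_of_text|> and <|eot_id|> in the stock Llama 3 tokenizer.
EOS_IDS = (128001, 128009)


def leading_bos_count(ids: Sequence[int], bos_id: int = BOS_ID) -> int:
    """Count how many BOS tokens appear back to back at the start of ids."""
    count = 0
    for token_id in ids:
        if token_id != bos_id:
            break
        count += 1
    return count


def ends_with_eos(ids: Sequence[int], eos_ids: Sequence[int] = EOS_IDS) -> bool:
    """Return True if the last token is one of the accepted EOS ids."""
    return len(ids) > 0 and ids[-1] in eos_ids


def diagnose(ids: Sequence[int]) -> Dict[str, Union[int, bool]]:
    """Summarize both checks for one tokenized sequence."""
    return {
        "leading_bos": leading_bos_count(ids),
        "ends_with_eos": ends_with_eos(ids),
    }


if __name__ == "__main__":
    # Made-up sequence for illustration: two BOS tokens and no EOS.
    print(diagnose([128000, 128000, 882, 198]))
```

A `leading_bos` value above 1, or `ends_with_eos` being False on a chosen/rejected sequence, would match what is reported here.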
Still broken! :)
There is a PR. Have you tested it to see whether it works?
Expected Behavior
The BOS token should appear only once, at the start of the prompt.
Current behaviour
The BOS token is inserted at the start of the prompt and also at the start of the chosen and rejected responses.
[2024-05-13 19:18:27,809] [INFO] [axolotl.check_rl_example_labels:91] [PID:718] [RANK:0] INPUT PROMPT: <|begin_of_text|>(128000)
[2024-05-13 19:18:27,809] [INFO] [axolotl.check_rl_example_labels:92] [PID:718] [RANK:0] CHOSEN RESPONSE: <|begin_of_text|>(128000)
[2024-05-13 19:18:27,809] [INFO] [axolotl.check_rl_example_labels:93] [PID:718] [RANK:0] REJECTED RESPONSE: <|begin_of_text|>(128000)
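To illustrate the expected behaviour, here is a minimal sketch of how the prompt and a response could be joined so that BOS appears exactly once. This is only an illustration of the idea, not the fix from the PR and not axolotl's actual code. The function names are hypothetical, and the token ids other than 128000 are placeholders.

```python
from typing import List

BOS_ID = 128000  # <|begin_of_text|>, as printed in the debug log above


def drop_leading_bos(ids: List[int], bos_id: int = BOS_ID) -> List[int]:
    """Return ids with every BOS token at the very start removed."""
    start = 0
    while start < len(ids) and ids[start] == bos_id:
        start += 1
    return ids[start:]


def join_prompt_and_response(
    prompt_ids: List[int], response_ids: List[int], bos_id: int = BOS_ID
) -> List[int]:
    """Concatenate prompt and response so that BOS appears exactly once."""
    return (
        [bos_id]
        + drop_leading_bos(prompt_ids, bos_id)
        + drop_leading_bos(response_ids, bos_id)
    )


if __name__ == "__main__":
    prompt = [128000, 882, 198]  # BOS + placeholder prompt tokens
    chosen = [128000, 5618]      # stray BOS + placeholder response token
    print(join_prompt_and_response(prompt, chosen))
```

Stripping only at the start of each sequence leaves any BOS id that appears later in the text untouched, which keeps the sketch from silently altering the content.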
Steps to reproduce
Run a DPO tune using intel.chatml. Preprocess the dataset with the --debug flag and you'll see the BOS token printed at the start of the prompt, the chosen response, and the rejected response.
Config yaml
Possible solution
No response
Python Version
Whatever version the latest docker uses.
axolotl branch-commit
The latest commit that the docker is using.