Closed hammoudhasan closed 1 month ago
@hammoudhasan did you try adding?
tokens:
- "<|im_start|>"
- "<|im_end|>"
I don't think I had those in my config. Let me test and get back to you on that (: Thank you @winglian
As you mentioned, those tokens were missing on my end! Adding them worked.
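For anyone hitting the same thing, here is a rough pre-flight check, written as a minimal sketch rather than anything axolotl ships. `missing_template_tokens` and `CHATML_MARKERS` are hypothetical names, and the config is shown as a plain dict standing in for the parsed YAML. It only checks what the config declares under `tokens` and `special_tokens`; it does not inspect the tokenizer itself.

```python
# Hypothetical helper, not part of axolotl: lists chat-template markers that
# the config does not declare under `tokens` or `special_tokens`.
def missing_template_tokens(config: dict, markers: list[str]) -> list[str]:
    declared = set(config.get("tokens") or [])
    # `special_tokens` is assumed to be a mapping such as {"eos_token": "<|im_end|>"}.
    declared.update((config.get("special_tokens") or {}).values())
    return [marker for marker in markers if marker not in declared]


# Markers taken from the suggestion in this thread.
CHATML_MARKERS = ["<|im_start|>", "<|im_end|>"]

# Config as it was before the fix: no `tokens` entry.
config_without = {"chat_template": "chatml"}

# Config after adding the suggested `tokens` entry.
config_with = {
    "chat_template": "chatml",
    "tokens": ["<|im_start|>", "<|im_end|>"],
}

print(missing_template_tokens(config_without, CHATML_MARKERS))
print(missing_template_tokens(config_with, CHATML_MARKERS))
```

If the first call reports anything, the markers still need to be added to the config before preprocessing.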
Please check that this issue hasn't been reported before.
Expected Behavior
When special tokens or added tokens are defined in the config, they should be added to the tokenizer before the preprocessing tokenization step runs.
Current behaviour
Currently, data is tokenized without the specified new special tokens (i.e., they are replaced by spaces where, according to the defined chat template, they should appear).
Steps to reproduce
Config yaml
No response
Possible solution
No response
Which Operating Systems are you using?
Python Version
3.10
axolotl branch-commit
main
Acknowledgements