Lightning-AI / litgpt

Pretrain, finetune, deploy 20+ LLMs on your own data. Uses state-of-the-art techniques: flash attention, FSDP, 4-bit, LoRA, and more.
https://lightning.ai
Apache License 2.0

Tokenizer: `add_prefix_space` shouldn't affect `self.use_bos` #1342

Closed: carmocca closed this 3 weeks ago

carmocca commented 3 weeks ago

Copy of https://github.com/Lightning-AI/litgpt/pull/1328 so that CI can run with the HF token.

cc @Andrei-Aksionov
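The title states the intended rule: `use_bos` should depend only on whether the tokenizer is configured to add a BOS token, not on `add_prefix_space` (which governs whitespace handling). Below is a minimal sketch of that rule. It assumes the settings come from a Hugging Face-style `tokenizer_config.json`. The helpers `decide_use_bos` and `load_use_bos` are hypothetical names for illustration, not litgpt's actual `Tokenizer` code.

```python
import json
from pathlib import Path


def decide_use_bos(config: dict) -> bool:
    """Hypothetical helper: decide whether to prepend a BOS token.

    Only `add_bos_token` is consulted. `add_prefix_space` affects how
    leading whitespace is tokenized, so it is deliberately ignored here.
    """
    return bool(config.get("add_bos_token", False))


def load_use_bos(checkpoint_dir: Path) -> bool:
    # Assumption: tokenizer settings live in an HF-style tokenizer_config.json.
    path = checkpoint_dir / "tokenizer_config.json"
    if not path.is_file():
        # No config file: fall back to not adding BOS in this sketch.
        return False
    with open(path, encoding="utf-8") as fp:
        return decide_use_bos(json.load(fp))


# add_prefix_space=True on its own must not switch BOS on.
print(decide_use_bos({"add_prefix_space": True}))  # False
print(decide_use_bos({"add_bos_token": True, "add_prefix_space": False}))  # True
```

Keeping the decision in a single function of the parsed config makes the rule easy to unit-test without downloading a checkpoint.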

Andrei-Aksionov commented 3 weeks ago

I guess the good news is that `test_tokenizer` didn't fail 😆