artidoro / qlora

QLoRA: Efficient Finetuning of Quantized LLMs
https://arxiv.org/abs/2305.14314
MIT License

RecursionError: maximum recursion depth exceeded #30

Open atillabasaran opened 1 year ago

atillabasaran commented 1 year ago

I am getting a maximum recursion depth error after running the following command: python qlora.py --model_name_or_path decapoda-research/llama-7b-hf

And this is the error I got:

  File "/home/atilla/miniconda3/envs/qlora/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 257, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/atilla/miniconda3/envs/qlora/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1142, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/atilla/miniconda3/envs/qlora/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 250, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/atilla/miniconda3/envs/qlora/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 257, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/atilla/miniconda3/envs/qlora/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1142, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
  File "/home/atilla/miniconda3/envs/qlora/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 250, in convert_tokens_to_ids
    return self._convert_token_to_id_with_added_voc(tokens)
  File "/home/atilla/miniconda3/envs/qlora/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 257, in _convert_token_to_id_with_added_voc
    return self.unk_token_id
  File "/home/atilla/miniconda3/envs/qlora/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1142, in unk_token_id
    return self.convert_tokens_to_ids(self.unk_token)
RecursionError: maximum recursion depth exceeded
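The traceback shows mutual recursion: convert_tokens_to_ids falls back to unk_token_id when a token is missing from the vocabulary, and unk_token_id in turn resolves the unk token via convert_tokens_to_ids. If the unk token itself is not in the vocabulary (as with the stale tokenizer config in this checkpoint), the two calls bounce back and forth forever. A minimal standalone sketch of this failure mode (hypothetical class, not the real transformers code):

```python
class BrokenTokenizer:
    """Hypothetical sketch of the mutual recursion; not the actual transformers implementation."""

    def __init__(self, vocab, unk_token="<unk>"):
        self.vocab = vocab          # token -> id mapping
        self.unk_token = unk_token  # bug scenario: unk_token is missing from vocab

    def convert_tokens_to_ids(self, token):
        if token in self.vocab:
            return self.vocab[token]
        # Unknown token: fall back to the unk id...
        return self.unk_token_id

    @property
    def unk_token_id(self):
        # ...which is itself looked up via convert_tokens_to_ids -> infinite recursion
        return self.convert_tokens_to_ids(self.unk_token)


tok = BrokenTokenizer({"hello": 0})
print(tok.convert_tokens_to_ids("hello"))  # known token resolves normally
try:
    tok.convert_tokens_to_ids("missing")
except RecursionError as exc:
    print("RecursionError:", exc)
```

Any fix that makes the unk token resolvable (a tokenizer config that actually contains it, or forcing unk_token at load time) breaks the cycle.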

atillabasaran commented 1 year ago

Originally I was getting an OverflowError; following this PR resolved that, but now I get this error.

phalexo commented 1 year ago

> Normally I was getting OverflowError but I followed this PR and it resolved but now I got this error.

I see the same thing. I changed LlamaTokenizerFast back to LlamaTokenizer.

Now I have another issue: it dumps core during cleanup.

jwnsu commented 1 year ago

This seems to be caused by the old tokenizer bundled with decapoda-research/llama-7b-hf; see details here: https://github.com/huggingface/transformers/issues/22762. I was able to resolve the issue by switching to huggyllama/llama-7b, which has a newer, correct tokenizer.

timohear commented 1 year ago

Forcing the unk_token fixed this for me (transformers v4.30.1):

    tokenizer = tokenizer_class.from_pretrained(model_name_or_path, unk_token="<unk>")

SuperBruceJia commented 1 week ago

Setting use_fast=False can solve this issue:

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=False)

Best regards,

Shuyue, July 6th, 2024