SCIR-HI / Huatuo-Llama-Med-Chinese

Repo for BenTsao (本草, original name: HuaTuo 华驼), Instruction-tuning Large Language Models with Chinese Medical Knowledge.
Apache License 2.0

RecursionError: maximum recursion depth exceeded #86

Closed nkcsjxd closed 8 months ago

nkcsjxd commented 9 months ago

File "/home/gfr/miniconda3/envs/Huatuo/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1155, in unk_token_id return self.convert_tokens_to_ids(self.unk_token) File "/home/gfr/miniconda3/envs/Huatuo/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 250, in convert_tokens_to_ids return self._convert_token_to_id_with_added_voc(tokens) File "/home/gfr/miniconda3/envs/Huatuo/lib/python3.9/site-packages/transformers/tokenization_utils_fast.py", line 257, in _convert_token_to_id_with_added_voc return self.unk_token_id File "/home/gfr/miniconda3/envs/Huatuo/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1155, in unk_token_id return self.convert_tokens_to_ids(self.unk_token) RecursionError: maximum recursion depth exceeded 运行之后的报错信息,请问是为什么? 之前遇到ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported. 将llama基座中的参数改为LlamaTokenizer解决 又遇到TypeError: Descriptors cannot not be created directly. If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0. If you cannot immediately regenerate your protos, some other possible workarounds are:

  1. Downgrade the protobuf package to 3.20.x or lower.
  2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).

I resolved this by pinning protobuf to 3.20.1.
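
For reference, workaround 2 from the list above can also be applied from Python; the snippet below is only a sketch and assumes the environment variable is set before any protobuf-generated module is imported. The fix actually used in this issue was workaround 1, pinning protobuf to 3.20.1.

    import os

    # Workaround 2 from the error message: force the pure-Python protobuf
    # implementation. This must run before anything imports a *_pb2 module.
    os.environ["PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"] = "python"

    import google.protobuf

    # Workaround 1 (the route taken in this issue) instead expects the
    # installed version to be 3.20.x or lower, e.g. 3.20.1.
    print(google.protobuf.__version__)
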
s65b40 commented 9 months ago

Hello, could you please describe in full the code you ran and provide the complete error output? Thank you.

nkcsjxd commented 9 months ago

The command I ran:

    /home/gfr/jxd/Huatuo-Llama-Med-Chinese/finetune.py --base_model ./model/llama-7b-hf --data_path ./data/llama_data.json --output_dir ./lora-llama-l1 --prompt_template_name med_template --micro_batch_size 128 --batch_size 128 --wandb_run_name l1

I think this is probably the cause: "Hey! The main issue is that they did not update the tokenizer files at "decapoda-research/llama-7b-hf" but they are using the latest version of transformers. The tokenizer was fixed, see https://github.com/huggingface/transformers/pull/22402, and corrected. Nothing we can do on our end..."

I tried changing the base model's tokenizer_config.json to:

    {
      "add_prefix_space": false,
      "bos_token": "<s>",
      "eos_token": "</s>",
      "model_max_length": 1000000000000000019884624838656,
      "pad_token": "<pad>",
      "padding_side": "right",
      "special_tokens_map_file": null,
      "tokenizer_class": "LlamaTokenizer",
      "unk_token": "<unk>"
    }

and after that it runs, though I am not entirely sure why.
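
The recursion in the original traceback is consistent with this: convert_tokens_to_ids falls back to unk_token_id for tokens it cannot resolve, and unk_token_id is itself computed by calling convert_tokens_to_ids(self.unk_token), so if the stale tokenizer files leave the unk token unresolvable, the two calls bounce between each other until Python's recursion limit is hit. A quick way to confirm the edited config works is a minimal sketch like the one below (it assumes the local base-model path from the command above and a transformers version that ships LlamaTokenizer):

    from transformers import LlamaTokenizer

    # Path taken from the finetune command above; adjust to your local copy.
    tok = LlamaTokenizer.from_pretrained("./model/llama-7b-hf")

    # With the corrected tokenizer_config.json these resolve to real ids
    # instead of recursing between unk_token_id and convert_tokens_to_ids.
    print(tok.unk_token, tok.unk_token_id)
    print(tok.bos_token, tok.bos_token_id)
    print(tok.eos_token, tok.eos_token_id)

    # An out-of-vocabulary token should fall back to the unk id cleanly.
    print(tok.convert_tokens_to_ids("definitely-not-a-real-token"))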