Is there an existing issue for this?
Current Behavior
The error is as follows:

Traceback (most recent call last):
  File "/home/inspur/scc/gpt/LLaMA-Factory/src/train_bash.py", line 14, in <module>
    main()
  File "/home/inspur/scc/gpt/LLaMA-Factory/src/train_bash.py", line 5, in main
    run_exp()
  File "/home/inspur/scc/gpt/LLaMA-Factory/src/llmtuner/train/tuner.py", line 26, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/home/inspur/scc/gpt/LLaMA-Factory/src/llmtuner/train/sft/workflow.py", line 29, in run_sft
    model, tokenizer = load_model_and_tokenizer(model_args, finetuning_args, training_args.do_train)
  File "/home/inspur/scc/gpt/LLaMA-Factory/src/llmtuner/model/loader.py", line 49, in load_model_and_tokenizer
    tokenizer = AutoTokenizer.from_pretrained(
  File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 774, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2028, in from_pretrained
    return cls._from_pretrained(
  File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2260, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/inspur/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 69, in __init__
    super().__init__(padding_side=padding_side, **kwargs)
  File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 367, in __init__
    self._add_tokens(
  File "/home/inspur/anaconda3/envs/llama/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
    current_vocab = self.get_vocab().copy()
  File "/home/inspur/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 108, in get_vocab
    vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
  File "/home/inspur/.cache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 104, in vocab_size
    return self.tokenizer.n_words
AttributeError: 'ChatGLMTokenizer' object has no attribute 'tokenizer'. Did you mean: 'tokenize'?
The tokenizer only loads after downgrading transformers, but the downgrade then causes other problems.
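The root cause is visible in the last frames of the traceback: newer transformers releases call `_add_tokens()` (and therefore `get_vocab()`) inside the base class `__init__`, before the ChatGLM subclass has assigned `self.tokenizer`. A minimal sketch of that initialization-order bug, using illustrative class names rather than the real transformers classes:

```python
class Base:
    """Stands in for the transformers base tokenizer."""

    def __init__(self):
        # Newer base classes touch the vocab during __init__ ...
        self.vocab = self.get_vocab()


class BrokenTokenizer(Base):
    """Mirrors the failing ChatGLM2 tokenizer layout."""

    def __init__(self):
        super().__init__()      # get_vocab() runs here ...
        self.inner = {"a": 0}   # ... but the dependency is set afterwards

    def get_vocab(self):
        return dict(self.inner)  # AttributeError: 'inner' does not exist yet


class FixedTokenizer(Base):
    """Assigns its dependencies before delegating to the base __init__."""

    def __init__(self):
        self.inner = {"a": 0}   # assign first ...
        super().__init__()      # ... so get_vocab() succeeds

    def get_vocab(self):
        return dict(self.inner)
```

This is why an updated `tokenization_chatglm.py` that moves the `self.tokenizer = ...` assignment above the `super().__init__()` call also resolves the error, without downgrading transformers.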
Expected Behavior
No response
Steps To Reproduce
My command:

CUDA_VISIBLE_DEVICES=5 python src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path /path/THUDM/chatglm2-6b \
    --dataset alpaca_gpt4_zh \
    --template chatglm2 \
    --finetuning_type lora \
    --lora_target query_key_value \
    --output_dir /path/chatglm2 \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16
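If pinning transformers is the chosen workaround, it may help to fail fast when the environment drifts. A small guard, assuming the breakage begins around transformers 4.34 (that cutoff is an assumption; verify against the chatglm2-6b model card, or update `tokenization_chatglm.py` from the model repo instead):

```python
def version_tuple(v: str) -> tuple:
    """Parse the 'major.minor' prefix of a version string like '4.33.2'."""
    return tuple(int(p) for p in v.split(".")[:2])


def chatglm2_tokenizer_compatible(transformers_version: str) -> bool:
    # The bundled tokenization_chatglm.py breaks once the base tokenizer
    # starts calling get_vocab() during __init__; the (4, 34) cutoff below
    # is an assumption, not a confirmed boundary.
    return version_tuple(transformers_version) < (4, 34)
```

Calling `chatglm2_tokenizer_compatible(transformers.__version__)` before `run_exp()` would surface the version mismatch with a clear message instead of the deep `AttributeError` above.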
Environment
Anything else?
No response