Facico / Chinese-Vicuna

Chinese-Vicuna: A Chinese Instruction-following LLaMA-based Model. A low-resource Chinese llama+lora approach, with a structure modeled on alpaca.
https://github.com/Facico/Chinese-Vicuna
Apache License 2.0
4.14k stars, 422 forks

NotImplementedError: Cannot copy out of meta tensor; no data! #83

Closed greatewei closed 1 year ago

greatewei commented 1 year ago

I want to continue training incrementally on new data on top of an existing LoRA. data.json contains roughly 3,000 examples. The command I ran:

python finetune.py \
--data_path /data/chat/Chinese-Vicuna/data/data.json \
--output_path /data/chat/models/llama_lora/llama-7b-yy-lora \
--model_path /data/chat/models/llama_base/llama-7b-hf  \
--eval_steps 200 \
--save_steps 200 \
--test_size 1 \
--resume_from_checkpoint /data/chat/models/llama_lora/Chinese-Vicuna-lora-7b-belle-and-guanaco \
--ignore_data_skip True

Error output:

  File "/data/chat/Chinese-Vicuna/finetune.py", line 235, in <module>
    trainer = transformers.Trainer(
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 498, in __init__
    self._move_model_to_device(model, args.device)
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 740, in _move_model_to_device
    model = model.to(device)
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 5 more times]
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!

Could someone please take a look?

Facico commented 1 year ago

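This looks like a model-loading problem. Can you load the model normally with the code from question 3 in here? Alternatively, you could refer to this issue.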
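For context, "Cannot copy out of meta tensor" means that some parameters were still on PyTorch's meta device (shape only, no data) when the Trainer called model.to(device). The following is only a diagnostic sketch, not the fix referred to above: the helper name find_meta_parameters is hypothetical, and it assumes `model` is any object exposing named_parameters(), as torch.nn.Module does.

```python
def find_meta_parameters(model):
    """Return the names of parameters that still live on the 'meta' device.

    Hypothetical helper for illustration. It relies only on
    named_parameters() and the Tensor.is_meta attribute.
    """
    return [
        name
        for name, param in model.named_parameters()
        if getattr(param, "is_meta", False)
    ]
```

Calling this on the model just before transformers.Trainer is constructed in finetune.py should show whether loading left any weights unmaterialized. A non-empty list points at the loading step rather than at the Trainer.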

greatewei commented 1 year ago

Thank you very much.

That error is resolved.

A different error appeared, though:

Traceback (most recent call last):
  File "/data/chat/Chinese-Vicuna/finetune.py", line 273, in <module>
    trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 1659, in train
    return inner_training_loop(
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2064, in _inner_training_loop
    checkpoints_sorted = self._sorted_checkpoints(use_mtime=False, output_dir=run_dir)
  File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2923, in _sorted_checkpoints
    best_model_index = checkpoints_sorted.index(str(Path(self.state.best_model_checkpoint)))
ValueError: 'lora-Vicuna/checkpoint-17000' is not in list

My argument is --resume_from_checkpoint /data/chat/models/llama_lora/Chinese-Vicuna-lora-7b-belle-and-guanaco, and I don't understand why the path lora-Vicuna/checkpoint-17000 shows up.

Facico commented 1 year ago

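Delete the best_model_checkpoint field from trainer_state.json in the checkpoint you are loading, because the Trainer compares against the checkpoint named in that field. When you go on to train on other data, you probably won't need that model's results anyway.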
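A minimal sketch of that edit, assuming trainer_state.json sits directly inside the directory passed to --resume_from_checkpoint. The helper name clear_best_model_checkpoint and the .bak backup are illustrative choices, not part of finetune.py.

```python
import json
import shutil
from pathlib import Path


def clear_best_model_checkpoint(checkpoint_dir):
    """Drop best_model_checkpoint from trainer_state.json and return its old value."""
    state_path = Path(checkpoint_dir) / "trainer_state.json"
    # Keep a copy so the original state can be restored if needed.
    shutil.copy2(state_path, state_path.with_name(state_path.name + ".bak"))

    state = json.loads(state_path.read_text(encoding="utf-8"))
    # The stale value (here 'lora-Vicuna/checkpoint-17000') was recorded by the
    # original training run and does not exist in the new output directory.
    old_value = state.pop("best_model_checkpoint", None)
    state_path.write_text(
        json.dumps(state, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return old_value


if __name__ == "__main__":
    # Path taken from the --resume_from_checkpoint argument in this thread.
    old = clear_best_model_checkpoint(
        "/data/chat/models/llama_lora/Chinese-Vicuna-lora-7b-belle-and-guanaco"
    )
    print("removed best_model_checkpoint:", old)
```

Printing the returned value also shows where the unexpected lora-Vicuna/checkpoint-17000 path came from: it is stored in the checkpoint's saved trainer state, not derived from the command-line arguments.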

greatewei commented 1 year ago

Solved.

dj1150277 commented 1 year ago

Thank you so much. I went through at least ten posts, and this method is what finally solved it.