Closed greatewei closed 1 year ago
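Delete the best_model_checkpoint field from trainer_state.json in the checkpoint directory you are loading. The Trainer compares against the checkpoint that this field points to, and when you go on to train on other data you probably do not need that model's result.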
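In case it helps, here is a minimal sketch of that edit in Python. The helper name clear_best_model_checkpoint and the .bak backup file are my own assumptions for illustration, not part of Chinese-Vicuna or transformers. It only assumes that trainer_state.json is a plain JSON object sitting in the directory passed to --resume_from_checkpoint.

```python
import json
from pathlib import Path


def clear_best_model_checkpoint(checkpoint_dir):
    """Remove best_model_checkpoint from trainer_state.json.

    Hypothetical helper: returns the removed value, or None if the
    field was not present.
    """
    state_path = Path(checkpoint_dir) / "trainer_state.json"
    original_text = state_path.read_text(encoding="utf-8")
    state = json.loads(original_text)

    # Keep an untouched copy next to the file so the change can be undone.
    backup_path = state_path.with_name(state_path.name + ".bak")
    backup_path.write_text(original_text, encoding="utf-8")

    # Drop the stale pointer to the old run's best checkpoint.
    old_value = state.pop("best_model_checkpoint", None)
    state_path.write_text(json.dumps(state, indent=2), encoding="utf-8")
    return old_value
```

For the setup in this thread, the call would be clear_best_model_checkpoint("/data/chat/models/llama_lora/Chinese-Vicuna-lora-7b-belle-and-guanaco"), run once before restarting finetune.py. Editing the file by hand in a text editor does the same thing.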
Solved.
Thank you very much.
That error is solved.
Now a different error appears:
Traceback (most recent call last):
File "/data/chat/Chinese-Vicuna/finetune.py", line 273, in <module>
trainer.train(resume_from_checkpoint=args.resume_from_checkpoint)
File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 1659, in train
return inner_training_loop(
File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2064, in _inner_training_loop
checkpoints_sorted = self._sorted_checkpoints(use_mtime=False, output_dir=run_dir)
File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 2923, in _sorted_checkpoints
best_model_index = checkpoints_sorted.index(str(Path(self.state.best_model_checkpoint)))
ValueError: 'lora-Vicuna/checkpoint-17000' is not in list
My argument is --resume_from_checkpoint /data/chat/models/llama_lora/Chinese-Vicuna-lora-7b-belle-and-guanaco , but I don't understand why the path lora-Vicuna/checkpoint-17000 shows up.
Thank you so much. I went through at least ten posts, and this is the method that finally solved it.
I want to do incremental training on new data on top of the existing LoRA weights; data.json has about 3,000 samples. Command I ran:
Error:
File "/data/chat/Chinese-Vicuna/finetune.py", line 235, in <module>
trainer = transformers.Trainer(
File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 498, in __init__
self._move_model_to_device(model, args.device)
File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/transformers/trainer.py", line 740, in _move_model_to_device
model = model.to(device)
File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1145, in to
return self._apply(convert)
File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 797, in _apply
module._apply(fn)
[Previous line repeated 5 more times]
File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 820, in _apply
param_applied = fn(param)
File "/root/anaconda3/envs/python3.9/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1143, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
NotImplementedError: Cannot copy out of meta tensor; no data!
Could someone please take a look?