baichuan-inc / Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.
https://huggingface.co/baichuan-inc/baichuan-7B
Apache License 2.0

[Question] Error during LoRA fine-tuning #72

Open wickedvalley opened 1 year ago

wickedvalley commented 1 year ago

Questions

```
Traceback (most recent call last):
  File "/mnt/workspace/LLaMA-Efficient-Tuning/src/train_sft.py", line 97, in <module>
    main()
  File "/mnt/workspace/LLaMA-Efficient-Tuning/src/train_sft.py", line 69, in main
    train_result = trainer.train()
  File "/home/pai/envs/llama_etuning/lib/python3.10/site-packages/transformers/trainer.py", line 1645, in train
    return inner_training_loop(
  File "/home/pai/envs/llama_etuning/lib/python3.10/site-packages/transformers/trainer.py", line 1987, in _inner_training_loop
    self.accelerator.clip_grad_norm_(
  File "/home/pai/envs/llama_etuning/lib/python3.10/site-packages/accelerate/accelerator.py", line 1893, in clip_grad_norm_
    self.unscale_gradients()
  File "/home/pai/envs/llama_etuning/lib/python3.10/site-packages/accelerate/accelerator.py", line 1856, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/home/pai/envs/llama_etuning/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 275, in unscale_
    raise RuntimeError("unscale_() has already been called on this optimizer since the last update().")
RuntimeError: unscale_() has already been called on this optimizer since the last update().
```
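For what it's worth, the message itself comes straight from PyTorch's AMP `GradScaler`: `unscale_()` may only be called once per optimizer between `scaler.update()` calls. A minimal sketch that reproduces the same RuntimeError (my own illustration, assuming a CUDA device is available; this is not the Trainer's code):

```python
import torch

# Illustrative reproduction of the GradScaler error above.
model = torch.nn.Linear(4, 4).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

with torch.cuda.amp.autocast():
    loss = model(torch.randn(2, 4, device="cuda")).sum()

scaler.scale(loss).backward()
scaler.unscale_(opt)  # first call is fine (e.g. before gradient clipping)
scaler.unscale_(opt)  # second call before scaler.update() raises:
                      # RuntimeError: unscale_() has already been called on this
                      # optimizer since the last update().
```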

jiacheo commented 1 year ago

A similar error occurs with train_pt.py:

```
Traceback (most recent call last):
  File "/mnt/workspace/LLaMA-Efficient-Tuning/src/train_pt.py", line 81, in <module>
    main()
  File "/mnt/workspace/LLaMA-Efficient-Tuning/src/train_pt.py", line 53, in main
    train_result = trainer.train()
  File "/root/anaconda3/envs/baichuan-lora/lib/python3.10/site-packages/transformers/trainer.py", line 1645, in train
    return inner_training_loop(
  File "/root/anaconda3/envs/baichuan-lora/lib/python3.10/site-packages/transformers/trainer.py", line 1987, in _inner_training_loop
    self.accelerator.clip_grad_norm_(
  File "/root/anaconda3/envs/baichuan-lora/lib/python3.10/site-packages/accelerate/accelerator.py", line 1893, in clip_grad_norm_
    self.unscale_gradients()
  File "/root/anaconda3/envs/baichuan-lora/lib/python3.10/site-packages/accelerate/accelerator.py", line 1856, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/root/anaconda3/envs/baichuan-lora/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 275, in unscale_
    raise RuntimeError("unscale_() has already been called on this optimizer since the last update().")
RuntimeError: unscale_() has already been called on this optimizer since the last update().
  3%|███▎ | 1/30 [00:07<03:27, 7.17s/it]
```

jiacheo commented 1 year ago

See https://github.com/huggingface/transformers/issues/24245 — it looks like a bug in a specific transformers release. Installing the commit suggested in that thread resolves it:

```
pip install git+https://github.com/huggingface/transformers@de9255de27abfcae4a1f816b904915f0b1e23cd9
```
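For reference, the pattern the fixed transformers build presumably restores is the standard PyTorch AMP recipe: unscale exactly once, clip, step, update. A sketch of that recipe under the same CUDA assumption (illustrative, not the actual transformers/accelerate internals):

```python
import torch

model = torch.nn.Linear(4, 4).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(torch.randn(2, 4, device="cuda")).sum()
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)                               # exactly once per step
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip on unscaled gradients
    scaler.step(optimizer)
    scaler.update()  # resets the "already unscaled" bookkeeping for the next step
```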

wickedvalley commented 1 year ago

Switching to the pinned transformers==4.29.1 fixed it.
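A quick sanity check that the pinned build is actually the one imported in the training environment (illustrative, not part of the repo's scripts):

```python
import torch
import transformers
import accelerate

# Confirm the versions resolved in the active environment, so a stale install
# doesn't mask the pinned transformers==4.29.1 (or the commit-pinned build).
print("transformers:", transformers.__version__)
print("accelerate:", accelerate.__version__)
print("torch:", torch.__version__)
```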