baichuan-inc / Baichuan-7B

A large-scale 7B pretraining language model developed by BaiChuan-Inc.
https://huggingface.co/baichuan-inc/baichuan-7B
Apache License 2.0

[Question] Error during LoRA fine-tuning #72

Open wickedvalley opened 1 year ago

wickedvalley commented 1 year ago

Questions

```
Traceback (most recent call last):
  File "/mnt/workspace/LLaMA-Efficient-Tuning/src/train_sft.py", line 97, in <module>
    main()
  File "/mnt/workspace/LLaMA-Efficient-Tuning/src/train_sft.py", line 69, in main
    train_result = trainer.train()
  File "/home/pai/envs/llama_etuning/lib/python3.10/site-packages/transformers/trainer.py", line 1645, in train
    return inner_training_loop(
  File "/home/pai/envs/llama_etuning/lib/python3.10/site-packages/transformers/trainer.py", line 1987, in _inner_training_loop
    self.accelerator.clip_grad_norm_(
  File "/home/pai/envs/llama_etuning/lib/python3.10/site-packages/accelerate/accelerator.py", line 1893, in clip_grad_norm_
    self.unscale_gradients()
  File "/home/pai/envs/llama_etuning/lib/python3.10/site-packages/accelerate/accelerator.py", line 1856, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/home/pai/envs/llama_etuning/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 275, in unscale_
    raise RuntimeError("unscale_() has already been called on this optimizer since the last update().")
RuntimeError: unscale_() has already been called on this optimizer since the last update().
```
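For what it's worth, the message itself comes straight from PyTorch's AMP `GradScaler`: `unscale_()` may only be called once per optimizer between `scaler.update()` calls. A minimal sketch that reproduces the same RuntimeError (my own illustration, assuming a CUDA device is available; this is not the Trainer's code):

```python
import torch

# Illustrative reproduction of the GradScaler error above.
model = torch.nn.Linear(4, 4).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

with torch.cuda.amp.autocast():
    loss = model(torch.randn(2, 4, device="cuda")).sum()

scaler.scale(loss).backward()
scaler.unscale_(opt)  # first call is fine (e.g. before gradient clipping)
scaler.unscale_(opt)  # second call before scaler.update() raises:
                      # RuntimeError: unscale_() has already been called on this
                      # optimizer since the last update().
```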

jiacheo commented 1 year ago

A similar error occurs with train_pt.py:

```
Traceback (most recent call last):
  File "/mnt/workspace/LLaMA-Efficient-Tuning/src/train_pt.py", line 81, in <module>
    main()
  File "/mnt/workspace/LLaMA-Efficient-Tuning/src/train_pt.py", line 53, in main
    train_result = trainer.train()
  File "/root/anaconda3/envs/baichuan-lora/lib/python3.10/site-packages/transformers/trainer.py", line 1645, in train
    return inner_training_loop(
  File "/root/anaconda3/envs/baichuan-lora/lib/python3.10/site-packages/transformers/trainer.py", line 1987, in _inner_training_loop
    self.accelerator.clip_grad_norm_(
  File "/root/anaconda3/envs/baichuan-lora/lib/python3.10/site-packages/accelerate/accelerator.py", line 1893, in clip_grad_norm_
    self.unscale_gradients()
  File "/root/anaconda3/envs/baichuan-lora/lib/python3.10/site-packages/accelerate/accelerator.py", line 1856, in unscale_gradients
    self.scaler.unscale_(opt)
  File "/root/anaconda3/envs/baichuan-lora/lib/python3.10/site-packages/torch/cuda/amp/grad_scaler.py", line 275, in unscale_
    raise RuntimeError("unscale_() has already been called on this optimizer since the last update().")
RuntimeError: unscale_() has already been called on this optimizer since the last update().
  3%|███▎ | 1/30 [00:07<03:27, 7.17s/it]
```

jiacheo commented 1 year ago

See https://github.com/huggingface/transformers/issues/24245 — it looks like a bug in a specific transformers release. Installing the commit suggested in that thread resolves it:

```
pip install git+https://github.com/huggingface/transformers@de9255de27abfcae4a1f816b904915f0b1e23cd9
```
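For reference, the pattern the fixed transformers build presumably restores is the standard PyTorch AMP recipe: unscale exactly once, clip, step, update. A sketch of that recipe under the same CUDA assumption (illustrative, not the actual transformers/accelerate internals):

```python
import torch

model = torch.nn.Linear(4, 4).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(torch.randn(2, 4, device="cuda")).sum()
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)                               # exactly once per step
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)  # clip on unscaled gradients
    scaler.step(optimizer)
    scaler.update()  # resets the "already unscaled" bookkeeping for the next step
```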

wickedvalley commented 1 year ago

Switching to the pinned transformers==4.29.1 fixed it.
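A quick sanity check that the pinned build is actually the one imported in the training environment (illustrative, not part of the repo's scripts):

```python
import torch
import transformers
import accelerate

# Confirm the versions resolved in the active environment, so a stale install
# doesn't mask the pinned transformers==4.29.1 (or the commit-pinned build).
print("transformers:", transformers.__version__)
print("accelerate:", accelerate.__version__)
print("torch:", torch.__version__)
```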