OpenGVLab / LLaMA-Adapter

[ICLR 2024] Fine-tuning LLaMA to follow Instructions within 1 Hour and 1.2M Parameters

Runtime error when running the fine-tuning script. #41

Open seelenbrecher opened 1 year ago

seelenbrecher commented 1 year ago

Hi,

I tried running alpaca_finetuning_v1/finetuning.sh and encountered a runtime error.

Traceback (most recent call last):
  File "finetuning.py", line 294, in <module>
    main(args)
  File "finetuning.py", line 253, in main
    train_stats = train_one_epoch(
  File "/home/LLaMA-Adapter/alpaca_finetuning_v1/engine_finetuning.py", line 50, in train_one_epoch
    loss /= accum_iter
RuntimeError: Output 0 of _DDPSinkBackward is a view and is being modified inplace. This view was created inside a custom Function (or because an input was returned as-is) and the autograd logic to handle view+inplace would override the custom backward associated with the custom Function, leading to incorrect gradients. This behavior is forbidden. You can fix this by cloning the output of the custom Function.

I tried cloning the loss by adding loss = loss.clone() before loss /= accum_iter, and the script now works. However, I am not sure whether this affects the backward pass (or the training). Also, do you have any suggestions for avoiding this runtime error?
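For context, the workaround boils down to copying the loss tensor before the in-place division, so the division no longer writes into the view returned by DDP's custom autograd Function. A minimal sketch of the change (the scale_loss helper name is mine; only loss and accum_iter appear in the traceback):

import torch

def scale_loss(loss: torch.Tensor, accum_iter: int) -> torch.Tensor:
    # clone() copies the tensor that the custom Function returned as a view,
    # so the in-place division below touches a plain tensor instead.
    # clone() is differentiable with an identity backward, so gradients
    # should be unchanged apart from the cost of one extra copy.
    loss = loss.clone()
    loss /= accum_iter
    return loss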

My environment:

GPU = NVIDIA Tesla V100 SXM3 32 GB
CUDA Version = 11.1
torch version = 1.10.1+cu111

Thank you

aojunzz commented 1 year ago

You can use a higher PyTorch version; the default version is PyTorch 2.0.
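As a quick sanity check after upgrading (a trivial snippet, not from the repo), you can confirm which PyTorch build is actually picked up before rerunning finetuning.sh:

import torch

print(torch.__version__)          # should report 2.0.x or later after the upgrade
print(torch.version.cuda)         # CUDA toolkit the wheel was built against
print(torch.cuda.is_available())  # True if the GPU is visible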