[INFO|trainer.py:1712] 2023-10-19 09:44:55,247 >> ***** Running training *****
[INFO|trainer.py:1713] 2023-10-19 09:44:55,247 >> Num examples = 9,861
[INFO|trainer.py:1714] 2023-10-19 09:44:55,247 >> Num Epochs = 10
[INFO|trainer.py:1715] 2023-10-19 09:44:55,247 >> Instantaneous batch size per device = 1
[INFO|trainer.py:1718] 2023-10-19 09:44:55,247 >> Total train batch size (w. parallel, distributed & accumulation) = 64
[INFO|trainer.py:1719] 2023-10-19 09:44:55,247 >> Gradient Accumulation steps = 8
[INFO|trainer.py:1720] 2023-10-19 09:44:55,247 >> Total optimization steps = 1,540
[INFO|trainer.py:1721] 2023-10-19 09:44:55,252 >> Number of trainable parameters = 19,988,480
0%| | 0/1540 [00:00<?, ?it/s]
[WARNING|logging.py:305] 2023-10-19 09:44:55,318 >> use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...
/usr/local/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
[WARNING|logging.py:305] 2023-10-19 09:44:55,325 >> use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...
/usr/local/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
[WARNING|logging.py:305] 2023-10-19 09:44:55,342 >> use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...
/usr/local/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
[WARNING|logging.py:305] 2023-10-19 09:44:55,346 >> use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...
/usr/local/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
[WARNING|logging.py:305] 2023-10-19 09:44:55,347 >> use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...
/usr/local/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
[WARNING|logging.py:305] 2023-10-19 09:44:55,350 >> use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...
/usr/local/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
[WARNING|logging.py:305] 2023-10-19 09:44:55,351 >> use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...
/usr/local/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
[WARNING|logging.py:305] 2023-10-19 09:44:55,355 >> use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False...
/usr/local/lib/python3.10/site-packages/torch/utils/checkpoint.py:31: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn("None of the inputs have requires_grad=True. Gradients will be None")
Traceback (most recent call last):
File "/home/xxx/Llama2-Chinese/train/sft/finetune_clm_lora.py", line 690, in <module>
main()
File "/home/xxx/Llama2-Chinese/train/sft/finetune_clm_lora.py", line 651, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1553, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1835, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 2690, in training_step
self.accelerator.backward(loss)
File "/usr/local/lib/python3.10/site-packages/accelerate/accelerator.py", line 1979, in backward
self.deepspeed_engine_wrapped.backward(loss, **kwargs)
File "/usr/local/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 167, in backward
self.engine.backward(loss, **kwargs)
File "/usr/local/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1895, in backward
self.optimizer.backward(loss, retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1902, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Traceback (most recent call last):
File "/home/xxx/Llama2-Chinese/train/sft/finetune_clm_lora.py", line 690, in <module>
main()
File "/home/xxx/Llama2-Chinese/train/sft/finetune_clm_lora.py", line 651, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1553, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1835, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 2690, in training_step
self.accelerator.backward(loss)
File "/usr/local/lib/python3.10/site-packages/accelerate/accelerator.py", line 1979, in backward
self.deepspeed_engine_wrapped.backward(loss, **kwargs)
File "/usr/local/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 167, in backward
self.engine.backward(loss, **kwargs)
File "/usr/local/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1895, in backward
self.optimizer.backward(loss, retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1902, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Traceback (most recent call last):
File "/home/xxx/Llama2-Chinese/train/sft/finetune_clm_lora.py", line 690, in <module>
main()
File "/home/xxx/Llama2-Chinese/train/sft/finetune_clm_lora.py", line 651, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1553, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1835, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 2690, in training_step
self.accelerator.backward(loss)
File "/usr/local/lib/python3.10/site-packages/accelerate/accelerator.py", line 1979, in backward
self.deepspeed_engine_wrapped.backward(loss, **kwargs)
File "/usr/local/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 167, in backward
self.engine.backward(loss, **kwargs)
File "/usr/local/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1895, in backward
self.optimizer.backward(loss, retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1902, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Traceback (most recent call last):
File "/home/xxx/Llama2-Chinese/train/sft/finetune_clm_lora.py", line 690, in <module>
main()
File "/home/xxx/Llama2-Chinese/train/sft/finetune_clm_lora.py", line 651, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1553, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1835, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 2690, in training_step
self.accelerator.backward(loss)
File "/usr/local/lib/python3.10/site-packages/accelerate/accelerator.py", line 1979, in backward
self.deepspeed_engine_wrapped.backward(loss, **kwargs)
File "/usr/local/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 167, in backward
self.engine.backward(loss, **kwargs)
File "/usr/local/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1895, in backward
self.optimizer.backward(loss, retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1902, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Traceback (most recent call last):
File "/home/xxx/Llama2-Chinese/train/sft/finetune_clm_lora.py", line 690, in <module>
main()
File "/home/xxx/Llama2-Chinese/train/sft/finetune_clm_lora.py", line 651, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1553, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1835, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 2690, in training_step
self.accelerator.backward(loss)
File "/usr/local/lib/python3.10/site-packages/accelerate/accelerator.py", line 1979, in backward
self.deepspeed_engine_wrapped.backward(loss, **kwargs)
File "/usr/local/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 167, in backward
self.engine.backward(loss, **kwargs)
File "/usr/local/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1895, in backward
self.optimizer.backward(loss, retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1902, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Traceback (most recent call last):
File "/home/xxx/Llama2-Chinese/train/sft/finetune_clm_lora.py", line 690, in <module>
main()
File "/home/xxx/Llama2-Chinese/train/sft/finetune_clm_lora.py", line 651, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1553, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1835, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 2690, in training_step
self.accelerator.backward(loss)
File "/usr/local/lib/python3.10/site-packages/accelerate/accelerator.py", line 1979, in backward
self.deepspeed_engine_wrapped.backward(loss, **kwargs)
File "/usr/local/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 167, in backward
self.engine.backward(loss, **kwargs)
File "/usr/local/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1895, in backward
self.optimizer.backward(loss, retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1902, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
0%| | 0/1540 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/xxx/Llama2-Chinese/train/sft/finetune_clm_lora.py", line 690, in <module>
main()
File "/home/xxx/Llama2-Chinese/train/sft/finetune_clm_lora.py", line 651, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1553, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1835, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 2690, in training_step
self.accelerator.backward(loss)
File "/usr/local/lib/python3.10/site-packages/accelerate/accelerator.py", line 1979, in backward
self.deepspeed_engine_wrapped.backward(loss, **kwargs)
File "/usr/local/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 167, in backward
self.engine.backward(loss, **kwargs)
File "/usr/local/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1895, in backward
self.optimizer.backward(loss, retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1902, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Traceback (most recent call last):
File "/home/xxx/Llama2-Chinese/train/sft/finetune_clm_lora.py", line 690, in <module>
main()
File "/home/xxx/Llama2-Chinese/train/sft/finetune_clm_lora.py", line 651, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1553, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 1835, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/usr/local/lib/python3.10/site-packages/transformers/trainer.py", line 2690, in training_step
self.accelerator.backward(loss)
File "/usr/local/lib/python3.10/site-packages/accelerate/accelerator.py", line 1979, in backward
self.deepspeed_engine_wrapped.backward(loss, **kwargs)
File "/usr/local/lib/python3.10/site-packages/accelerate/utils/deepspeed.py", line 167, in backward
self.engine.backward(loss, **kwargs)
File "/usr/local/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 1895, in backward
self.optimizer.backward(loss, retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/zero/stage_1_and_2.py", line 1902, in backward
self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 63, in backward
scaled_loss.backward(retain_graph=retain_graph)
File "/usr/local/lib/python3.10/site-packages/torch/_tensor.py", line 488, in backward
torch.autograd.backward(
File "/usr/local/lib/python3.10/site-packages/torch/autograd/__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
[2023-10-19 09:44:57,212] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2092520
[2023-10-19 09:44:57,257] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2092521
[2023-10-19 09:44:57,413] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2092522
[2023-10-19 09:44:57,590] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2092523
[2023-10-19 09:44:57,632] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2092524
[2023-10-19 09:44:57,650] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2092525
[2023-10-19 09:44:57,650] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2092526
[2023-10-19 09:44:57,668] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2092527
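Note: the repeated UserWarning "None of the inputs have requires_grad=True" together with the RuntimeError "element 0 of tensors does not require grad and does not have a grad_fn" is the usual symptom of running gradient checkpointing on a LoRA model: the base weights are frozen, so the checkpointed blocks receive inputs without a grad_fn and the loss ends up detached from the autograd graph. A commonly used workaround is to force gradients through the input embeddings before training. The sketch below is a hypothetical minimal example, not code from finetune_clm_lora.py; the model id and LoRA hyperparameters are placeholders.

```python
# Hypothetical sketch of the usual fix for "element 0 of tensors does not require grad"
# when combining LoRA with gradient checkpointing; names and hyperparameters are placeholders.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder model id

# Make the embedding outputs require grad so the checkpointed transformer blocks
# produce tensors with a grad_fn even though the base weights are frozen.
model.enable_input_require_grads()
model.gradient_checkpointing_enable()
model.config.use_cache = False  # matches the "use_cache=True is incompatible..." warning above

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # prints the trainable-parameter count
```

On more recent transformers releases, calling gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False}) is another common way to avoid the same detached-graph failure.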