Detected kernel version 4.19.118, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
0%| | 0/53850 [00:00<?, ?it/s]
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`...
/home/powerop/work/conda/envs/qwen2/lib/python3.10/site-packages/torch/utils/checkpoint.py:464: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.4 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
warnings.warn(
/home/powerop/work/conda/envs/qwen2/lib/python3.10/site-packages/torch/utils/checkpoint.py:91: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn(
Traceback (most recent call last):
  File "/home/powerop/work/rwq/llm_finetune/app/finetune_qwen2.py", line 92, in <module>
    qwen2.train()
  File "/home/powerop/work/rwq/llm_finetune/app/finetune_qwen2.py", line 87, in train
    trainer.train()
  File "/home/powerop/work/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
    return inner_training_loop(
  File "/home/powerop/work/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 2216, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/home/powerop/work/conda/envs/qwen2/lib/python3.10/site-packages/transformers/trainer.py", line 3250, in training_step
    self.accelerator.backward(loss)
  File "/home/powerop/work/conda/envs/qwen2/lib/python3.10/site-packages/accelerate/accelerator.py", line 1966, in backward
    loss.backward(**kwargs)
  File "/home/powerop/work/conda/envs/qwen2/lib/python3.10/site-packages/torch/_tensor.py", line 525, in backward
    torch.autograd.backward(
  File "/home/powerop/work/conda/envs/qwen2/lib/python3.10/site-packages/torch/autograd/__init__.py", line 267, in backward
    _engine_run_backward(
  File "/home/powerop/work/conda/envs/qwen2/lib/python3.10/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
    return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
The environment was installed following the documentation: https://github.com/datawhalechina/self-llm/blob/master/Qwen2/05-Qwen2-7B-Instruct%20Lora%20%E5%BE%AE%E8%B0%83.md
Python version: 3.10.12; CUDA: 12.1; OS: ubuntu12. Code being run:
Error message: