Closed by YasHGoyaL27 10 months ago
Hello @YasHGoyaL27,
thanks for reaching out and using our code!
This seems like a problem with the backward pass during training, which is a rather general error.
One thing I spotted is that your PyTorch version does not match the one we were using (2.0.1 versus 1.10.1). Could you please follow the instructions in the readme exactly to set up the required environment?
Please let us know if you need any further help!
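For anyone landing here with the same problem, a quick way to spot a version mismatch is to compare the installed packages against the expected pins. The following is only a minimal sketch: the helper name `find_version_mismatches` is made up for illustration, and the only pin taken from this thread is torch 1.10.1; the remaining pins would have to come from the readme.

```python
from importlib import metadata


def find_version_mismatches(expected, lookup=metadata.version):
    """Return {package: (expected, installed_or_None)} for every mismatch.

    `expected` maps package names to the version string you want.
    `lookup` defaults to importlib.metadata.version and is only a parameter
    so the function can be exercised without the packages installed.
    """
    mismatches = {}
    for package, wanted in expected.items():
        try:
            installed = lookup(package)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu117" when comparing.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[package] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    # Assumption: only the torch pin is stated in this thread.
    expected = {"torch": "1.10.1"}
    for package, (wanted, installed) in find_version_mismatches(expected).items():
        print(f"{package}: expected {wanted}, found {installed}")
```

Running this inside the training environment before starting a job prints one line per package that is missing or has a different version, and prints nothing if everything matches.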
Did you follow all steps in the readme, and which model are you running?
Thank you for your response. I was able to run it after correcting the versions.
Great! Thanks for the feedback.
I am getting the following error while trying to replicate your code:
Versions used: torch == 2.0.1, pytorch_lightning == 1.9.2, deepspeed == 0.10.3
11,402.764 Total estimated model params size (MB)
Epoch 0: 0%| | 0/8 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/code/llm/t_few/src/pl_train.py", line 139, in <module>
File "code/llm/t_few/src/pl_train.py", line 99, in main
File "python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 768, in fit
File "python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 721, in _call_and_handle_interrupt
File "python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 809, in _fit_impl
File "python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1234, in _run
File "python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1321, in _run_stage
File "python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1351, in _run_train
File "python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
File "python3.9/site-packages/pytorch_lightning/loops/fit_loop.py", line 269, in advance
File "python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
File "python3.9/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 208, in advance
File "python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
File "python3.9/site-packages/pytorch_lightning/loops/batch/training_batch_loop.py", line 88, in advance
File "python3.9/site-packages/pytorch_lightning/loops/base.py", line 204, in run
File "python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 203, in advance
File "python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 256, in _run_optimization
File "python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 369, in _optimizer_step
File "python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1593, in _call_lightning_module_hook
File "python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1625, in optimizer_step
File "python3.9/site-packages/pytorch_lightning/core/optimizer.py", line 168, in step
File "python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 193, in optimizer_step
File "python3.9/site-packages/pytorch_lightning/plugins/precision/native_amp.py", line 80, in optimizer_step
File "python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 155, in optimizer_step
File "python3.9/site-packages/torch/optim/lr_scheduler.py", line 65, in wrapper
File "python3.9/site-packages/torch/optim/optimizer.py", line 88, in wrapper
File "python3.9/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
File "python3.9/site-packages/transformers/optimization.py", line 649, in step
File "python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 140, in _wrap_closure
File "python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 148, in __call__
File "python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 143, in closure
File "python3.9/site-packages/pytorch_lightning/loops/optimization/optimizer_loop.py", line 311, in backward_fn
File "python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1763, in _call_strategy_hook
File "python3.9/site-packages/pytorch_lightning/strategies/strategy.py", line 168, in backward
File "python3.9/site-packages/pytorch_lightning/plugins/precision/precision_plugin.py", line 80, in backward
File "python3.9/site-packages/pytorch_lightning/core/lightning.py", line 1370, in backward
File "python3.9/site-packages/torch/_tensor.py", line 307, in backward
File "python3.9/site-packages/torch/autograd/__init__.py", line 154, in backward
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Epoch 0: 0%| | 0/8 [03:19<?, ?it/s]
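As general background, autograd raises this RuntimeError when `backward()` is called on a tensor that is not connected to any tensor with `requires_grad=True`. One generic check (not necessarily the cause in this issue, which was resolved by fixing package versions) is to list which parameters are trainable before training starts. The sketch below assumes a `torch.nn.Module`-like object exposing `named_parameters()`; the helper names and the `FrozenStub` stand-in are hypothetical and are used only so the example runs without torch installed.

```python
from types import SimpleNamespace


def trainable_parameter_names(model):
    """Names of parameters that would receive gradients (requires_grad=True)."""
    return [name for name, param in model.named_parameters() if param.requires_grad]


def assert_has_trainable_parameters(model):
    """Raise early, with a readable message, if nothing is trainable."""
    names = trainable_parameter_names(model)
    if not names:
        raise ValueError(
            "no parameter has requires_grad=True; loss.backward() would fail"
        )
    return names


class FrozenStub:
    """Stand-in for a fully frozen model (hypothetical, for illustration only)."""

    def named_parameters(self):
        yield "encoder.weight", SimpleNamespace(requires_grad=False)
        yield "decoder.weight", SimpleNamespace(requires_grad=False)


if __name__ == "__main__":
    try:
        assert_has_trainable_parameters(FrozenStub())
    except ValueError as err:
        print(f"check failed: {err}")
```

With a real model, the same two helpers can be called on the module right before `trainer.fit(...)`; an empty list means every parameter is frozen and the backward pass cannot compute gradients.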