When I try to fine-tune, I always get this error after running the "finetune_lora.sh" script.
Any help is appreciated (:
rank0: Traceback (most recent call last):
rank0: File "/home/abgespielt71/finetune/finetune.py", line 328, in <module>
rank0: File "/home/abgespielt71/finetune/finetune.py", line 274, in train
rank0: File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2471, in requires_grad_
rank0: RuntimeError: only Tensors of floating point dtype can require gradients
E0627 23:26:46.789000 140067509442368 torch/distributed/elastic/multiprocessing/api.py:826] failed (exitcode: 1) local_rank: 0 (pid: 2800) of binary: /opt/conda/bin/python3.10
Traceback (most recent call last):
File "/opt/conda/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
return f(*args, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py", line 879, in main
run(args)
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/run.py", line 870, in run
elastic_launch(
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/opt/conda/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 263, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
finetune.py FAILED
Failures: