Closed zhuiyue233 closed 4 months ago
I met the same question and could not figure out why.
I solved it. I tried installing the packages in requirements.txt one by one, and then ran `pip install -U bitsandbytes` again.
I don't know why that fixed it.
I met the same question and could not figure out why.
And maybe you can try changing `python -m torch.distributed.run --nnodes=1 --nproc_per_node=4 --master_port=20001` to `python -m torch.distributed.run --nnodes=1 --nproc_per_node=1 --master_port=20001` in the file `graphgpt_stage1.sh`.
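For context, `invalid device ordinal` usually means the launch asks for more GPUs (via `--nproc_per_node` or a hard-coded device index) than the process can actually see. A minimal sketch of how the visible-GPU count follows from `CUDA_VISIBLE_DEVICES`; the helper name `visible_gpu_count` is hypothetical, not part of GraphGPT or PyTorch:

```python
import os

def visible_gpu_count(env=os.environ):
    """Count the GPUs a launch can use, based on CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (no restriction; you would
    query torch.cuda.device_count() instead), 0 when it is set but empty,
    and otherwise the number of comma-separated device indices.
    """
    vis = env.get("CUDA_VISIBLE_DEVICES")
    if vis is None:
        return None
    vis = vis.strip()
    if not vis:
        return 0
    return len([d for d in vis.split(",") if d.strip()])

# With only one visible GPU, --nproc_per_node=4 would request device
# ordinals 1..3 that do not exist, hence the CUDA error.
print(visible_gpu_count({"CUDA_VISIBLE_DEVICES": "0"}))  # → 1
```

So lowering `--nproc_per_node` to match the machine's actual GPU count (often 1) is why the suggested edit works.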
Thanks, this error has been resolved.
When I run graphgpt_stage1.sh, it raises the following error:

```
  File "/anaconda3/envs/GGPT/lib/python3.10/site-packages/transformers/hf_argparser.py", line 338, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 125, in __init__
  File "/anaconda3/envs/GGPT/lib/python3.10/site-packages/transformers/training_args.py", line 1372, in __post_init__
    and (self.device.type != "cuda")
  File "/anaconda3/envs/GGPT/lib/python3.10/site-packages/transformers/training_args.py", line 1795, in device
    return self._setup_devices
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```

Could you please give me some help?