dvlab-research / SphereFormer

The official implementation for "Spherical Transformer for LiDAR-based 3D Recognition" (CVPR 2023).
Apache License 2.0

RuntimeError: CUDA error: invalid device ordinal (only 1 GPU in my system, how to resolve) #55

Open Jayku88 opened 1 year ago

Jayku88 commented 1 year ago

[09/12 09:35:46 main-logger]: use SyncBN
/home/vrlabhlbs/anaconda3/envs/spheretest/lib/python3.7/multiprocessing/semaphore_tracker.py:144: UserWarning: semaphore_tracker: There appear to be 3 leaked semaphores to clean up at shutdown
  len(cache))
Traceback (most recent call last):
  File "train.py", line 902, in <module>
    main()
  File "train.py", line 90, in main
    mp.spawn(main_worker, nprocs=args.ngpus_per_node, args=(args.ngpus_per_node, args))
  File "/home/vrlabhlbs/anaconda3/envs/spheretest/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/home/vrlabhlbs/anaconda3/envs/spheretest/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/home/vrlabhlbs/anaconda3/envs/spheretest/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 1 terminated with the following error:
Traceback (most recent call last):
  File "/home/vrlabhlbs/anaconda3/envs/spheretest/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/vrlabhlbs/SphereFormer/train.py", line 156, in main_worker
    torch.cuda.set_device(gpu)
  File "/home/vrlabhlbs/anaconda3/envs/spheretest/lib/python3.7/site-packages/torch/cuda/__init__.py", line 261, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal

Bob-Maxwell commented 1 month ago

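I ran into the same problem. You just need to modify the train_gpu parameter in the .yaml config so that it only refers to GPUs that actually exist on your machine (with a single GPU, presumably just device 0), and it will work. In the traceback above, it is process 1 that fails in torch.cuda.set_device(gpu), because a 1-GPU system only has device 0.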
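
For anyone who wants to check this before launching training, here is a minimal sketch. It assumes train_gpu is a list of GPU ordinals and that the number of spawned workers follows its length, as the mp.spawn(..., nprocs=args.ngpus_per_node, ...) call in the traceback suggests. The helper name usable_gpus is hypothetical and not part of SphereFormer; in practice the device count would come from torch.cuda.device_count().

```python
def usable_gpus(configured, device_count):
    """Return the configured GPU ordinals that actually exist.

    configured:   GPU ordinals from the config, e.g. the train_gpu list
                  (assumed to be a list of ints)
    device_count: number of visible CUDA devices, e.g. the value of
                  torch.cuda.device_count()
    """
    # CUDA ordinals run from 0 to device_count - 1; anything else
    # triggers "invalid device ordinal" in torch.cuda.set_device().
    valid = [g for g in configured if 0 <= g < device_count]
    if not valid:
        raise ValueError(
            f"none of the configured GPUs {configured} exist "
            f"(visible devices: {device_count})"
        )
    return valid


# A config written for 4 GPUs, run on a machine with a single GPU:
print(usable_gpus([0, 1, 2, 3], 1))  # prints [0]
```

If the result differs from what is in the .yaml, that is the list to put in train_gpu. Editing the config directly, as described above, is still the simplest fix; the check only makes the mismatch visible before the workers are spawned.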