Alvin-Zeng / PGCN

Graph Convolutional Networks for Temporal Action Localization (ICCV2019)

RuntimeError: cuda runtime error (10) #30

Closed: dreamedrainbow closed this issue 4 years ago

dreamedrainbow commented 4 years ago

Hi there! When running pgcn_test.py for inference, I get a CUDA error. Here is my stack trace:

model epoch 15 loss: 1.4765163376217796
File parsed. Time:4.10
Dict constructed. Time:4.39
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1549630534704/work/torch/csrc/cuda/Module.cpp line=34 error=10 : invalid device ordinal
Process SpawnProcess-2:
Traceback (most recent call last):
  File "/home/ubuntu/users/z/anaconda3/envs/pgcn/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ubuntu/users/z/anaconda3/envs/pgcn/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/users/z/PGCN/pgcn_test.py", line 116, in runner_func
    torch.cuda.set_device(gpu_id)
  File "/home/ubuntu/users/z/anaconda3/envs/pgcn/lib/python3.6/site-packages/torch/cuda/__init__.py", line 264, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1549630534704/work/torch/csrc/cuda/Module.cpp:34
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1549630534704/work/torch/csrc/cuda/Module.cpp line=34 error=10 : invalid device ordinal
Process SpawnProcess-3:
Traceback (most recent call last):
  File "/home/ubuntu/users/z/anaconda3/envs/pgcn/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ubuntu/users/z/anaconda3/envs/pgcn/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/users/z/PGCN/pgcn_test.py", line 116, in runner_func
    torch.cuda.set_device(gpu_id)
  File "/home/ubuntu/users/z/anaconda3/envs/pgcn/lib/python3.6/site-packages/torch/cuda/__init__.py", line 264, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1549630534704/work/torch/csrc/cuda/Module.cpp:34
  0%|                                                   | 0/210 [00:00<?, ?it/s]THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1549630534704/work/torch/csrc/cuda/Module.cpp line=34 error=10 : invalid device ordinal
Process SpawnProcess-4:
Traceback (most recent call last):
  File "/home/ubuntu/users/z/anaconda3/envs/pgcn/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/home/ubuntu/users/z/anaconda3/envs/pgcn/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ubuntu/users/z/PGCN/pgcn_test.py", line 116, in runner_func
    torch.cuda.set_device(gpu_id)
  File "/home/ubuntu/users/z/anaconda3/envs/pgcn/lib/python3.6/site-packages/torch/cuda/__init__.py", line 264, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: cuda runtime error (10) : invalid device ordinal at /opt/conda/conda-bld/pytorch_1549630534704/work/torch/csrc/cuda/Module.cpp:34
  6%|██▍                                     | 13/210 [06:37<1:47:22, 32.70s/it]^CTraceback (most recent call last):
  File "/home/ubuntu/users/z/PGCN/pgcn_test.py", line 216, in <module>
    rst = result_queue.get()
  File "/home/ubuntu/users/z/anaconda3/envs/pgcn/lib/python3.6/multiprocessing/queues.py", line 94, in get
    res = self._recv_bytes()
  File "/home/ubuntu/users/z/anaconda3/envs/pgcn/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/ubuntu/users/z/anaconda3/envs/pgcn/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/ubuntu/users/z/anaconda3/envs/pgcn/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt
Process SpawnProcess-1:
  6%|██▍                                     | 13/210 [06:44<1:42:07, 31.10s/it]

Process finished with exit code 1
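In the log above, SpawnProcess-2, -3 and -4 each die inside torch.cuda.set_device(gpu_id) with "invalid device ordinal", while the progress bar keeps advancing (13/210) until the run is interrupted. That pattern suggests only one worker received a GPU id that actually exists, which typically happens when more GPU workers are launched than the machine exposes. The sketch below assumes, hypothetically, that pgcn_test.py hands each worker a GPU id taken from a list; `assign_gpus` is an illustrative helper, not part of PGCN, that drops ids which do not exist and shares the remaining GPUs round-robin.

```python
def assign_gpus(num_workers, requested_gpus, device_count):
    """Hypothetical helper (not from PGCN): map each worker index to a valid CUDA ordinal.

    requested_gpus: GPU ids the script would like to use.
    device_count:   value of torch.cuda.device_count() in the parent process.
    """
    # Keep only ordinals that torch.cuda.set_device() can accept (0 .. device_count - 1).
    usable = [g for g in requested_gpus if 0 <= g < device_count]
    if not usable:
        raise ValueError(
            f"none of {list(requested_gpus)} is valid with {device_count} visible GPU(s)"
        )
    # Round-robin: extra workers share a GPU instead of crashing in set_device().
    return [usable[w % len(usable)] for w in range(num_workers)]
```

For example, with one visible GPU, `assign_gpus(4, range(8), 1)` returns `[0, 0, 0, 0]`, so all four workers would share GPU 0 instead of three of them failing. Launching only as many workers as there are GPUs avoids the error in the same way.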

I also tested CUDA itself, and torch.cuda.is_available() returns True:

>>> import torch
>>> torch.cuda.is_available()
True
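Note that torch.cuda.is_available() only reports that at least one CUDA device is visible; torch.cuda.set_device(i) additionally needs `i` to be a valid index, i.e. `0 <= i < torch.cuda.device_count()`. If CUDA_VISIBLE_DEVICES is set, the visible devices are renumbered from 0 inside the process, so an id such as 3 can be invalid even on a machine with four physical GPUs. The diagnostic sketch below compares a list of requested ids against the device count; the `requested` list is a hypothetical stand-in for whatever GPU ids the test script passes to its workers.

```python
import os


def invalid_gpu_ids(requested_ids, device_count):
    """Return the requested ids that are not valid CUDA ordinals (0 .. device_count - 1)."""
    return [i for i in requested_ids if not 0 <= i < device_count]


if __name__ == "__main__":
    try:
        import torch
    except ImportError:  # allow the helper to be imported without PyTorch installed
        torch = None

    if torch is not None:
        count = torch.cuda.device_count()
        print("CUDA_VISIBLE_DEVICES =", os.environ.get("CUDA_VISIBLE_DEVICES"))
        print("is_available:", torch.cuda.is_available(), "device_count:", count)
        # Hypothetical: the GPU ids the inference script hands to its workers.
        requested = [0, 1, 2, 3]
        print("ids outside the valid range:", invalid_gpu_ids(requested, count))
```

If `device_count` prints 1 while the script starts four workers, the three ids it reports match the three failing SpawnProcess entries in the log.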

I do not know how to fix this error. Could anyone help?