YoungXIAO13 / PoseFromShape

(BMVC 2019) PyTorch implementation of the paper "Pose from Shape: Deep Pose Estimation for Arbitrary 3D Objects"
http://imagine.enpc.fr/~xiaoy/PoseFromShape/
MIT License

Trouble running inference.sh (no CUDA-capable device detected) #18

Closed pgavriel closed 4 years ago

pgavriel commented 4 years ago

Hello, I'm simply trying to run the inference.sh script with some of my own pictures and models, but when I do so I get this output:

Color management: using fallback mode for management
Namespace(bin_size=15, image_path='data/TaskBoard/img', img_feature_dim=1024, input_dim=224, model='model/ObjectNet3D.pth', obj_path='data/TaskBoard/obj/GearLarge.obj', render_path='data/TaskBoard/mv_gearlarge/crop', shape='MultiView', shape_feature_dim=256, tour=2, view_num=12)
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1549627089062/work/aten/src/THC/THCGeneral.cpp line=51 error=38 : no CUDA-capable device is detected
Traceback (most recent call last):
  File "inference.py", line 45, in <module>
    model.cuda()
  File "/home/pgavriel/anaconda3/envs/PoseFromShape/lib/python3.6/site-packages/torch/nn/modules/module.py", line 260, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/home/pgavriel/anaconda3/envs/PoseFromShape/lib/python3.6/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/home/pgavriel/anaconda3/envs/PoseFromShape/lib/python3.6/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/home/pgavriel/anaconda3/envs/PoseFromShape/lib/python3.6/site-packages/torch/nn/modules/module.py", line 187, in _apply
    module._apply(fn)
  File "/home/pgavriel/anaconda3/envs/PoseFromShape/lib/python3.6/site-packages/torch/nn/modules/module.py", line 193, in _apply
    param.data = fn(param.data)
  File "/home/pgavriel/anaconda3/envs/PoseFromShape/lib/python3.6/site-packages/torch/nn/modules/module.py", line 260, in <lambda>
    return self._apply(lambda t: t.cuda(device))
  File "/home/pgavriel/anaconda3/envs/PoseFromShape/lib/python3.6/site-packages/torch/cuda/__init__.py", line 162, in _lazy_init
    torch._C._cuda_init()
RuntimeError: cuda runtime error (38) : no CUDA-capable device is detected at /opt/conda/conda-bld/pytorch_1549627089062/work/aten/src/THC/THCGeneral.cpp:51
Error: Not freed memory blocks: 8, total unfreed memory 0.008392 MB

What I don't understand is that my Anaconda installation is in my home directory and there is no /opt/conda/ folder on my machine, yet I can't see where it's being told to look under /opt/conda/ for CUDA devices. Do I need to build the CUDA drivers from source? Did I configure something wrong? Any help would be greatly appreciated. Thanks!

YoungXIAO13 commented 4 years ago

Hi, I think you should:

1) check that CUDA is properly installed on your machine, as in the similar issue reported here (a quick check is sketched below)

2) check that your conda path is correctly set to your installation path, as someone mentioned here

Hope that helps!
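
For point 1), a minimal sanity check from inside the PoseFromShape conda environment could look something like this (plain PyTorch calls only, nothing project-specific assumed):

```python
# check_cuda.py -- quick check that PyTorch can see a CUDA device
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    # list every GPU the current environment exposes
    for i in range(torch.cuda.device_count()):
        print(f"device {i}:", torch.cuda.get_device_name(i))
```

If torch.cuda.is_available() already returns False here, the problem is with the driver/toolkit installation or the environment rather than with inference.py itself.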

pgavriel commented 4 years ago

Thank you for the tips! I went through the steps to verify my CUDA installation, and I was able to confirm that the 'CUDA_VISIBLE_DEVICES=1' setting in inference.sh was the issue; setting it to 0 fixed it. Another issue occurred where the script was failing to import torch, but I recreated the conda environment following the instructions, and now the script seems to start properly. However, I get this error:

Color management: using fallback mode for management
Namespace(bin_size=15, image_path='data/TaskBoard/img', img_feature_dim=1024, input_dim=224, model='model/ObjectNet3D.pth', obj_path='data/TaskBoard/obj/GearLarge.obj', render_path='data/TaskBoard/mv_gearlarge/crop', shape='MultiView', shape_feature_dim=256, tour=2, view_num=12)
Previous weight loaded
  0%|          | 0/6 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "inference.py", line 106, in <module>
    ele = (((pred_ele.float() + out_reg[1]) * (360. / opt.azi_classes)) - 90).item()
AttributeError: 'Namespace' object has no attribute 'azi_classes'
Error: Not freed memory blocks: 8, total unfreed memory 0.008392 MB

After reading through the code, I'm wondering whether those lines (105-107) are correct, because it doesn't seem like opt has those attributes. Is it supposed to be model.azi_classes? Or just azi_classes? Or is something else going wrong here? Thanks!

YoungXIAO13 commented 4 years ago

Yes, that's an error in the namespace. I've corrected it, and you can check the new code. You can also simply replace "(360. / opt.azi_classes)" with "opt.bin_size", which should work.
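
Applied to the line shown in the traceback above (inference.py line 106), the substitution would look something like this; the neighbouring lines around 105-107 would presumably take the same replacement:

```python
# sketch of the suggested fix around inference.py line 106
# before (opt has no attribute 'azi_classes', hence the AttributeError):
#   ele = (((pred_ele.float() + out_reg[1]) * (360. / opt.azi_classes)) - 90).item()
# after, using the bin size already present in the parsed Namespace (bin_size=15):
ele = (((pred_ele.float() + out_reg[1]) * opt.bin_size) - 90).item()
```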

pgavriel commented 4 years ago

Thank you for the assistance!