AaronJackson / vrn

:man: Code for "Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression"
http://aaronsplace.co.uk/papers/jackson2017recon/
MIT License

cuda runtime error when execute "run.sh" #30

Closed DeqiangXiao closed 6 years ago

DeqiangXiao commented 6 years ago

Hi Aaron, I downloaded your GitHub code and successfully configured its dependencies on a Linux server in our lab (Ubuntu 16.04.3 LTS, 4.4.0-98-generic; Python 2.7.12; Torch7; CUDA 8.0.61; cuDNN 5.1; GPU: NVIDIA TITAN Xp 12GB), but when I try to execute "run.sh", a CUDA runtime error occurs:

THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-9198/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
/home/deqiang/Toolkits/Torch/install/bin/luajit: ...eqiang/Toolkits/Torch/install/share/lua/5.1/nn/utils.lua:11: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-9198/cutorch/lib/THC/generic/THCStorage.cu:66
stack traceback:
    [C]: in function 'resize'
    ...eqiang/Toolkits/Torch/install/share/lua/5.1/nn/utils.lua:11: in function 'torch_Storage_type'
    ...eqiang/Toolkits/Torch/install/share/lua/5.1/nn/utils.lua:57: in function 'recursiveType'
    ...qiang/Toolkits/Torch/install/share/lua/5.1/nn/Module.lua:160: in function 'type'
    ...eqiang/Toolkits/Torch/install/share/lua/5.1/nn/utils.lua:45: in function 'recursiveType'
    ...eqiang/Toolkits/Torch/install/share/lua/5.1/nn/utils.lua:41: in function 'recursiveType'
    ...qiang/Toolkits/Torch/install/share/lua/5.1/nn/Module.lua:160: in function 'type'
    ...eqiang/Toolkits/Torch/install/share/lua/5.1/nn/utils.lua:45: in function 'recursiveType'
    ...eqiang/Toolkits/Torch/install/share/lua/5.1/nn/utils.lua:41: in function 'recursiveType'
    ...qiang/Toolkits/Torch/install/share/lua/5.1/nn/Module.lua:160: in function 'type'
    ...eqiang/Toolkits/Torch/install/share/lua/5.1/nn/utils.lua:45: in function 'recursiveType'
    ...
    ...eqiang/Toolkits/Torch/install/share/lua/5.1/nn/utils.lua:45: in function 'recursiveType'
    ...eqiang/Toolkits/Torch/install/share/lua/5.1/nn/utils.lua:41: in function 'recursiveType'
    ...qiang/Toolkits/Torch/install/share/lua/5.1/nn/Module.lua:160: in function 'type'
    ...eqiang/Toolkits/Torch/install/share/lua/5.1/nn/utils.lua:45: in function 'recursiveType'
    ...eqiang/Toolkits/Torch/install/share/lua/5.1/nn/utils.lua:41: in function 'recursiveType'
    ...qiang/Toolkits/Torch/install/share/lua/5.1/nn/Module.lua:160: in function 'cuda'
    process.lua:18: in main chunk
    [C]: in function 'dofile'
    ...kits/Torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
    [C]: at 0x00405d50
ls: cannot access '*.raw': No such file or directory

I tried several solutions I found on Google, but none of them worked. Any help is appreciated!

AaronJackson commented 6 years ago

You have run out of memory on your GPU. What GPU are you using?

AaronJackson commented 6 years ago

Oh, a TITAN Xp. That should have enough memory...

AaronJackson commented 6 years ago

Can you run nvidia-smi to see if you have anything else allocated on the GPU?

DeqiangXiao commented 6 years ago

Thanks for your quick reply, @AaronJackson. Here is the output of nvidia-smi:

Wed Nov 29 11:06:08 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 381.09                 Driver Version: 381.09                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 0000:04:00.0     Off |                  N/A |
| 46%   73C    P2   210W / 250W |  11132MiB / 12189MiB |     95%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN Xp            Off  | 0000:06:00.0     Off |                  N/A |
| 42%   67C    P2   103W / 250W |   9937MiB / 12189MiB |     77%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN Xp            Off  | 0000:07:00.0     Off |                  N/A |
| 53%   83C    P2   172W / 250W |  11527MiB / 12189MiB |     90%      Default |
+-------------------------------+----------------------+----------------------+

AaronJackson commented 6 years ago

All of your memory is allocated. I guess this is a shared machine?
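
To put numbers on that, the free memory per GPU can be worked out from the nvidia-smi figures posted above. This is only a minimal Python sketch; the function name and the tuple layout are illustrative and not part of this repository:

```python
# (index, used MiB, total MiB), copied from the nvidia-smi output in this thread.
GPUS = [(0, 11132, 12189), (1, 9937, 12189), (2, 11527, 12189)]


def free_memory(gpus):
    """Return {gpu_index: free MiB} for each (index, used, total) entry."""
    return {index: total - used for index, used, total in gpus}


free = free_memory(GPUS)
for index, mib in sorted(free.items()):
    print("GPU %d: %d MiB free" % (index, mib))
```

That leaves roughly 0.6 to 2.2 GiB free per card, which is consistent with the out-of-memory error being caused by other jobs occupying the GPUs.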

DeqiangXiao commented 6 years ago

Yes, this server is shared by a group. So the problem is caused by a shortage of free GPU memory?

AaronJackson commented 6 years ago

Yeah the machine is quite clearly too busy. You need to wait for some memory to become available or ask people to stop using all the GPUs. I cannot help you with this part.
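
Waiting for memory to become available could be automated with a small polling script. The sketch below is one possible approach and an assumption on the editor's part, not something shipped with this repository: it asks nvidia-smi for the free memory of each GPU through its --query-gpu option and returns the index of a GPU once enough is free. The names wait_for_gpu, query_free_mib and parse_free_mib are hypothetical, and the required amount is left to the caller because the thread does not say how much memory the model needs.

```python
import subprocess
import time


def parse_free_mib(text):
    """Parse CSV lines such as '0, 1057' into {0: 1057}."""
    free = {}
    for line in text.strip().splitlines():
        index, mib = line.split(",")
        free[int(index)] = int(mib)
    return free


def query_free_mib():
    """Return {gpu_index: free MiB} as reported by nvidia-smi."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=index,memory.free",
         "--format=csv,noheader,nounits"],
        universal_newlines=True,
    )
    return parse_free_mib(out)


def wait_for_gpu(required_mib, query=query_free_mib, poll_seconds=60,
                 max_polls=None, sleep=time.sleep):
    """Poll until a GPU has at least required_mib free.

    Returns the index of the qualifying GPU with the most free memory,
    or None if max_polls is reached first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        free = query()
        candidates = [i for i, mib in free.items() if mib >= required_mib]
        if candidates:
            return max(candidates, key=lambda i: free[i])
        polls += 1
        sleep(poll_seconds)
    return None
```

The returned index could then be exported as CUDA_VISIBLE_DEVICES before launching run.sh, assuming the script does not select a device itself. On a shared machine another job may still claim the memory between the check and the launch, so this reduces the waiting effort rather than guaranteeing a slot.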

DeqiangXiao commented 6 years ago

OK. Thanks very much!