HsienYangLiao closed this issue 2 years ago.
Hello HsienYang,
Thanks for your interest in our work! The problem seems to be related to the cuDNN library. Please make sure CUDA and cuDNN are correctly installed. My CUDA version is 10.1. A newer CUDA version (e.g. v11.0) will probably work too, but I haven't tested it yet. I also occasionally get cuDNN errors when the GPU runs out of memory; if that is the case here, a smaller batch size should help.
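A quick way to collect the versions mentioned above from inside the Python environment is sketched below. It is a minimal sketch assuming PyTorch is the framework in use; `cuda_env_report` is a hypothetical helper, and it reports only what PyTorch sees (the CUDA version PyTorch was built with, which is not necessarily the system toolkit).

```python
import importlib.util


def cuda_env_report():
    """Collect the CUDA/cuDNN facts worth posting in an issue like this one.

    Returns a dict; fields stay None when PyTorch or a GPU is unavailable.
    """
    report = {
        "torch": None,
        "torch_cuda": None,   # CUDA version PyTorch was built against
        "cudnn": None,        # cuDNN version PyTorch loaded
        "gpu": None,
        "capability": None,   # (major, minor) compute capability
    }
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not installed in this environment
    import torch

    report["torch"] = torch.__version__
    report["torch_cuda"] = torch.version.cuda
    if torch.cuda.is_available():
        report["cudnn"] = torch.backends.cudnn.version()
        report["gpu"] = torch.cuda.get_device_name(0)
        report["capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in cuda_env_report().items():
        print(f"{key}: {value}")
```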
Hello chensong1995, I found the problem: the GeForce RTX 3090 only works with newer CUDA versions (11.1 or later). I tried to install the environment with:
conda create -y --name hybridpose python==3.7.4
conda install -y -q --name hybridpose -c pytorch -c anaconda -c conda-forge -c pypi --file requirements.txt
The requirements.txt contains:
pillow>=6.2.2
pytorch==1.12.0
torchvision==0.13.0
cudatoolkit==11.3.1
setuptools==61.2.0
scikit-learn==1.0.2
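The RTX 3090 has compute capability 8.6 (sm_86), which CUDA toolkits older than 11.1 cannot target. The sketch below encodes a partial, hand-written table of minimum toolkit versions per compute capability (an assumption covering only a few common architectures; `cuda_supports_capability` is a hypothetical helper). For prebuilt PyTorch packages, what ultimately matters is whether the binary ships kernels for the GPU, so treat this as a toolkit-side check only.

```python
# Minimum CUDA toolkit release able to target each compute capability.
# Partial table (assumption: only a few common architectures are listed).
MIN_CUDA_FOR_CAPABILITY = {
    (6, 1): (8, 0),   # Pascal, e.g. GTX 1080
    (7, 0): (9, 0),   # Volta, e.g. V100
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080
    (8, 0): (11, 0),  # Ampere, A100
    (8, 6): (11, 1),  # Ampere, e.g. RTX 3090
}


def parse_version(text):
    """Turn '11.1' or '10.1.243' into a (major, minor) tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def cuda_supports_capability(cuda_version, capability):
    """Return True if `cuda_version` is new enough for `capability`.

    Returns None when the capability is not in the partial table above.
    """
    required = MIN_CUDA_FOR_CAPABILITY.get(tuple(capability))
    if required is None:
        return None
    return parse_version(cuda_version) >= required


if __name__ == "__main__":
    # RTX 3090 (sm_86) against the toolkit versions mentioned in this thread.
    for cuda in ("10.1", "11.0", "11.1", "11.3"):
        print(cuda, cuda_supports_capability(cuda, (8, 6)))
```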
Because no opencv package was available from the current channels, I installed it with pip install opencv-python, and that worked.
However, when I run
python setup.py build_ext --inplace
under ~/4Tdata/HybridPose/lib/ransac_voting_gpu_layer, it still fails with:
error: command '/usr/bin/gcc' failed with exit code 1
Did I make a mistake somewhere? I would appreciate any help.
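When `build_ext` fails inside gcc for a CUDA extension, one thing worth ruling out (an assumption, not a diagnosis confirmed in this thread) is a mismatch between the nvcc on the system and the CUDA version PyTorch was built with. The hypothetical helpers below compare the two; note that PyTorch's extension builder may locate the toolkit through `CUDA_HOME` rather than `PATH`, so that variable is worth checking too.

```python
import re
import shutil
import subprocess


def parse_nvcc_release(nvcc_output):
    """Extract (major, minor) from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def system_nvcc_release():
    """Run the nvcc found on PATH and return its (major, minor), or None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    out = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(out.stdout)


def torch_cuda_release():
    """(major, minor) of the CUDA version PyTorch was built with, or None."""
    try:
        import torch
    except ImportError:
        return None
    if torch.version.cuda is None:  # CPU-only PyTorch build
        return None
    major, minor = torch.version.cuda.split(".")[:2]
    return int(major), int(minor)


if __name__ == "__main__":
    nvcc, built = system_nvcc_release(), torch_cuda_release()
    print("nvcc on PATH:", nvcc, "| PyTorch built with CUDA:", built)
    if nvcc and built and nvcc[0] != built[0]:
        print("Major versions differ; worth fixing before retrying build_ext.")
```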
Best regards,
Hello HsienYang,
It looks like the error comes from either the CUDA compiler or the C compiler. Can you follow https://cuda-tutorial.readthedocs.io/en/latest/tutorials/tutorial01/ and check whether nvcc can compile the hello world program? That will tell us whether your CUDA installation is correct. I hope it helps!
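The hello-world check suggested above can be scripted as follows. This is a sketch modeled on that tutorial's example, assuming a Linux machine with `nvcc` on `PATH`; `nvcc_hello_world` is a hypothetical name, and `cudaDeviceSynchronize()` is added so the kernel's `printf` output is flushed before the program exits.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# One-thread kernel in the spirit of the tutorial's hello world.
HELLO_CU = r"""
#include <stdio.h>

__global__ void cuda_hello() {
    printf("Hello World from GPU!\n");
}

int main() {
    cuda_hello<<<1, 1>>>();
    cudaDeviceSynchronize();  // wait for the kernel so its printf is flushed
    return 0;
}
"""


def nvcc_hello_world():
    """Compile and run HELLO_CU with nvcc.

    Returns the program's stdout, or None if nvcc is not on PATH.
    Raises subprocess.CalledProcessError if compiling or running fails.
    """
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "hello.cu"
        exe = Path(tmp) / "hello"
        src.write_text(HELLO_CU)
        subprocess.run([nvcc, str(src), "-o", str(exe)], check=True)
        result = subprocess.run([str(exe)], check=True,
                                capture_output=True, text=True)
        return result.stdout
```

If compilation succeeds but the returned string is empty, the kernel probably never ran on this GPU, which points at a toolkit, driver, or GPU mismatch rather than at gcc.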
Thanks chensong1995,
I tried conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=11.1 -c pytorch -c conda-forge
and it works.
PyTorch 1.11.0 and 1.12.0, on the other hand, did not work.
Thanks for your help!
Sincerely,
Hello, when I run
python src/train_core.py
I got the error below.
number of model parameters: 12959563
data/blender_linemod/ape/787.jpg data/blender_linemod/ape/7168.jpg data/blender_linemod/ape/8605.jpg data/blender_linemod/ape/8994.jpg data/blender_linemod/ape/5662.jpg data/blender_linemod/ape/6565.jpg data/blender_linemod/ape/9001.jpg data/blender_linemod/ape/9097.jpg data/blender_linemod/ape/2749.jpg data/blender_linemod/ape/8949.jpg data/blender_linemod/ape/6457.jpg data/blender_linemod/ape/1797.jpg data/blender_linemod/ape/6732.jpg data/blender_linemod/ape/955.jpg data/blender_linemod/ape/3238.jpg data/blender_linemod/ape/2563.jpg data/blender_linemod/ape/997.jpg data/blender_linemod/ape/41.jpg data/blender_linemod/ape/6435.jpg data/blender_linemod/ape/5518.jpg data/blender_linemod/ape/1298.jpg data/blender_linemod/ape/9021.jpg data/blender_linemod/ape/3292.jpg data/blender_linemod/ape/353.jpg data/blender_linemod/ape/839.jpg data/blender_linemod/ape/3755.jpg data/blender_linemod/ape/2756.jpg data/blender_linemod/ape/8147.jpg data/blender_linemod/ape/985.jpg data/blender_linemod/ape/9282.jpg data/blender_linemod/ape/6085.jpg data/blender_linemod/ape/9571.jpg data/blender_linemod/ape/6201.jpg data/blender_linemod/ape/3131.jpg data/blender_linemod/ape/454.jpg data/blender_linemod/ape/6672.jpg data/blender_linemod/ape/974.jpg data/blender_linemod/ape/8552.jpg data/blender_linemod/ape/4162.jpg data/blender_linemod/ape/7323.jpg data/blender_linemod/ape/8106.jpg data/blender_linemod/ape/1371.jpg data/blender_linemod/ape/2015.jpg data/blender_linemod/ape/1836.jpg data/blender_linemod/ape/1304.jpg data/blender_linemod/ape/4561.jpg data/blender_linemod/ape/8710.jpg data/blender_linemod/ape/8001.jpg data/blender_linemod/ape/3660.jpg data/blender_linemod/ape/848.jpg data/blender_linemod/ape/2591.jpg data/blender_linemod/ape/9352.jpg data/blender_linemod/ape/2967.jpg data/blender_linemod/ape/8960.jpg data/blender_linemod/ape/1250.jpg data/blender_linemod/ape/7439.jpg data/blender_linemod/ape/7096.jpg data/blender_linemod/ape/4884.jpg data/blender_linemod/ape/9769.jpg data/blender_linemod/ape/6265.jpg data/blender_linemod/ape/3907.jpg data/blender_linemod/ape/9357.jpg data/blender_linemod/ape/5254.jpg data/blender_linemod/ape/5369.jpg data/blender_linemod/ape/657.jpg data/blender_linemod/ape/3719.jpg data/blender_linemod/ape/9614.jpg data/blender_linemod/ape/7295.jpg data/blender_linemod/ape/4136.jpg data/blender_linemod/ape/3505.jpg data/blender_linemod/ape/4203.jpg data/blender_linemod/ape/8658.jpg data/blender_linemod/ape/6839.jpg data/blender_linemod/ape/9246.jpg data/blender_linemod/ape/4042.jpg data/blender_linemod/ape/8570.jpg data/blender_linemod/ape/3492.jpg data/blender_linemod/ape/8141.jpg data/blender_linemod/ape/4222.jpg data/blender_linemod/ape/473.jpg data/blender_linemod/ape/9366.jpg
Traceback (most recent call last):
  File "src/train_core.py", line 106, in <module>
    trainer.train(epoch)
  File "./trainers/coretrainer.py", line 45, in train
    self.model(batch['image'], batch['sym_cor'], batch['mask'], batch['pts2d_map'], batch['graph'])
  File "/home/asd/anaconda3/envs/hybridpose1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/asd/anaconda3/envs/hybridpose1/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/asd/anaconda3/envs/hybridpose1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/asd/4Tdata/HybridPose/lib/model_repository.py", line 80, in forward
    x2s, x4s, x8s, x16s, x32s, xfc = self.resnet18_8s(image)
  File "/home/asd/anaconda3/envs/hybridpose1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/asd/4Tdata/HybridPose/lib/resnet.py", line 201, in forward
    x = self.bn1(x)
  File "/home/asd/anaconda3/envs/hybridpose1/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/asd/anaconda3/envs/hybridpose1/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 81, in forward
    exponential_average_factor, self.eps)
  File "/home/asd/anaconda3/envs/hybridpose1/lib/python3.7/site-packages/torch/nn/functional.py", line 1656, in batch_norm
    training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
Do you have any idea what might cause this?
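The traceback ends in `batch_norm`, which passes `torch.backends.cudnn.enabled` into the underlying call. A minimal way to isolate the failure, sketched below under the assumption that a standalone `BatchNorm2d` reproduces it (the helper name, channel count, and tensor shape are hypothetical), is to run one forward pass with cuDNN enabled and then disabled.

```python
def batchnorm_smoke_test(use_cudnn=True, batch_size=2):
    """Run a single BatchNorm2d forward pass on the GPU.

    Returns None if PyTorch or a CUDA device is unavailable, True on success,
    or the RuntimeError message (e.g. a cuDNN status) on failure.
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None

    previous = torch.backends.cudnn.enabled
    # False makes batch_norm use PyTorch's native CUDA kernels instead of cuDNN.
    torch.backends.cudnn.enabled = use_cudnn
    try:
        bn = torch.nn.BatchNorm2d(64).cuda().train()
        x = torch.randn(batch_size, 64, 120, 160, device="cuda")
        bn(x)
        torch.cuda.synchronize()  # surface asynchronous CUDA errors here
        return True
    except RuntimeError as err:
        return str(err)
    finally:
        torch.backends.cudnn.enabled = previous  # restore the caller's setting


if __name__ == "__main__":
    for flag in (True, False):
        label = "cuDNN enabled" if flag else "cuDNN disabled"
        print(label, "->", batchnorm_smoke_test(use_cudnn=flag))
```

If the test fails only with cuDNN enabled, the cuDNN build is the likely suspect; if it fails both ways, the installed PyTorch/CUDA build probably does not support the GPU at all (as with the RTX 3090 earlier in this thread). Passing at `batch_size=2` but failing at the training batch size would instead point toward the memory issue mentioned earlier.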