Closed bernhardschaefer closed 4 years ago
According to https://github.com/NVIDIA/apex/issues/185#issuecomment-471240746 this is a bug in nvcc 9.0. Could you try whether the latest version 10e5bb0d8474f4eb3611edfe6160e8d82d74fc7e works around the issue?
I hit the same error with CUDA 9.0 and solved it by switching to CUDA 10.1.
Thanks for the fix in https://github.com/facebookresearch/detectron2/commit/10e5bb0d8474f4eb3611edfe6160e8d82d74fc7e. I still run into the issue with those changes. However, making one more change in #420 lets me finally build detectron2 successfully. :-)
I celebrated too early.
I tried running inference on the GPU and ran into RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED.
Could this be related to the change in #420?
python demo/demo.py --config-file configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml --input ~/D056394/bird2.jpg ~/D056934/dogs.jpg --opts MODEL.WEIGHTS detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl
[11/28 09:47:49 detectron2]: Arguments: Namespace(confidence_threshold=0.5, config_file='configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml', input=['/home/ubuntu/D056394/bird2.jpg', '/home/ubuntu/D056934/dogs.jpg'], opts=['MODEL.WEIGHTS', 'detectron2://COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl'], output=None, video_input=None, webcam=False)
WARNING [11/28 09:47:49 d2.config.compat]: Config 'configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml' has no VERSION. Assuming it to be compatible with latest v2.
  File "demo/demo.py", line 83, in <module>
    predictions, visualized_output = demo.run_on_image(img)
  File "/home/ubuntu/detectron2/demo/predictor.py", line 48, in run_on_image
    predictions = self.predictor(image)
  File "/home/ubuntu/miniconda/envs/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
    return func(*args, **kwargs)
  File "/home/ubuntu/detectron2/detectron2/engine/defaults.py", line 188, in __call__
    predictions = self.model([inputs])[0]
  File "/home/ubuntu/miniconda/envs/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 66, in forward
    return self.inference(batched_inputs)
  File "/home/ubuntu/detectron2/detectron2/modeling/meta_arch/rcnn.py", line 115, in inference
    features = self.backbone(images.tensor)
  File "/home/ubuntu/miniconda/envs/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/detectron2/detectron2/modeling/backbone/fpn.py", line 122, in forward
    bottom_up_features = self.bottom_up(x)
  File "/home/ubuntu/miniconda/envs/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/detectron2/detectron2/modeling/backbone/resnet.py", line 381, in forward
    x = self.stem(x)
  File "/home/ubuntu/miniconda/envs/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/detectron2/detectron2/modeling/backbone/resnet.py", line 313, in forward
    x = self.conv1(x)
  File "/home/ubuntu/miniconda/envs/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/detectron2/detectron2/layers/wrappers.py", line 87, in forward
    x = super().forward(x)
  File "/home/ubuntu/miniconda/envs/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/home/ubuntu/miniconda/envs/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
That usually means that your cuDNN version is not supported in your environment.
The cudnn version seems to match:
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.4.1
PyTorch built with:
- CUDA Runtime 9.0
- CuDNN 7.4.1 (built against CUDA 10.0)
However, the information that CuDNN has been built against CUDA 10.0 seems to be wrong. I haven't found a PyTorch build option to change that.
Should I file that issue upstream in PyTorch?
It just means you installed the wrong version of cuDNN. You should use a cuDNN build made for CUDA 9.0.
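For anyone who wants to check for this kind of mismatch without reading the report by eye, here is a minimal sketch. It assumes the build report contains lines in the same form as the two quoted above ("CUDA Runtime X.Y" and "CuDNN A.B.C (built against CUDA X.Y)"). The function name cudnn_cuda_mismatch is made up for illustration and is not part of PyTorch or detectron2.

```python
import re

def cudnn_cuda_mismatch(build_info: str):
    """Hypothetical helper: return (runtime_cuda, cudnn_cuda) if the two
    CUDA versions named in the report differ, otherwise None."""
    # CUDA version PyTorch was built with, e.g. "CUDA Runtime 9.0"
    runtime = re.search(r"CUDA Runtime\s+(\d+\.\d+)", build_info)
    # CUDA version cuDNN was built against, e.g. "CuDNN 7.4.1 (built against CUDA 10.0)"
    cudnn = re.search(
        r"CuDNN\s+[\d.]+\s+\(built against CUDA\s+(\d+\.\d+)\)", build_info
    )
    if not runtime or not cudnn:
        return None  # report does not contain both lines, so nothing to compare
    runtime_cuda, cudnn_cuda = runtime.group(1), cudnn.group(1)
    return (runtime_cuda, cudnn_cuda) if runtime_cuda != cudnn_cuda else None

# Sample text copied from the report quoted in this thread.
report = """PyTorch built with:
- CUDA Runtime 9.0
- CuDNN 7.4.1 (built against CUDA 10.0)"""

print(cudnn_cuda_mismatch(report))  # ('9.0', '10.0')
```

A non-None result only says that the two versions in the report disagree. It does not prove that this is the cause of CUDNN_STATUS_NOT_INITIALIZED, but it matches the situation described above.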
Closing, as the original issue was addressed.
How To Reproduce the Issue
I've attached the full install.log with 3436 lines. Below you can also find everything starting from line 1673, since the errors start with the invocation of nvcc:
Expected behavior
Since I did not find any official comment that CUDA 9.0 is not supported anymore, I expected that the installation should work. In #176 the issue was resolved by upgrading CUDA. Unfortunately I can't do that in my case.
Environment