Emma0928 opened this issue 5 years ago
Caffe2 officially supports CUDA 9.0 and CUDA 8.0
I need help with this as well. I just upgraded a machine from two GTX 1070 GPUs in SLI to a single RTX 2080 Ti, and the config that worked with those two cards now throws the same error. It is not a CUDA 10.0 support issue, since I had been running CUDA 10.0 before. I am also using cuDNN 7.5.1.0. Please help; I've already tried reinstalling CUDA, cuDNN, and PyTorch. I rebuilt PyTorch from source and the build went smoothly: the process found both CUDA and cuDNN and finished without errors. (Ubuntu 16.04)
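One thing worth noting: the RTX 2080 Ti is a Turing card (compute capability 7.5), while the GTX 1070s are Pascal (6.1), so a Caffe2/PyTorch build that only contains Pascal kernels can fail at cuDNN execution time exactly like this even though CUDA and cuDNN were found at build time. The sketch below is purely illustrative (the function name and arch-list representation are made up, not a real PyTorch API); it just captures the compatibility rule that SASS compiled for compute capability X.Y runs only on GPUs with the same major version and a minor version >= Y:

```python
# Illustrative sketch only: models how a compiled CUDA arch list relates
# to the GPU actually in the machine. Real builds pick the arch list up
# from TORCH_CUDA_ARCH_LIST (or autodetect) at compile time.

def build_supports_gpu(arch_list, gpu_capability):
    """Return True if any compiled arch can execute on the given GPU.

    Binary (SASS) code built for capability X.Y runs on a GPU of
    capability A.B only when A == X and B >= Y (ignoring PTX JIT
    forward-compatibility).
    """
    gpu_major, gpu_minor = gpu_capability
    for arch in arch_list:
        a_major, a_minor = (int(part) for part in arch.split("."))
        if a_major == gpu_major and a_minor <= gpu_minor:
            return True
    return False

# Two GTX 1070s (Pascal, 6.1) work with a Pascal-only build...
print(build_supports_gpu(["6.0", "6.1"], (6, 1)))   # True
# ...but an RTX 2080 Ti (Turing, 7.5) does not:
print(build_supports_gpu(["6.0", "6.1"], (7, 5)))   # False
# A build that includes sm_75 (or at least 7.0) kernels is fine:
print(build_supports_gpu(["6.1", "7.5"], (7, 5)))   # True
```

If this is the cause, rebuilding with `TORCH_CUDA_ARCH_LIST="7.5"` set in the environment (or upgrading to a binary built against CUDA 10 with Turing support) should make the error go away; `torch.cuda.get_device_capability(0)` reports the card's capability.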
INFO net.py: 133: res2_0_branch2a_b preserved in workspace (unused)
INFO net.py: 133: res4_9_branch2c_b preserved in workspace (unused)
INFO net.py: 133: res4_7_branch2a_b preserved in workspace (unused)
[I net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 0.000110464 secs
[I net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 9.2788e-05 secs
[I net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 1.9226e-05 secs
INFO infer_simple.py: 147: Processing demo/24274813513_0cfd2ce6d0_k.jpg -> /tmp/detectron-visualizations/24274813513_0cfd2ce6d0_k.jpg.pdf
[I net_async_base.h:205] Using specified CPU pool size: 4; device id: -1
[I net_async_base.h:210] Created new CPU pool, size: 4; device id: -1
[E net_async_base.cc:377] [enforce fail at conv_op_cudnn.cc:811] status == CUDNN_STATUS_SUCCESS. 8 vs 0. , Error at: /pytorch/caffe2/operators/conv_op_cudnn.cc:811: CUDNN_STATUS_EXECUTION_FAILED
Error from operator:
input: "gpu_0/data" input: "gpu_0/conv1_w" output: "gpu_0/conv1" name: "" type: "Conv" arg { name: "kernel" i: 7 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 3 } arg { name: "stride" i: 2 } arg { name: "order" s: "NCHW" } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f143fc74441 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x49 (0x7f143fc74259 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #2: <unknown function> + 0x161e4ff (0x7f13e427c4ff in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: <unknown function> + 0x1620424 (0x7f13e427e424 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x456 (0x7f13e4284216 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::CudnnConvOp::RunOnDevice() + 0x198 (0x7f13e4271278 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #6: <unknown function> + 0x157d9f5 (0x7f13e41db9f5 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #7: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f1418f1a1f4 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: <unknown function> + 0x18e7669 (0x7f1418f20669 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #9: c10::ThreadPool::main_loop(unsigned long) + 0x253 (0x7f143fc6e723 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #10: <unknown function> + 0xb8c80 (0x7f1445134c80 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #11: <unknown function> + 0x76ba (0x7f144c18f6ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #12: clone + 0x6d (0x7f144bec541d in /lib/x86_64-linux-gnu/libc.so.6)
, op Conv
[E net_async_base.cc:129] Rethrowing exception from the run of 'generalized_rcnn'
WARNING workspace.py: 218: Original python traceback for operator `0` in network `generalized_rcnn` in exception above (most recent call last):
WARNING workspace.py: 223: File "tools/infer_simple.py", line 185, in <module>
WARNING workspace.py: 223: File "tools/infer_simple.py", line 135, in main
WARNING workspace.py: 223: File "/home/qbeer666/Detectron/detectron/core/test_engine.py", line 327, in initialize_model_from_cfg
WARNING workspace.py: 223: File "/home/qbeer666/Detectron/detectron/modeling/model_builder.py", line 124, in create
WARNING workspace.py: 223: File "/home/qbeer666/Detectron/detectron/modeling/model_builder.py", line 89, in generalized_rcnn
WARNING workspace.py: 223: File "/home/qbeer666/Detectron/detectron/modeling/model_builder.py", line 229, in build_generic_detection_model
WARNING workspace.py: 223: File "/home/qbeer666/Detectron/detectron/modeling/optimizer.py", line 54, in build_data_parallel_model
WARNING workspace.py: 223: File "/home/qbeer666/Detectron/detectron/modeling/model_builder.py", line 169, in _single_gpu_build_func
WARNING workspace.py: 223: File "/home/qbeer666/Detectron/detectron/modeling/FPN.py", line 63, in add_fpn_ResNet101_conv5_body
WARNING workspace.py: 223: File "/home/qbeer666/Detectron/detectron/modeling/FPN.py", line 104, in add_fpn_onto_conv_body
WARNING workspace.py: 223: File "/home/qbeer666/Detectron/detectron/modeling/ResNet.py", line 48, in add_ResNet101_conv5_body
WARNING workspace.py: 223: File "/home/qbeer666/Detectron/detectron/modeling/ResNet.py", line 99, in add_ResNet_convX_body
WARNING workspace.py: 223: File "/home/qbeer666/Detectron/detectron/modeling/ResNet.py", line 252, in basic_bn_stem
WARNING workspace.py: 223: File "/home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/cnn.py", line 97, in Conv
WARNING workspace.py: 223: File "/home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/brew.py", line 108, in scope_wrapper
WARNING workspace.py: 223: File "/home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/helpers/conv.py", line 186, in conv
WARNING workspace.py: 223: File "/home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/helpers/conv.py", line 139, in _ConvBase
Traceback (most recent call last):
File "tools/infer_simple.py", line 185, in <module>
main(args)
File "tools/infer_simple.py", line 153, in main
model, im, None, timers=timers
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/home/qbeer666/Detectron/detectron/utils/c2.py", line 111, in NamedCudaScope
yield
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/home/qbeer666/Detectron/detectron/utils/c2.py", line 118, in GpuNameScope
yield
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/scope.py", line 48, in NameScope
yield
File "/home/qbeer666/Detectron/detectron/utils/c2.py", line 118, in GpuNameScope
yield
File "/home/qbeer666/Detectron/detectron/utils/c2.py", line 111, in NamedCudaScope
yield
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/home/qbeer666/Detectron/detectron/utils/c2.py", line 126, in CudaScope
yield
File "/usr/lib/python3.5/contextlib.py", line 77, in __exit__
self.gen.throw(type, value, traceback)
File "/home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/scope.py", line 82, in DeviceScope
yield
File "/home/qbeer666/Detectron/detectron/utils/c2.py", line 126, in CudaScope
yield
File "/home/qbeer666/Detectron/detectron/utils/c2.py", line 111, in NamedCudaScope
yield
File "tools/infer_simple.py", line 153, in main
model, im, None, timers=timers
File "/home/qbeer666/Detectron/detectron/core/test.py", line 66, in im_detect_all
model, im, cfg.TEST.SCALE, cfg.TEST.MAX_SIZE, boxes=box_proposals
File "/home/qbeer666/Detectron/detectron/core/test.py", line 158, in im_detect_bbox
workspace.RunNet(model.net.Proto().name)
File "/home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/workspace.py", line 250, in RunNet
StringifyNetName(name), num_iter, allow_fail,
File "/home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/workspace.py", line 211, in CallWithExceptionIntercept
return func(*args, **kwargs)
RuntimeError: [enforce fail at conv_op_cudnn.cc:811] status == CUDNN_STATUS_SUCCESS. 8 vs 0. , Error at: /pytorch/caffe2/operators/conv_op_cudnn.cc:811: CUDNN_STATUS_EXECUTION_FAILED
Error from operator:
input: "gpu_0/data" input: "gpu_0/conv1_w" output: "gpu_0/conv1" name: "" type: "Conv" arg { name: "kernel" i: 7 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 3 } arg { name: "stride" i: 2 } arg { name: "order" s: "NCHW" } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7f143fc74441 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x49 (0x7f143fc74259 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #2: <unknown function> + 0x161e4ff (0x7f13e427c4ff in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: <unknown function> + 0x1620424 (0x7f13e427e424 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x456 (0x7f13e4284216 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::CudnnConvOp::RunOnDevice() + 0x198 (0x7f13e4271278 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #6: <unknown function> + 0x157d9f5 (0x7f13e41db9f5 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #7: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f1418f1a1f4 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: <unknown function> + 0x18e7669 (0x7f1418f20669 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #9: c10::ThreadPool::main_loop(unsigned long) + 0x253 (0x7f143fc6e723 in /home/qbeer666/.local/lib/python3.5/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #10: <unknown function> + 0xb8c80 (0x7f1445134c80 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
frame #11: <unknown function> + 0x76ba (0x7f144c18f6ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #12: clone + 0x6d (0x7f144bec541d in /lib/x86_64-linux-gnu/libc.so.6)
My computer environment:
=======================================================
Ubuntu 16.04
CUDA 10.0
protobuf==3.6.1
=======================================================
You can reinstall cudatoolkit 9.0 and pytorch-nightly-1.0.0.dev20190328-py2.7_cuda9.0.176_cudnn7.4.2_0.tar.bz2; with those it runs successfully.
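The nightly filename above pins the whole toolchain the workaround relies on (Python 2.7, CUDA 9.0.176, cuDNN 7.4.2), which is why it sidesteps the CUDA 10 build problem. A small illustrative parser (the filename format is assumed from this one example, not from any conda specification) makes the pinned components explicit:

```python
import re

def parse_nightly_name(name):
    """Pull the pinned toolchain out of a conda nightly package filename.

    Illustrative only: the pattern is inferred from the single filename
    quoted in the comment above.
    """
    m = re.match(
        r"pytorch-nightly-(?P<version>[\d.]+\.dev\d+)"
        r"-py(?P<py>[\d.]+)"
        r"_cuda(?P<cuda>[\d.]+)"
        r"_cudnn(?P<cudnn>[\d.]+)"
        r"_\d+\.tar\.bz2",
        name,
    )
    return m.groupdict() if m else None

info = parse_nightly_name(
    "pytorch-nightly-1.0.0.dev20190328-py2.7_cuda9.0.176_cudnn7.4.2_0.tar.bz2"
)
print(info["cuda"], info["cudnn"])  # 9.0.176 7.4.2
```

So anyone following this workaround should make sure the locally installed cudatoolkit (9.0) and cuDNN match what the package name advertises, otherwise the same CUDNN_STATUS_EXECUTION_FAILED can reappear.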
@dzb1992 Hi there! Would you please share the download link for pytorch-nightly-1.0.0.dev20190328-py2.7_cuda9.0.176_cudnn7.4.2_0.tar.bz2? ^_^
@qbeer Hi there! Have you solved this problem? I'm running into it too!
Ubuntu 18.04, CUDA 10.0. This error occurs when I run infer_simple.py. Does anyone know how to fix it?