facebookresearch / DensePose

A real-time approach for mapping all human pixels of 2D RGB images to a 3D surface-based model of the body
http://densepose.org

Facing context_gpu.cu:415: out of memory while running inference #269

Open SriRamGovardhanam opened 4 years ago

SriRamGovardhanam commented 4 years ago

Hello,

While running inference with

```
cd DensePose && python2 tools/infer_simple.py \
    --cfg configs/DensePose_ResNet50_FPN_s1x-e2e.yaml \
    --output-dir DensePoseData/infer_out/ \
    --image-ext jpg \
    --wts https://dl.fbaipublicfiles.com/densepose/DensePose_ResNet50_FPN_s1x-e2e.pkl \
    DensePoseData/demo_data/demo_im.jpg
```

I got this error:

```
Found Detectron ops lib: /home/sriram/anaconda2/lib/libcaffe2_detectron_ops_gpu.so

[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.

[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.

[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU. WARNING cnn.py: 25: [====DEPRECATE WARNING====]: you are creating an object from CNNModelHelper class which will be deprecated soon. Please use ModelHelper object with brew module. For more information, please refer to caffe2.ai and python/brew.py, python/brew_test.py for more information. INFO net.py: 51: Loading weights from: /tmp/detectron-download-cache/DensePose_ResNet50_FPN_s1x-e2e.pkl [I net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 5.0165e-05 secs [I net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 4.2673e-05 secs [I net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 1.1072e-05 secs INFO infer_simple.py: 103: Processing DensePoseData/demo_data/demo_im.jpg -> DensePoseData/infer_out/demo_im.jpg.pdf [I net_async_base.h:211] Using specified CPU pool size: 4; device id: -1 [I net_async_base.h:216] Created new CPU pool, size: 4; device id: -1 [E net_async_base.cc:377] [enforce fail at context_gpu.cu:415] error == cudaSuccess. 2 vs 0. Error at: /opt/conda/conda-bld/pytorch_1549617926868/work/caffe2/core/context_gpu.cu:415: out of memory

Error from operator: input: "gpu_0/res2_2_sum" input: "gpu_0/fpn_inner_res2_2_sum_lateral_w" input: "gpu_0/fpn_inner_res2_2_sum_lateral_b" output: "gpu_0/fpn_inner_res2_2_sum_lateral" name: "" type: "Conv" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"frame #0: c10::ThrowEnforceNotMet(char const, int, char const, std::string const&, void const*) + 0x59 (0x7f9992a2a339 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)

frame #1: + 0x29581fc (0x7f99958101fc in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)

frame #2: + 0x12a6095 (0x7f999415e095 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)

frame #3: + 0x145bb94 (0x7f9994313b94 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)

frame #4: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x3d9 (0x7f9994320dc9 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)

frame #5: caffe2::CudnnConvOp::RunOnDevice() + 0x1b0 (0x7f9994308100 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)

frame #6: + 0x13ae835 (0x7f9994266835 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)

frame #7: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f99b8480a24 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)

frame #8: + 0x1493dc2 (0x7f99b8487dc2 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)

frame #9: c10::ThreadPool::main_loop(unsigned long) + 0x258 (0x7f99b75cf6f8 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)

frame #10: + 0xb8678 (0x7f99c8e48678 in /home/sriram/anaconda2/bin/../lib/libstdc++.so.6) frame #11: + 0x76db (0x7f99cff886db in /lib/x86_64-linux-gnu/libpthread.so.0)

frame #12: clone + 0x3f (0x7f99cf50c88f in /lib/x86_64-linux-gnu/libc.so.6) , op Conv [E net_async_base.cc:377] [enforce fail at context_gpu.cu:415] error == cudaSuccess. 2 vs 0. Error at: /opt/conda/conda-bld/pytorch_1549617926868/work/caffe2/core/context_gpu.cu:415: out of memory

Error from operator: input: "gpu_0/res3_0_branch2c" input: "gpu_0/res3_0_branch2c_bn_s" input: "gpu_0/res3_0_branch2c_bn_b" output: "gpu_0/res3_0_branch2c_bn" name: "" type: "AffineChannel" device_option { device_type: 1 device_id: 0 }frame #0: c10::ThrowEnforceNotMet(char const, int, char const, std::string const&, void const*) + 0x59 (0x7f9992a2a339 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so) frame #1: + 0x29581fc (0x7f99958101fc in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so) frame #2: + 0x12a6095 (0x7f999415e095 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so) frame #3: + 0x2afcd73 (0x7f99959b4d73 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so) frame #4: + 0x13ae835 (0x7f9994266835 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so) frame #5: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f99b8480a24 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so) frame #6: + 0x1493dc2 (0x7f99b8487dc2 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so) frame #7: c10::ThreadPool::main_loop(unsigned long) + 0x258 (0x7f99b75cf6f8 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so) frame #8: + 0xb8678 (0x7f99c8e48678 in /home/sriram/anaconda2/bin/../lib/libstdc++.so.6) frame #9: + 0x76db (0x7f99cff886db in /lib/x86_64-linux-gnu/libpthread.so.0) frame #10: clone + 0x3f (0x7f99cf50c88f in /lib/x86_64-linux-gnu/libc.so.6) , op AffineChannel [E net_async_base.cc:129] Rethrowing exception from the run of 'generalized_rcnn' WARNING workspace.py: 204: Original python traceback for operator 44 in network generalized_rcnn in exception above (most recent call last): WARNING workspace.py: 209: File "tools/infer_simple.py", line 140, in WARNING workspace.py: 209: File "tools/infer_simple.py", line 91, in main WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/core/test_engine.py", line 334, in initialize_model_from_cfg WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/model_builder.py", line 119, in create WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/model_builder.py", line 84, in generalized_rcnn WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/model_builder.py", line 233, in build_generic_detection_model WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/optimizer.py", line 46, in build_data_parallel_model WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/model_builder.py", line 165, in _single_gpu_build_func WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/FPN.py", line 40, in add_fpn_ResNet50_conv5_body WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/FPN.py", line 96, in add_fpn_onto_conv_body WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/ResNet.py", line 32, in add_ResNet50_conv5_body WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/ResNet.py", line 98, in add_ResNet_convX_body WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/ResNet.py", line 77, in add_stage WARNING workspace.py: 209: File 
"/home/sriram/DensePose/detectron/modeling/ResNet.py", line 174, in add_residual_block WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/ResNet.py", line 322, in bottleneck_transformation WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/detector.py", line 402, in ConvAffine WARNING workspace.py: 209: File "/home/sriram/DensePose/detectron/modeling/detector.py", line 97, in AffineChannel

Traceback (most recent call last): File "tools/infer_simple.py", line 140, in main(args) File "tools/infer_simple.py", line 109, in main model, im, None, timers=timers File "/home/sriram/DensePose/detectron/core/test.py", line 58, in im_detect_all model, im, cfg.TEST.SCALE, cfg.TEST.MAX_SIZE, boxes=box_proposals File "/home/sriram/DensePose/detectron/core/test.py", line 158, in im_detect_bbox workspace.RunNet(model.net.Proto().name) File "/home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 236, in RunNet StringifyNetName(name), num_iter, allow_fail, File "/home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/workspace.py", line 197, in CallWithExceptionIntercept return func(*args, *kwargs) RuntimeError: [enforce fail at context_gpu.cu:415] error == cudaSuccess. 2 vs 0. Error at: /opt/conda/conda-bld/pytorch_1549617926868/work/caffe2/core/context_gpu.cu:415: out of memory Error from operator: input: "gpu_0/res2_2_sum" input: "gpu_0/fpn_inner_res2_2_sum_lateral_w" input: "gpu_0/fpn_inner_res2_2_sum_lateral_b" output: "gpu_0/fpn_inner_res2_2_sum_lateral" name: "" type: "Conv" arg { name: "kernel" i: 1 } arg { name: "exhaustive_search" i: 0 } arg { name: "pad" i: 0 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"frame #0: c10::ThrowEnforceNotMet(char const, int, char const, std::string const&, void const) + 0x59 (0x7f9992a2a339 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so) frame #1: + 0x29581fc (0x7f99958101fc in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so) frame #2: + 0x12a6095 (0x7f999415e095 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so) frame #3: + 0x145bb94 (0x7f9994313b94 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so) frame #4: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x3d9 (0x7f9994320dc9 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so) frame #5: caffe2::CudnnConvOp::RunOnDevice() + 0x1b0 (0x7f9994308100 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so) frame #6: + 0x13ae835 (0x7f9994266835 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so) frame #7: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f99b8480a24 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so) frame #8: + 0x1493dc2 (0x7f99b8487dc2 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so) frame #9: c10::ThreadPool::main_loop(unsigned long) + 0x258 (0x7f99b75cf6f8 in /home/sriram/anaconda2/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so) frame #10: + 0xb8678 (0x7f99c8e48678 in /home/sriram/anaconda2/bin/../lib/libstdc++.so.6) frame #11: + 0x76db (0x7f99cff886db in /lib/x86_64-linux-gnu/libpthread.so.0) frame #12: clone + 0x3f (0x7f99cf50c88f in /lib/x86_64-linux-gnu/libc.so.6) `

```

Also, test_spatial_narrow_as_op.py ran successfully, but when I tried to run test_zero_even_op.py I got this OSError:

`OSError: /home/sriram/DensePose/build/libcaffe2_detectron_custom_ops_gpu.so: undefined symbol: _ZN6caffe219CPUOperatorRegistryB5cxx11Ev`

I have seen somewhere that this only works with gcc 4.9.2, but when I tried it in Colab the default gcc version there is 7.5.0; mine is the same 7.5.0, and that did not solve the issue.

I heard somewhere that ResNet50 works fine for inference with > 4 GB of VRAM and even with < 2 GB, so I don't understand the out-of-memory error.

Please, someone help me out; I have been struggling with this for the last 7 days.

vkhalidov commented 4 years ago

Hello @SriRamGovardhanam,

  1. Regarding the GPU OOM: you can track memory usage with nvidia-smi; in particular, please verify that no other process occupies GPU memory before launching the inference. Reducing the input image size would also decrease the amount of GPU memory required (see the sketch after this list).
  2. Please use ldd to track the dependencies of libcaffe2_detectron_custom_ops_gpu.so and nm to verify the symbols it requires (also shown below).
  3. Finally, DensePose is now available as a project inside Detectron2. It is based on PyTorch, is faster, and requires less memory. You might want to give it a try.
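
A minimal sketch of how points 1 and 2 can be checked (the paths are taken from the log above; the TEST.SCALE / TEST.MAX_SIZE numbers are only example values, not the config defaults):

```
# Point 1: make sure the GPU is free, and monitor memory while inference runs
nvidia-smi                 # no other process should be holding GPU memory
watch -n 1 nvidia-smi      # optional: refresh every second while the model runs

# To lower the memory needed, reduce the inference resolution in
# configs/DensePose_ResNet50_FPN_s1x-e2e.yaml, e.g.:
#   TEST:
#     SCALE: 400       # short-side target size, smaller than the shipped value
#     MAX_SIZE: 667    # cap on the long side

# Point 2: inspect the custom ops library
ldd /home/sriram/DensePose/build/libcaffe2_detectron_custom_ops_gpu.so                   # missing dependencies show up as "not found"
nm -D /home/sriram/DensePose/build/libcaffe2_detectron_custom_ops_gpu.so | grep ' U '    # list undefined symbols
c++filt _ZN6caffe219CPUOperatorRegistryB5cxx11Ev                                         # demangle the missing symbol
```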
SriRamGovardhanam commented 4 years ago

> Hello @SriRamGovardhanam,
>
>   1. Regarding the GPU OOM: you can track memory usage with nvidia-smi; in particular, please verify that no other process occupies GPU memory before launching the inference. Reducing the input image size would also decrease the amount of GPU memory required.
>   2. Please use ldd to track the dependencies of libcaffe2_detectron_custom_ops_gpu.so and nm to verify the symbols it requires.
>   3. Finally, DensePose is now available as a project inside Detectron2. It is based on PyTorch, is faster, and requires less memory. You might want to give it a try.

Hey, thank you so much for responding. I have checked points 1 and 2 that you mentioned, and everything there looks fine, but I still get the same output, even after resizing the image. If there is no other possibility, I think I should go with the 3rd point.
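
For reference, a rough sketch of what the 3rd point looks like in practice (apply_net.py and the densepose_rcnn_R_50_FPN_s1x.yaml config come from the DensePose project inside Detectron2; the weights URL is left as a placeholder to be filled in from that project's model zoo):

```
# Assumes PyTorch and Detectron2 are already installed (see the Detectron2 install docs).
git clone https://github.com/facebookresearch/detectron2.git
cd detectron2/projects/DensePose

# "show" runs inference and saves a visualization; dp_contour,bbox selects what to draw.
# Replace <densepose_rcnn_R_50_FPN_s1x weights URL> with the checkpoint from the model zoo.
python apply_net.py show \
    configs/densepose_rcnn_R_50_FPN_s1x.yaml \
    <densepose_rcnn_R_50_FPN_s1x weights URL> \
    /home/sriram/DensePose/DensePoseData/demo_data/demo_im.jpg \
    dp_contour,bbox \
    --output demo_im_densepose.png
```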