facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Apache License 2.0

Error while running infer_simple.py for the first time #812

Status: Open. Gigibulid opened this issue 5 years ago.

Gigibulid commented 5 years ago

Even though I have found several suggested solutions for this problem, none of them works for me.

Expected results

A PDF file with visualizations of the detections.

Actual results

Traceback (most recent call last):
  File "tools/infer_simple.py", line 185, in <module>
    main(args)
  File "tools/infer_simple.py", line 153, in main
    model, im, None, timers=timers
  File "/home/gigi/detectron/detectron/core/test.py", line 66, in im_detect_all
    model, im, cfg.TEST.SCALE, cfg.TEST.MAX_SIZE, boxes=box_proposals
  File "/home/gigi/detectron/detectron/core/test.py", line 158, in im_detect_bbox
    workspace.RunNet(model.net.Proto().name)
  File "/home/gigi/anaconda2/envs/caffe227/lib/python2.7/site-packages/caffe2/python/workspace.py", line 236, in RunNet
    StringifyNetName(name), num_iter, allow_fail,
  File "/home/gigi/anaconda2/envs/caffe227/lib/python2.7/site-packages/caffe2/python/workspace.py", line 197, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: [enforce fail at context_gpu.cu:415] error == cudaSuccess. 2 vs 0. Error at: /opt/conda/conda-bld/pytorch_1544194558701/work/caffe2/core/context_gpu.cu:415: out of memory
Error from operator:
input: "gpu_0/res3_0_branch2a" input: "gpu_0/res3_0_branch2b_w" output: "gpu_0/res3_0_branch2b" name: "" type: "Conv" arg { name: "kernel" i: 3 } arg { name: "exhaustive_search" i: 0 } arg { name: "stride" i: 1 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "dilation" i: 1 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x59 (0x7fb7bdacb309 in /home/gigi/anaconda2/envs/caffe227/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: <unknown function> + 0x2a6945c (0x7fb7c074545c in /home/gigi/anaconda2/envs/caffe227/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #2: <unknown function> + 0x13c7fd5 (0x7fb7bf0a3fd5 in /home/gigi/anaconda2/envs/caffe227/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: <unknown function> + 0x157a294 (0x7fb7bf256294 in /home/gigi/anaconda2/envs/caffe227/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x3d9 (0x7fb7bf2648a9 in /home/gigi/anaconda2/envs/caffe227/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::CudnnConvOp::RunOnDevice() + 0x1b0 (0x7fb7bf24e060 in /home/gigi/anaconda2/envs/caffe227/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #6: <unknown function> + 0x14d0955 (0x7fb7bf1ac955 in /home/gigi/anaconda2/envs/caffe227/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #7: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7fb7fcd3d324 in /home/gigi/anaconda2/envs/caffe227/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: <unknown function> + 0x118a6c2 (0x7fb7fcd446c2 in /home/gigi/anaconda2/envs/caffe227/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #9: c10::ThreadPool::main_loop(unsigned long) + 0x258 (0x7fb7fc0847e8 in /home/gigi/anaconda2/envs/caffe227/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #10: <unknown function> + 0xb8678 (0x7fb810285678 in /home/gigi/anaconda2/envs/caffe227/lib/python2.7/site-packages/../../libstdc++.so.6)
frame #11: <unknown function> + 0x8184 (0x7fb8189ca184 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #12: clone + 0x6d (0x7fb817fea03d in /lib/x86_64-linux-gnu/libc.so.6)
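The traceback shows im_detect_bbox being called with cfg.TEST.SCALE and cfg.TEST.MAX_SIZE, and the operator that runs out of memory is a 3x3 Conv on the res3 feature map, whose size grows with the input resolution. As a rough sketch (an approximation of Detectron's test-time resize rule, not its actual code), the image is resized so its shorter side equals TEST.SCALE, unless that would push the longer side past TEST.MAX_SIZE. The function name, the 800/1333 defaults, and the 480x640 example image are assumptions for illustration.

```python
def detectron_style_scale(height, width, target_size=800, max_size=1333):
    """Approximate the test-time resize: shorter side -> target_size,
    unless the longer side would then exceed max_size, in which case the
    longer side is capped at max_size instead.

    Returns (im_scale, new_height, new_width). Hypothetical helper; the
    defaults are assumed values for TEST.SCALE / TEST.MAX_SIZE.
    """
    short_side = min(height, width)
    long_side = max(height, width)
    # Scale that makes the shorter side exactly target_size.
    im_scale = float(target_size) / short_side
    # Cap the scale if the longer side would grow past max_size.
    if round(im_scale * long_side) > max_size:
        im_scale = float(max_size) / long_side
    return im_scale, int(round(height * im_scale)), int(round(width * im_scale))


# Compare input sizes for a hypothetical 480x640 image at several scales.
for scale in (800, 700, 300):
    _, h, w = detectron_style_scale(480, 640, target_size=scale)
    print("TEST.SCALE=%d -> %dx%d input, %d pixels" % (scale, h, w, h * w))
```

For a 480x640 image this gives an 800x1067 input at scale 800 and 300x400 at scale 300, about 14% of the pixels. Since convolutional feature maps shrink roughly in proportion to the pixel count, a lower TEST.SCALE is a plausible way to make the model fit on a smaller GPU.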

Detailed steps to reproduce

I ran the following command, taken from GETTING_STARTED.md (https://github.com/facebookresearch/Detectron/blob/master/GETTING_STARTED.md):

python tools/infer_simple.py --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml --output-dir /tmp/detectron-visualizations --image-ext jpg --wts https://dl.fbaipublicfiles.com/detectron/35861858/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml.02_32_51.SgT4y1cO/output/train/coco_2014_train:coco_2014_valminusminival/generalized_rcnn/model_final.pkl demo

System information

Other info

When I run python /tests/test_spatial_narrow_as_op.py, I get:

[E init_intrinsics_check.cc:43] CPU feature avx is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature avx2 is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
[E init_intrinsics_check.cc:43] CPU feature fma is present on your machine, but the Caffe2 binary is not compiled with it. It means you may not get the full speed of your CPU.
Found Detectron ops lib: /home/afroditi/anaconda2/envs/caffe227/lib/python2.7/site-packages/torch/lib/libcaffe2_detectron_ops_gpu.so
...

Ran 3 tests in 2.810s

OK
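The test output shows the Detectron GPU ops library loading and all three tests passing (the avx/avx2/fma lines are performance warnings about how the Caffe2 binary was compiled), which suggests the failure above is a GPU memory limit rather than a broken install. A minimal way to see how much GPU memory is actually free, for example because another process is holding some of it, is to query nvidia-smi. This sketch assumes nvidia-smi is on PATH; the helper names are hypothetical.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse output of
    nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits
    into a list of (gpu_index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue  # skip blank lines defensively
        index, used, total = (field.strip() for field in line.split(","))
        rows.append((int(index), int(used), int(total)))
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return per-GPU memory usage in MiB."""
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=index,memory.used,memory.total",
        "--format=csv,noheader,nounits",
    ])
    return parse_gpu_memory(out.decode("utf-8"))


# Usage on the machine with the GPU:
# for index, used, total in query_gpu_memory():
#     print("GPU %d: %d MiB free of %d MiB" % (index, total - used, total))
```

If a large share of memory is already in use before infer_simple.py starts, freeing it may help; otherwise the model at the configured input size simply needs more memory than the card has.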

jungaria commented 5 years ago

I have the same problem, so I cannot go any further with Detectron. Any suggestions would be appreciated.

Thanks

jungaria commented 5 years ago

@Gigibulid

Hey, try changing the values as described below.

When I ran infer_simple.py with the options --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml --output-dir ./test/imageDetection --image-ext jpg --wts trainedWeights/model_final.pkl demo, I got an "out of memory" error.

I changed the SCALES/SCALE values in e2e_mask_rcnn_R-101-FPN_2x.yaml from 800 to 300 (700 still gave the same error), and then it worked.
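In the traceback only cfg.TEST.SCALE and cfg.TEST.MAX_SIZE reach im_detect_bbox, so for infer_simple.py the TEST value appears to be the one that matters; changing TRAIN.SCALES as well should not affect inference. Rather than editing the shipped config in place, a minimal sketch could write a copy with a smaller scale. It assumes the single-value forms `SCALES: (800,)` and `SCALE: 800`; the function names and the output path are hypothetical.

```python
import re


def set_detectron_scales(yaml_text, new_scale):
    """Rewrite single-value `SCALE: <int>` and `SCALES: (<int>,)` lines to
    new_scale, leaving all other text untouched. Multi-value tuples such as
    `SCALES: (640, 800)` do not match and are left unchanged."""
    # TEST-style line, e.g. "  SCALE: 800" (SCALES: does not match here).
    text = re.sub(r"^([ \t]*SCALE:[ \t]*)\d+",
                  r"\g<1>%d" % new_scale, yaml_text, flags=re.MULTILINE)
    # TRAIN-style one-element tuple, e.g. "  SCALES: (800,)".
    text = re.sub(r"^([ \t]*SCALES:[ \t]*\()\s*\d+\s*(,\s*\))",
                  r"\g<1>%d\g<2>" % new_scale, text, flags=re.MULTILINE)
    return text


def write_scaled_config(src_path, dst_path, new_scale):
    """Copy the YAML config at src_path to dst_path with scales replaced."""
    with open(src_path) as f:
        text = f.read()
    with open(dst_path, "w") as f:
        f.write(set_detectron_scales(text, new_scale))


# Usage (hypothetical output path), then pass the new file to --cfg:
# write_scaled_config(
#     "configs/12_2017_baselines/e2e_mask_rcnn_R-101-FPN_2x.yaml",
#     "/tmp/e2e_mask_rcnn_R-101-FPN_2x_scale300.yaml",
#     300,
# )
```

Writing a copy keeps the original baseline config intact. Expect some loss of detection quality at lower scales, mostly on small objects.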

pratikbhave2 commented 5 years ago

Hi @Gigibulid, I was facing the same error and @jungaria's solution worked. Thanks, @jungaria!