LuoweiZhou / detectron-vlp

Detectron for image/video region feature extraction, inspired by Xinlei's repo

Errors with processing Flickr30 #9

Open pruksmhc opened 3 years ago

pruksmhc commented 3 years ago

Hi Luowei, I am trying to get extract_feat_flickr30k to work and am still running into issues, even after following https://github.com/LuoweiZhou/detectron-vlp/issues/2.

The traceback is below.

Traceback (most recent call last):
Found Detectron ops lib: /home/ubuntu/detectron-vlp/lib/libcaffe2_detectron_ops_gpu.so
  File "tools/extract_features.py", line 59, in
    import core.test_engine as infer_engine
  File "/home/ubuntu/detectron-vlp/lib/core/test_engine.py", line 36, in
    from core.rpn_generator import generate_rpn_on_dataset
  File "/home/ubuntu/detectron-vlp/lib/core/rpn_generator.py", line 44, in
    from modeling import model_builder
  File "/home/ubuntu/detectron-vlp/lib/modeling/model_builder.py", line 46, in
    from detectron.modeling.detector import DetectionModelHelper
  File "/home/ubuntu/detectron/detectron/modeling/detector.py", line 33, in
    from detectron.ops.collect_and_distribute_fpn_rpn_proposals \
  File "/home/ubuntu/detectron/detectron/ops/collect_and_distribute_fpn_rpn_proposals.py", line 24, in
    from detectron.datasets import json_dataset
  File "/home/ubuntu/detectron/detectron/datasets/json_dataset.py", line 44, in
    import detectron.utils.boxes as box_utils
  File "/home/ubuntu/detectron/detectron/utils/boxes.py", line 51, in
    import detectron.utils.cython_bbox as cython_bbox
AttributeError: module 'detectron.utils' has no attribute 'cython_bbox

What might be going on? Thank you!

pruksmhc commented 3 years ago

This was resolved by adding /path/to/detectron/detectron to PYTHONPATH.
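
For anyone retracing this step, it can help to check which file each top-level package name resolves to once the extra directory is on the path. In the first traceback `core.test_engine` comes from /home/ubuntu/detectron-vlp/lib/core, while in the later log it comes from /home/ubuntu/detectron/detectron/core, so the same name may be picked up from a different tree after the change. Below is a minimal sketch of such a check; `resolve_module_path` is a hypothetical helper written for this illustration, not part of Detectron or detectron-vlp, and inserting into `sys.path` only approximates what setting PYTHONPATH does.

```python
import importlib.util
import sys


def resolve_module_path(module_name, extra_path=None):
    """Return the file a top-level module name would be imported from, or None.

    Hypothetical helper for illustration only. If extra_path is given, it is
    put at the front of sys.path first, which is similar in effect to (but not
    identical to) listing that directory in PYTHONPATH.
    """
    if extra_path is not None and extra_path not in sys.path:
        sys.path.insert(0, extra_path)
    # Make sure directories created or added after startup are noticed.
    importlib.invalidate_caches()
    spec = importlib.util.find_spec(module_name)
    # spec is None when nothing on sys.path provides the name; origin may be
    # None for namespace packages (directories without an __init__.py).
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Package names taken from the tracebacks in this thread.
    for name in ("core", "modeling", "detectron"):
        print(name, "->", resolve_module_path(name))
```

If `core` or `modeling` prints a path under the stock Detectron checkout rather than detectron-vlp/lib, the order of the path entries is worth a second look.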

pruksmhc commented 3 years ago

Actually, after fixing the above, I now get the error below.

FO net.py: 133: pred_w preserved in workspace (unused)
[I net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 0.000101622 secs
[I net_dag_utils.cc:102] Operator graph pruning prior to chain compute took: 0.000104455 secs
[I net_async_base.h:205] Using specified CPU pool size: 4; device id: -1
[I net_async_base.h:210] Created new CPU pool, size: 4; device id: -1
[E net_async_base.cc:377] [enforce fail at conv_opcudnn.cc:554] filter.dim32(1) == C / group. 4 vs 256
Error from operator:
input: "gpu_0/res2_0_branch2a" input: "gpu_0/res2_0_branch2b_w" output: "gpu_0/res2_0_branch2b" name: "" type: "Conv" arg { name: "kernel" i: 3 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } arg { name: "pad" i: 1 } arg { name: "dilation" i: 1 } arg { name: "exhaustive_search" i: 0 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7fe45122a441 in /home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: c10::ThrowEnforceNotMet(char const, int, char const, std::string const&, void const) + 0x49 (0x7fe45122a259 in /home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #2: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x2117 (0x7fe452c90ed7 in /home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: caffe2::CudnnConvOp::RunOnDevice() + 0x198 (0x7fe452c7c278 in /home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: + 0x157d9f5 (0x7fe452be69f5 in /home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7fe4879251f4 in /home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #6: + 0x18e7669 (0x7fe48792b669 in /home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #7: c10::ThreadPool::main_loop(unsigned long) + 0x253 (0x7fe451224723 in /home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #8: + 0xc8421 (0x7fe4a4fc8421 in /home/ubuntu/miniconda3/envs/vlp/bin/../lib/libstdc++.so.6)
frame #9: + 0x76db (0x7fe4aba076db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #10: clone + 0x3f (0x7fe4ab73071f in /lib/x86_64-linux-gnu/libc.so.6)
, op Conv
[E net_async_base.cc:129] Rethrowing exception from the run of 'model'
/home/ubuntu/flickr30k_images/images/3562169000.jpg
WARNING workspace.py: 218: Original python traceback for operator 7 in network model in exception above (most recent call last):
WARNING workspace.py: 223: File "tools/extract_features.py", line 287, in
WARNING workspace.py: 223: File "tools/extract_features.py", line 231, in main
WARNING workspace.py: 223: File "/home/ubuntu/detectron/detectron/core/test_engine.py", line 417, in initialize_model_from_cfg
WARNING workspace.py: 223: File "/home/ubuntu/detectron/detectron/modeling/model_builder.py", line 127, in create
WARNING workspace.py: 223: File "/home/ubuntu/detectron/detectron/modeling/model_builder.py", line 91, in generalized_rcnn
WARNING workspace.py: 223: File "/home/ubuntu/detectron/detectron/modeling/model_builder.py", line 233, in build_generic_detection_model
WARNING workspace.py: 223: File "/home/ubuntu/detectron/detectron/modeling/optimizer.py", line 54, in build_data_parallel_model
WARNING workspace.py: 223: File "/home/ubuntu/detectron/detectron/modeling/model_builder.py", line 173, in _single_gpu_build_func
WARNING workspace.py: 223: File "/home/ubuntu/detectron/detectron/modeling/FPN.py", line 63, in add_fpn_ResNet101_conv5_body
WARNING workspace.py: 223: File "/home/ubuntu/detectron/detectron/modeling/FPN.py", line 104, in add_fpn_onto_conv_body
WARNING workspace.py: 223: File "/home/ubuntu/detectron/detectron/modeling/ResNet.py", line 48, in add_ResNet101_conv5_body
WARNING workspace.py: 223: File "/home/ubuntu/detectron/detectron/modeling/ResNet.py", line 103, in add_ResNet_convX_body
WARNING workspace.py: 223: File "/home/ubuntu/detectron/detectron/modeling/ResNet.py", line 85, in add_stage
WARNING workspace.py: 223: File "/home/ubuntu/detectron/detectron/modeling/ResNet.py", line 183, in add_residual_block
WARNING workspace.py: 223: File "/home/ubuntu/detectron/detectron/modeling/ResNet.py", line 316, in bottleneck_transformation
WARNING workspace.py: 223: File "/home/ubuntu/detectron/detectron/modeling/detector.py", line 438, in ConvAffine
WARNING workspace.py: 223: File "/home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/cnn.py", line 97, in Conv
WARNING workspace.py: 223: File "/home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/brew.py", line 108, in scope_wrapper
WARNING workspace.py: 223: File "/home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/helpers/conv.py", line 186, in conv
WARNING workspace.py: 223: File "/home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/helpers/conv.py", line 139, in _ConvBase
Traceback (most recent call last):
  File "tools/extract_features.py", line 287, in
    main(args)
  File "tools/extract_features.py", line 267, in main
    args.min_bboxes, args.max_bboxes)
  File "tools/extract_features.py", line 171, in get_detections_from_im
    scores, cls_boxes, im_scale = infer_engine.im_detect_bbox(model, im,TEST_SCALE, TEST_MAX_SIZE, boxes=bboxes)
  File "/home/ubuntu/detectron/detectron/core/test_engine.py", line 365, in im_detect_bbox
    workspace.RunNet(model.net.Proto().name)
  File "/home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/workspace.py", line 250, in RunNet
    StringifyNetName(name), num_iter, allow_fail,
  File "/home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/workspace.py", line 211, in CallWithExceptionIntercept
    return func(args, *kwargs)
RuntimeError: [enforce fail at conv_opcudnn.cc:554] filter.dim32(1) == C / group. 4 vs 256
Error from operator:
input: "gpu_0/res2_0_branch2a" input: "gpu_0/res2_0_branch2b_w" output: "gpu_0/res2_0_branch2b" name: "" type: "Conv" arg { name: "kernel" i: 3 } arg { name: "order" s: "NCHW" } arg { name: "stride" i: 1 } arg { name: "pad" i: 1 } arg { name: "dilation" i: 1 } arg { name: "exhaustive_search" i: 0 } device_option { device_type: 1 device_id: 0 } engine: "CUDNN"
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7fe45122a441 in /home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: c10::ThrowEnforceNotMet(char const, int, char const, std::string const&, void const) + 0x49 (0x7fe45122a259 in /home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #2: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x2117 (0x7fe452c90ed7 in /home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #3: caffe2::CudnnConvOp::RunOnDevice() + 0x198 (0x7fe452c7c278 in /home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: + 0x157d9f5 (0x7fe452be69f5 in /home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7fe4879251f4 in /home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #6: + 0x18e7669 (0x7fe48792b669 in /home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #7: c10::ThreadPool::main_loop(unsigned long) + 0x253 (0x7fe451224723 in /home/ubuntu/miniconda3/envs/vlp/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #8: + 0xc8421 (0x7fe4a4fc8421 in /home/ubuntu/miniconda3/envs/vlp/bin/../lib/libstdc++.so.6)
frame #9: + 0x76db (0x7fe4aba076db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #10: clone + 0x3f (0x7fe4ab73071f in /lib/x86_64-linux-gnu/libc.so.6)

Is the model checkpoint incompatible in some way?
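
One way to read the enforce message `filter.dim32(1) == C / group. 4 vs 256`: the loaded weight `res2_0_branch2b_w` has 4 input channels per filter, while the network as built expects 256. The operator lists no `group` argument, so the sketch below assumes the Conv was built with a single group. Under that assumption the weight would fit a grouped 3x3 convolution with 256 / 4 = 64 groups, the layout a ResNeXt-style bottleneck would have, which hints at a config/checkpoint mismatch rather than a corrupt file. The thread does not confirm this. The helper names and the full filter shape `(256, 4, 3, 3)` are hypothetical; only the 4 and the 256 come from the log.

```python
def conv_shapes_agree(filter_shape, input_channels, group):
    """Mirror the check from the log: filter.dim32(1) == C / group.

    filter_shape is assumed to be (out_channels, in_channels_per_group, kH, kW),
    i.e. NCHW order as stated in the operator's "order" argument.
    """
    return filter_shape[1] == input_channels // group


def implied_groups(filter_shape, input_channels):
    """Return the group count for which the filter would fit the input."""
    per_group = filter_shape[1]
    if per_group <= 0 or input_channels % per_group != 0:
        raise ValueError("filter and input channel counts are not compatible")
    return input_channels // per_group


if __name__ == "__main__":
    # Hypothetical full shape; only dim 1 == 4 and C == 256 appear in the log.
    weight_shape = (256, 4, 3, 3)
    channels = 256
    print("fits with group=1:", conv_shapes_agree(weight_shape, channels, 1))
    print("implied groups:", implied_groups(weight_shape, channels))
```

If the implied group count differs from the one in the config used to build the model, comparing the config file against the one the checkpoint was trained with would be a reasonable next step.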

MarcusNerva commented 3 years ago

@pruksmhc Hi there, have you solved this problem?