facebookresearch / Detectron

FAIR's research platform for object detection research, implementing popular algorithms like Mask R-CNN and RetinaNet.
Apache License 2.0

Training on 6 Keypoints: PythonOp function: AssertionError #926

Open HansMertens opened 5 years ago

HansMertens commented 5 years ago

Hi there,

I tried to train e2e_keypoint_rcnn_R-101-FPN_s1x on just six keypoints of the COCO dataset and am getting the following errors. Does anyone have advice on how to solve them?

Any help is very much appreciated.

[E pybind_state.h:424] Exception encountered running PythonOp function: AssertionError:

At:
  /home/hans/Detectron/detectron/utils/keypoints.py(169): keypoints_to_heatmap_labels
  /home/hans/Detectron/detectron/roi_data/keypoint_rcnn.py(76): add_keypoint_rcnn_blobs
  /home/hans/Detectron/detectron/roi_data/fast_rcnn.py(203): _sample_rois
  /home/hans/Detectron/detectron/roi_data/fast_rcnn.py(112): add_fast_rcnn_blobs
  /home/hans/Detectron/detectron/ops/collect_and_distribute_fpn_rpn_proposals.py(62): forward

[E net_async_base.cc:382] [enforce fail at pybind_state.h:425] . Exception encountered running PythonOp function: AssertionError:

At:
  /home/hans/Detectron/detectron/utils/keypoints.py(169): keypoints_to_heatmap_labels
  /home/hans/Detectron/detectron/roi_data/keypoint_rcnn.py(76): add_keypoint_rcnn_blobs
  /home/hans/Detectron/detectron/roi_data/fast_rcnn.py(203): _sample_rois
  /home/hans/Detectron/detectron/roi_data/fast_rcnn.py(112): add_fast_rcnn_blobs
  /home/hans/Detectron/detectron/ops/collect_and_distribute_fpn_rpn_proposals.py(62): forward

Error from operator:
input: "gpu_0/rpn_rois_fpn2"
input: "gpu_0/rpn_rois_fpn3"
input: "gpu_0/rpn_rois_fpn4"
input: "gpu_0/rpn_rois_fpn5"
input: "gpu_0/rpn_rois_fpn6"
input: "gpu_0/rpn_roi_probs_fpn2"
input: "gpu_0/rpn_roi_probs_fpn3"
input: "gpu_0/rpn_roi_probs_fpn4"
input: "gpu_0/rpn_roi_probs_fpn5"
input: "gpu_0/rpn_roi_probs_fpn6"
input: "gpu_0/roidb"
input: "gpu_0/im_info"
output: "gpu_0/rois"
output: "gpu_0/labels_int32"
output: "gpu_0/bbox_targets"
output: "gpu_0/bbox_inside_weights"
output: "gpu_0/bbox_outside_weights"
output: "gpu_0/keypoint_rois"
output: "gpu_0/keypoint_locations_int32"
output: "gpu_0/keypoint_weights"
output: "gpu_0/keypoint_loss_normalizer"
output: "gpu_0/rois_fpn2"
output: "gpu_0/rois_fpn3"
output: "gpu_0/rois_fpn4"
output: "gpu_0/rois_fpn5"
output: "gpu_0/rois_idx_restore_int32"
output: "gpu_0/keypoint_rois_fpn2"
output: "gpu_0/keypoint_rois_fpn3"
output: "gpu_0/keypoint_rois_fpn4"
output: "gpu_0/keypoint_rois_fpn5"
output: "gpu_0/keypoint_rois_idx_restore_int32"
name: "CollectAndDistributeFpnRpnProposalsOp:gpu_0/rpn_rois_fpn2,gpu_0/rpn_rois_fpn3,gpu_0/rpn_rois_fpn4,gpu_0/rpn_rois_fpn5,gpu_0/rpn_rois_fpn6,gpu_0/rpn_roi_probs_fpn2,gpu_0/rpn_roi_probs_fpn3,gpu_0/rpn_roi_probs_fpn4,gpu_0/rpn_roi_probs_fpn5,gpu_0/rpn_roi_probs_fpn6,gpu_0/roidb,gpu_0/im_info"
type: "Python"
arg { name: "grad_input_indices" }
arg { name: "token" s: "forward:5" }
arg { name: "grad_output_indices" }
device_option { device_type: 0 }
Error from operator:
input: "gpu_0/rpn_rois_fpn2"
input: "gpu_0/rpn_rois_fpn3"
input: "gpu_0/rpn_rois_fpn4"
input: "gpu_0/rpn_rois_fpn5"
input: "gpu_0/rpn_rois_fpn6"
input: "gpu_0/rpn_roi_probs_fpn2"
input: "gpu_0/rpn_roi_probs_fpn3"
input: "gpu_0/rpn_roi_probs_fpn4"
input: "gpu_0/rpn_roi_probs_fpn5"
input: "gpu_0/rpn_roi_probs_fpn6"
input: "gpu_0/roidb"
input: "gpu_0/im_info"
output: "gpu_0/rois"
output: "gpu_0/labels_int32"
output: "gpu_0/bbox_targets"
output: "gpu_0/bbox_inside_weights"
output: "gpu_0/bbox_outside_weights"
output: "gpu_0/keypoint_rois"
output: "gpu_0/keypoint_locations_int32"
output: "gpu_0/keypoint_weights"
output: "gpu_0/keypoint_loss_normalizer"
output: "gpu_0/rois_fpn2"
output: "gpu_0/rois_fpn3"
output: "gpu_0/rois_fpn4"
output: "gpu_0/rois_fpn5"
output: "gpu_0/rois_idx_restore_int32"
output: "gpu_0/keypoint_rois_fpn2"
output: "gpu_0/keypoint_rois_fpn3"
output: "gpu_0/keypoint_rois_fpn4"
output: "gpu_0/keypoint_rois_fpn5"
output: "gpu_0/keypoint_rois_idx_restore_int32"
name: "CollectAndDistributeFpnRpnProposalsOp:gpu_0/rpn_rois_fpn2,gpu_0/rpn_rois_fpn3,gpu_0/rpn_rois_fpn4,gpu_0/rpn_rois_fpn5,gpu_0/rpn_rois_fpn6,gpu_0/rpn_roi_probs_fpn2,gpu_0/rpn_roi_probs_fpn3,gpu_0/rpn_roi_probs_fpn4,gpu_0/rpn_roi_probs_fpn5,gpu_0/rpn_roi_probs_fpn6,gpu_0/roidb,gpu_0/im_info"
type: "Python"
arg { name: "grad_input_indices" }
arg { name: "token" s: "forward:5" }
arg { name: "grad_output_indices" }
device_option { device_type: 1 device_id: 0 }
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void const*) + 0x76 (0x7f802d955916 in /home/hans/anaconda2/envs/myenv/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: <unknown function> + 0xb8cc0 (0x7f8052678cc0 in /home/hans/anaconda2/envs/myenv/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #2: <unknown function> + 0xb6146 (0x7f8052676146 in /home/hans/anaconda2/envs/myenv/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #3: <unknown function> + 0x10fc3f (0x7f80526cfc3f in /home/hans/anaconda2/envs/myenv/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #4: <unknown function> + 0x10e2dd (0x7f80526ce2dd in /home/hans/anaconda2/envs/myenv/lib/python2.7/site-packages/caffe2/python/caffe2_pybind11_state_gpu.so)
frame #5: caffe2::AsyncNetBase::run(int, int) + 0x185 (0x7f80301b7935 in /home/hans/anaconda2/envs/myenv/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libtorch.so)
frame #6: <unknown function> + 0x25ed11a (0x7f803015c11a in /home/hans/anaconda2/envs/myenv/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libtorch.so)
frame #7: c10::ThreadPool::main_loop(unsigned long) + 0x2db (0x7f802d93b27b in /home/hans/anaconda2/envs/myenv/lib/python2.7/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #8: <unknown function> + 0xc8421 (0x7f8057469421 in /home/hans/anaconda2/envs/myenv/bin/../lib/libstdc++.so.6)
frame #9: <unknown function> + 0x76ba (0x7f805e3846ba in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #10: clone + 0x6d (0x7f805d9aa41d in /lib/x86_64-linux-gnu/libc.so.6)
, op Python
[E net_async_base.cc:134] Rethrowing exception from the run of 'generalized_rcnn'
WARNING workspace.py: 222: Original python traceback for operator 389 in network generalized_rcnn in exception above (most recent call last):
WARNING workspace.py: 227:   File "tools/train_net.py", line 132, in <module>
WARNING workspace.py: 227:   File "tools/train_net.py", line 114, in main
WARNING workspace.py: 227:   File "/home/hans/Detectron/detectron/utils/train.py", line 53, in train_model
WARNING workspace.py: 227:   File "/home/hans/Detectron/detectron/utils/train.py", line 145, in create_model
WARNING workspace.py: 227:   File "/home/hans/Detectron/detectron/modeling/model_builder.py", line 124, in create
WARNING workspace.py: 227:   File "/home/hans/Detectron/detectron/modeling/model_builder.py", line 89, in generalized_rcnn
WARNING workspace.py: 227:   File "/home/hans/Detectron/detectron/modeling/model_builder.py", line 229, in build_generic_detection_model
WARNING workspace.py: 227:   File "/home/hans/Detectron/detectron/modeling/optimizer.py", line 40, in build_data_parallel_model
WARNING workspace.py: 227:   File "/home/hans/Detectron/detectron/modeling/optimizer.py", line 63, in _build_forward_graph
WARNING workspace.py: 227:   File "/home/hans/Detectron/detectron/modeling/model_builder.py", line 189, in _single_gpu_build_func
WARNING workspace.py: 227:   File "/home/hans/Detectron/detectron/modeling/rpn_heads.py", line 46, in add_generic_rpn_outputs
WARNING workspace.py: 227:   File "/home/hans/Detectron/detectron/modeling/FPN.py", line 449, in add_fpn_rpn_losses
Traceback (most recent call last):
  File "tools/train_net.py", line 132, in <module>
    main()
  File "tools/train_net.py", line 114, in main
    checkpoints = detectron.utils.train.train_model()
  File "/home/hans/Detectron/detectron/utils/train.py", line 67, in train_model
    workspace.RunNet(model.net.Proto().name)
  File "/home/hans/anaconda2/envs/myenv/lib/python2.7/site-packages/caffe2/python/workspace.py", line 254, in RunNet
    StringifyNetName(name), num_iter, allow_fail,
  File "/home/hans/anaconda2/envs/myenv/lib/python2.7/site-packages/caffe2/python/workspace.py", line 215, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: [enforce fail at pybind_state.h:425] . Exception encountered running PythonOp function: AssertionError:

At:
  /home/hans/Detectron/detectron/utils/keypoints.py(169): keypoints_to_heatmap_labels
  /home/hans/Detectron/detectron/roi_data/keypoint_rcnn.py(76): add_keypoint_rcnn_blobs
  /home/hans/Detectron/detectron/roi_data/fast_rcnn.py(203): _sample_rois
  /home/hans/Detectron/detectron/roi_data/fast_rcnn.py(112): add_fast_rcnn_blobs
  /home/hans/Detectron/detectron/ops/collect_and_distribute_fpn_rpn_proposals.py(62): forward

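
For anyone trying to reproduce this: the log does not show which expression failed in keypoints_to_heatmap_labels, so the following is only a guess at one plausible cause, namely that the number of keypoints per annotation does not match the number the model is configured for (Detectron's config has a KRCNN.NUM_KEYPOINTS option; that it is the value being asserted here is an assumption). The sketch below is not Detectron code. It uses two hypothetical helpers, subset_keypoints and check_keypoint_count, to cut a COCO-style annotation down to six keypoints and to verify the resulting count. The six indices are picked arbitrarily for illustration.

```python
# Minimal sketch, not part of Detectron. Helper names are hypothetical.
# COCO stores keypoints as a flat list [x1, y1, v1, x2, y2, v2, ...],
# where the visibility flag v = 0 means "not labeled".


def subset_keypoints(annotation, keep_indices):
    """Return a copy of `annotation` restricted to the given keypoint indices."""
    flat = annotation["keypoints"]
    if len(flat) % 3 != 0:
        raise ValueError("keypoints list length must be a multiple of 3")
    # Group the flat list into (x, y, v) triplets, then pick the wanted ones.
    triplets = [flat[i:i + 3] for i in range(0, len(flat), 3)]
    kept = [triplets[i] for i in keep_indices]
    result = dict(annotation)
    result["keypoints"] = [value for triplet in kept for value in triplet]
    # num_keypoints counts the labeled keypoints (visibility flag > 0).
    result["num_keypoints"] = sum(1 for _, _, v in kept if v > 0)
    return result


def check_keypoint_count(annotation, expected_count):
    """True if the annotation holds exactly `expected_count` keypoints."""
    return len(annotation["keypoints"]) == 3 * expected_count


# Made-up annotation with the 17 COCO person keypoints.
ann = {
    "id": 1,
    "keypoints": [
        value
        for i in range(17)
        for value in (10 + i, 20 + i, 2 if i % 2 == 0 else 0)
    ],
}

# Arbitrary choice for illustration: shoulders, elbows and wrists
# (indices 5 to 10 in COCO's keypoint order).
KEEP = [5, 6, 7, 8, 9, 10]
small = subset_keypoints(ann, KEEP)

print(check_keypoint_count(ann, 6))    # the untouched annotation
print(check_keypoint_count(small, 6))  # the six-keypoint subset
```

If a check like this fails on the training annotations, or if the configured keypoint count still refers to the original 17, that mismatch would be the first thing to rule out. The keypoint names listed in the dataset's category entry would presumably need the same subsetting.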