GPU models (for all devices if they are not all the same): TITAN Xp
PYTHONPATH environment variable: n/a
python --version output: Python 3.6.5 :: Anaconda, Inc.
Anything else that seems relevant: PyTorch version: 1.1.0
Training works. Inference with the model below works too:
python tools/test_net.py \
    --cfg configs/getting_started/tutorial_1gpu_e2e_faster_rcnn_R-50-FPN.yaml \
    TEST.WEIGHTS /tmp/detectron-output/train/coco_2014_train/generalized_rcnn/model_final.pkl \
    NUM_GPUS 1
Expected results
Successful testing on COCO, as usual.
Actual results
Traceback (most recent call last):
  File "tools/test_net.py", line 116, in <module>
    check_expected_results=True,
  File "/vol/bitbucket2/rm2815/Detectron/detectron/core/test_engine.py", line 128, in run_inference
    all_results = result_getter()
  File "/vol/bitbucket2/rm2815/Detectron/detectron/core/test_engine.py", line 108, in result_getter
    multi_gpu=multi_gpu_testing
  File "/vol/bitbucket2/rm2815/Detectron/detectron/core/test_engine.py", line 159, in test_net_on_dataset
    weights_file, dataset_name, proposal_file, output_dir, gpu_id=gpu_id
  File "/vol/bitbucket2/rm2815/Detectron/detectron/core/test_engine.py", line 258, in test_net
    model, im, box_proposals, timers
  File "/vol/bitbucket2/rm2815/Detectron/detectron/core/test.py", line 66, in im_detect_all
    model, im, cfg.TEST.SCALE, cfg.TEST.MAX_SIZE, boxes=box_proposals
  File "/vol/bitbucket2/rm2815/Detectron/detectron/core/test.py", line 158, in im_detect_bbox
    workspace.RunNet(model.net.Proto().name)
  File "/vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/workspace.py", line 237, in RunNet
    StringifyNetName(name), num_iter, allow_fail,
  File "/vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/workspace.py", line 198, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: IsType<T>() ASSERT FAILED at /opt/conda/conda-bld/pytorch-nightly_1551157756140/work/aten/src/ATen/core/blob.h:77, please report a bug to PyTorch. wrong type for the Blob instance. Blob contains nullptr (uninitialized) while caller expects caffe2::Tensor.
Offending Blob name: gpu_0/conv_rpn_w.
Error from operator:
input: "gpu_0/res4_5_sum"
input: "gpu_0/conv_rpn_w"
input: "gpu_0/conv_rpn_b"
output: "gpu_0/conv_rpn"
name: ""
type: "Conv"
arg { name: "kernel" i: 3 }
arg { name: "order" s: "NCHW" }
arg { name: "pad" i: 1 }
arg { name: "stride" i: 1 }
arg { name: "exhaustive_search" i: 0 }
device_option { device_type: 1 device_id: 0 }
engine: "CUDNN"
(Get at /opt/conda/conda-bld/pytorch-nightly_1551157756140/work/aten/src/ATen/core/blob.h:77)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x45 (0x7f68638f59d5 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libc10.so)
frame #1: caffe2::Tensor const& caffe2::Blob::Get<caffe2::Tensor>() const + 0xf0 (0x7f686400f750 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/caffe2_pybind11_state_gpu.cpython-36m-x86_64-linux-gnu.so)
frame #2: caffe2::Tensor const& caffe2::OperatorBase::Input<caffe2::Tensor>(int, c10::DeviceType) + 0x301 (0x7f686408bdf1 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/caffe2_pybind11_state_gpu.cpython-36m-x86_64-linux-gnu.so)
frame #3: bool caffe2::CudnnConvOp::DoRunWithType<float, float, float, float>() + 0x38 (0x7f682545f428 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #4: caffe2::CudnnConvOp::RunOnDevice() + 0x198 (0x7f682544dd08 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #5: <unknown function> + 0x13970c5 (0x7f68253ba0c5 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2_gpu.so)
frame #6: caffe2::AsyncNetBase::run(int, int) + 0x144 (0x7f684c460964 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #7: <unknown function> + 0x16b5549 (0x7f684c467549 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #8: c10::ThreadPool::main_loop(unsigned long) + 0x273 (0x7f684b47b773 in /vol/bitbucket/rm2815/anaconda3/lib/python3.6/site-packages/caffe2/python/../../torch/lib/libcaffe2.so)
frame #9: <unknown function> + 0xafc5c (0x7f6869201c5c in /vol/bitbucket/rm2815/anaconda3/bin/../lib/libstdc++.so.6)
frame #10: <unknown function> + 0x76db (0x7f6877b6d6db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #11: clone + 0x3f (0x7f687789688f in /lib/x86_64-linux-gnu/libc.so.6)
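Since the error says gpu_0/conv_rpn_w is uninitialized, one possible check is whether the weights file contains that blob at all. The snippet below is only a rough sketch, not something taken from Detectron itself. It assumes the .pkl is a plain pickle whose blobs sit under a top-level 'blobs' key and are stored without the gpu_0/ scope prefix. The helper names load_blob_names and missing_blobs are made up for illustration.

```python
import pickle


def load_blob_names(weights_path):
    """Return the set of blob names stored in a weights pickle."""
    with open(weights_path, "rb") as f:
        # latin1 lets Python 3 read pickles that were written by Python 2.
        data = pickle.load(f, encoding="latin1")
    # Assumption: blobs live under a top-level 'blobs' key; otherwise
    # treat the loaded dict itself as the name -> array mapping.
    blobs = data["blobs"] if "blobs" in data else data
    return set(blobs.keys())


def missing_blobs(weights_path, required):
    """List the required blob names that are absent from the weights file."""
    names = load_blob_names(weights_path)
    return [name for name in required if name not in names]
```

Calling missing_blobs("tmp/model_final.pkl", ["conv_rpn_w", "conv_rpn_b"]) would return an empty list if both RPN blobs were saved. A non-empty list would suggest the file passed as TEST.WEIGHTS does not match the config being tested.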
Detailed steps to reproduce
In the Detectron dir:
python tools/test_net.py \
    --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-C4_1x.yaml \
    TEST.WEIGHTS tmp/model_final.pkl \
    NUM_GPUS 1 \
    OUTPUT_DIR tmp/test
After successfully training via:
python tools/train_net.py \
    --cfg configs/12_2017_baselines/e2e_mask_rcnn_R-50-C4_1x.yaml \
    OUTPUT_DIR tmp/