apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Problem when doing batch inference with fpn_resnest101. #18321

Open alexsisu opened 4 years ago

alexsisu commented 4 years ago

Description

I'm trying to send a batch of more than one image to the model, but it fails with the following error:

Error Message

  input_sym_arg_type = in_param.infer_type()[0]
Traceback (most recent call last):
  File "/Users/alexsisu/work_phd/different_issues/issues_mxnet_fpnresnet/issue1.py", line 13, in <module>
    res= net(xx)
  File "/Users/alexsisu/anaconda3/envs/nga/lib/python3.7/site-packages/mxnet/gluon/block.py", line 693, in __call__
    out = self.forward(*args)
  File "/Users/alexsisu/anaconda3/envs/nga/lib/python3.7/site-packages/mxnet/gluon/block.py", line 1158, in forward
    return self.hybrid_forward(ndarray, x, *args, **params)
  File "/Users/alexsisu/anaconda3/envs/nga/lib/python3.7/site-packages/gluoncv/model_zoo/faster_rcnn/faster_rcnn.py", line 374, in hybrid_forward
    rpn_roi = F.concat(*[roi_batchid.reshape((-1, 1)), rpn_box.reshape((-1, 4))], dim=-1)
  File "<string>", line 70, in concat
  File "/Users/alexsisu/anaconda3/envs/nga/lib/python3.7/site-packages/mxnet/_ctypes/ndarray.py", line 107, in _imperative_invoke
    ctypes.byref(out_stypes)))
  File "/Users/alexsisu/anaconda3/envs/nga/lib/python3.7/site-packages/mxnet/base.py", line 255, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [18:51:20] src/operator/nn/concat.cc:67: Check failed: shape_assign(&(*in_shape)[i], dshape): Incompatible input shape: expected [1000,-1], got [5000,4]
Stack trace:
  [bt] (0) 1   libmxnet.so                         0x0000000115653bd9 libmxnet.so + 31705
  [bt] (1) 2   libmxnet.so                         0x0000000115a3c18d mxnet::op::NDArrayOpParam::__DECLARE__(dmlc::parameter::ParamManagerSingleton<mxnet::op::NDArrayOpParam>*) + 375581
  [bt] (2) 3   libmxnet.so                         0x00000001173c68ff mxnet::imperative::SetShapeType(mxnet::Context const&, nnvm::NodeAttrs const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, mxnet::DispatchMode*) + 1439
  [bt] (3) 4   libmxnet.so                         0x00000001173c54f0 mxnet::Imperative::Invoke(mxnet::Context const&, nnvm::NodeAttrs const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> > const&) + 688
  [bt] (4) 5   libmxnet.so                         0x00000001172e145c SetNDInputsOutputs(nnvm::Op const*, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> >*, std::__1::vector<mxnet::NDArray*, std::__1::allocator<mxnet::NDArray*> >*, int, void* const*, int*, int, int, void***) + 1564
  [bt] (5) 6   libmxnet.so                         0x00000001172e24d3 MXImperativeInvokeEx + 99
  [bt] (6) 7   libffi.6.dylib                      0x00000001072ad884 ffi_call_unix64 + 76
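The shapes in the error message are consistent with a simple count: 5 stacked frames × 1000 proposals per image = 5000 box rows, while the batch-id column has only 1000 rows, i.e. it appears to be sized for a single image. The sketch below is a simplified NumPy reconstruction of the `F.concat` call shown in the traceback, not GluonCV's actual code. `ROIS_PER_IMAGE`, `build_rois`, and its `max_batch` argument are hypothetical names, and the 1000-proposal figure is inferred from the error message.

```python
import numpy as np

# Assumption inferred from the error message ([1000, -1] vs [5000, 4]),
# not read from the GluonCV source.
ROIS_PER_IMAGE = 1000


def build_rois(batch_size: int, max_batch: int,
               rois_per_image: int = ROIS_PER_IMAGE) -> np.ndarray:
    """Mimic `concat([roi_batchid.reshape(-1, 1), rpn_box.reshape(-1, 4)], -1)`.

    `max_batch` (hypothetical) is the batch size the batch-id vector is
    built for; `batch_size` is how many images were actually stacked.
    Raises ValueError when the two disagree, like the MXNet shape check.
    """
    # Proposal boxes: 4 coordinates for every ROI of every image.
    rpn_box = np.zeros((batch_size, rois_per_image, 4))
    # Batch index of each ROI: [0]*rois_per_image + [1]*rois_per_image + ...
    roi_batchid = np.repeat(np.arange(max_batch), rois_per_image)
    # Concatenating along the last axis needs equal row counts.
    return np.concatenate(
        [roi_batchid.reshape(-1, 1), rpn_box.reshape(-1, 4)], axis=-1)


# One image, vector sized for one image: rows line up.
print(build_rois(batch_size=1, max_batch=1).shape)

# Five images, vector still sized for one image: (1000, 1) vs (5000, 4).
try:
    build_rois(batch_size=5, max_batch=1)
except ValueError as err:
    print("shape mismatch:", err)
```

Under these assumptions, batching can only work if the batch-id vector is built for the real number of images. Whether the pretrained model exposes a setting for that is not confirmed here.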

To Reproduce

Here's the link to the code and images: https://github.com/alexsisu/different_issues/blob/master/issues_mxnet_fpnresnet/issue1.py

Steps to reproduce

Download the code linked above and run it:

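import mxnet as mx
from gluoncv import model_zoo, data

# Pretrained FPN Faster R-CNN detector from the GluonCV model zoo
net = model_zoo.get_model('faster_rcnn_fpn_resnet101_v1d_coco', pretrained=True)

# Five video frames: frame_1.jpg ... frame_5.jpg
all_images = [f"frame_{p}.jpg" for p in range(1, 6)]

# Given a list of files, load_test returns a list of (1, 3, H, W) tensors and a list of original images
x, orig_img = data.transforms.presets.rcnn.load_test(all_images)

# Drop each tensor's leading batch axis and stack into one (5, 3, H, W) batch
# (this requires all frames to have the same size)
xx = mx.nd.stack(*[p[0] for p in x])
res = net(xx)  # fails with the concat shape error above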
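Until batched inference works, one possible workaround is to run the detector once per image, since batch size 1 is the case where the batch-id column and box rows line up. The sketch below is a hypothetical helper (`infer_one_by_one` is not part of GluonCV) that accepts any callable, so it does not depend on MXNet internals. It assumes each input already carries a leading batch axis of 1, which is what `load_test` returns for each file.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def infer_one_by_one(net: Callable[[T], R], images: Sequence[T]) -> List[R]:
    """Call `net` separately on each single-image input and collect results.

    Hypothetical helper: each element of `images` should already have a
    leading batch dimension of 1, so no stacking is performed.
    """
    return [net(img) for img in images]


# Intended use with the reproduction script (not executed here):
#   x, orig_img = data.transforms.presets.rcnn.load_test(all_images)
#   results = infer_one_by_one(net, x)
#   for box_ids, scores, bboxes in results:
#       ...
```

This trades throughput for correctness: each frame costs a full forward pass. It also sidesteps the requirement that all stacked frames share the same height and width.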

What have you tried to solve it?

  1. Yes

Environment

----------Python Info----------
Version      : 3.7.7
Compiler     : Clang 4.0.1 (tags/RELEASE_401/final)
Build        : ('default', 'Mar 26 2020 10:32:53')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 20.0.2
Directory    : /Users/alexsisu/anaconda3/envs/nga/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version      : 1.6.0
Directory    : /Users/alexsisu/anaconda3/envs/nga/lib/python3.7/site-packages/mxnet
Num GPUs     : 0
Commit Hash   : 6eec9da55c5096079355d1f1a5fa58dcf35d6752
----------System Info----------
Platform     : Darwin-18.7.0-x86_64-i386-64bit
system       : Darwin
node         : SISUs-MBP-2
release      : 18.7.0
version      : Darwin Kernel Version 18.7.0: Tue Aug 20 16:57:14 PDT 2019; root:xnu-4903.271.2~2/RELEASE_X86_64
----------Hardware Info----------
machine      : x86_64
processor    : i386
b'machdep.cpu.brand_string: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.leaf7_features: RDWRFSGS TSC_THREAD_OFFSET SGX BMI1 AVX2 SMEP BMI2 ERMS INVPCID FPU_CSDS MPX RDSEED ADX SMAP CLFSOPT IPT MDCLEAR TSXFA IBRS STIBP L1DF SSBD'
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0647 sec, LOAD: 1.0880 sec.
Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0008 sec, LOAD: 1.0197 sec.
Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.6112 sec, LOAD: 0.8483 sec.
Timing for D2L: http://d2l.ai, DNS: 0.0752 sec, LOAD: 0.5881 sec.
Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0681 sec, LOAD: 0.5668 sec.
Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.1319 sec, LOAD: 0.9025 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.1272 sec, LOAD: 1.0486 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.03305816650390625 sec.

alexsisu commented 4 years ago

Any updates on this one?

leezu commented 4 years ago

Are you able to provide a minimal reproducible example using only gluoncv?

leezu commented 4 years ago

cc @zhreshold

zhreshold commented 4 years ago

@alexsisu I think the issue is unrelated to mxnet. Please take a look at the example in gluoncv: https://github.com/dmlc/gluon-cv/tree/master/scripts/segmentation and file an issue there if it still doesn't resolve your problem.

alexsisu commented 4 years ago

@zhreshold @leezu I'm not trying to use a segmentation model (FCN) but an object detection model (faster_rcnn_fpn_resnet101_v1d_coco). I provided an example here:

https://github.com/alexsisu/different_issues/blob/master/issues_mxnet_fpnresnet/issue1.py

Also, I don't understand how this is NOT relevant to mxnet, since the stack trace points directly into the mxnet library. I have already filed an issue on the GluonCV project: https://github.com/dmlc/gluon-cv/issues/1319