apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

infer_shape error for 'resnet-152' #9361

Closed KineticCookie closed 6 years ago

KineticCookie commented 6 years ago

Description

Hello. I ran into a problem with the model described in https://mxnet.incubator.apache.org/tutorials/python/predict_image.html. I tried to infer the shapes of the model inputs but encountered an error.

Environment info (Required)

----------Python Info----------
Version      : 3.6.4
Compiler     : GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)
Build        : ('default', 'Dec 21 2017 15:39:08')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 9.0.1
Directory    : /Users/bulat/anaconda/envs/mxnet/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version      : 1.0.0
Directory    : /Users/bulat/anaconda/envs/mxnet/lib/python3.6/site-packages/mxnet
Commit Hash   : fe80b1c812237ca228fdff4fe48f3b13eb69bc3e
----------System Info----------
Platform     : Darwin-16.7.0-x86_64-i386-64bit
system       : Darwin
node         : mbpro.local
release      : 16.7.0
version      : Darwin Kernel Version 16.7.0: Mon Nov 13 21:56:25 PST 2017; root:xnu-3789.72.11~1/RELEASE_X86_64
----------Hardware Info----------
machine      : x86_64
processor    : i386
b'machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI'
b'machdep.cpu.leaf7_features: SMEP ERMS RDWRFSGS TSC_THREAD_OFFSET BMI1 HLE AVX2 BMI2 INVPCID RTM SMAP RDSEED ADX IPT SGX FPU_CSDS MPX CLFSOPT'
b'machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX SMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C'
b'machdep.cpu.brand_string: Intel(R) Core(TM) i5-6360U CPU @ 2.00GHz'
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0875 sec, LOAD: 1.1388 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1118 sec, LOAD: 0.1954 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.5085 sec, LOAD: 0.9550 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.1705 sec, LOAD: 1.2387 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0323 sec, LOAD: 0.5678 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0450 sec, LOAD: 0.1672 sec.

Package used (Python/R/Scala/Julia): I'm using the Python 3.6 package.

Error Message:

infer_shape error. Arguments:


MXNetError                                Traceback (most recent call last)
in ()
      5  mx.test_utils.download(path+'synset.txt')]
      6 sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152', 0)
----> 7 sym.infer_shape()

~/anaconda/envs/mxnet/lib/python3.6/site-packages/mxnet/symbol/symbol.py in infer_shape(self, *args, **kwargs)
    963         """
    964         try:
--> 965             res = self._infer_shape_impl(False, *args, **kwargs)
    966             if res[1] is None:
    967                 arg_shapes, _, _ = self._infer_shape_impl(True, *args, **kwargs)

~/anaconda/envs/mxnet/lib/python3.6/site-packages/mxnet/symbol/symbol.py in _infer_shape_impl(self, partial, *args, **kwargs)
   1093                 ctypes.byref(aux_shape_ndim),
   1094                 ctypes.byref(aux_shape_data),
-> 1095                 ctypes.byref(complete)))
   1096         if complete.value != 0:
   1097             arg_shapes = [

~/anaconda/envs/mxnet/lib/python3.6/site-packages/mxnet/base.py in check_call(ret)
    144     """
    145     if ret != 0:
--> 146         raise MXNetError(py_str(_LIB.MXGetLastError()))
    147
    148

MXNetError: Error in operator bn_data: [15:17:50] src/operator/batch_norm-inl.h:239: Check failed: channelAxis < dshape.ndim() (1 vs. 0) Channel axis out of range: 1

Stack trace returned 7 entries:
[bt] (0) 0   libmxnet.so      0x00000001055c1598 _ZN4dmlc15LogMessageFatalD2Ev + 40
[bt] (1) 1   libmxnet.so      0x00000001066a3b00 _ZNK5mxnet2op13BatchNormProp10InferShapeEPNSt3__16vectorIN4nnvm6TShapeENS2_9allocatorIS5_EEEES9_S9_ + 1968
[bt] (2) 2   libmxnet.so      0x000000010668dbee _ZN5mxnet2op16OpPropInferShapeERKN4nnvm9NodeAttrsEPNSt3__16vectorINS1_6TShapeENS5_9allocatorIS7_EEEESB_ + 878
[bt] (3) 3   libmxnet.so      0x0000000106530cca _ZZN5mxnet4exec9InferAttrIN4nnvm6TShapeENSt3__18functionIFbRKNS2_9NodeAttrsEPNS4_6vectorIS3_NS4_9allocatorIS3_EEEESD_EEEZNS0_10InferShapeEONS2_5GraphEOSC_RKNS4_12basic_stringIcNS4_11char_traitsIcEENSA_IcEEEEE3$_0DnEESG_SH_T_PKcST_ST_ST_ST_T1_T2_bST_NS_12DispatchModeEENKUljbE_clEjb + 1978
[bt] (4) 4   libmxnet.so      0x000000010652960e _ZN5mxnet4exec10InferShapeEON4nnvm5GraphEONSt3__16vectorINS1_6TShapeENS4_9allocatorIS6_EEEERKNS4_12basic_stringIcNS4_11char_traitsIcEENS7_IcEEEE + 4542
[bt] (5) 5   libmxnet.so      0x00000001064d70a9 MXSymbolInferShape + 2281
[bt] (6) 6   libffi.6.dylib   0x00000001036e6884 ffi_call_unix64 + 76

Minimum reproducible example

```python
import mxnet as mx
path='http://data.mxnet.io/models/imagenet-11k/'
[mx.test_utils.download(path+'resnet-152/resnet-152-symbol.json'),
 mx.test_utils.download(path+'resnet-152/resnet-152-0000.params'),
 mx.test_utils.download(path+'synset.txt')]
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152', 0)
sym.infer_shape()
```

Steps to reproduce

  1. pip install mxnet
  2. Run the code specified above

What have you tried to solve it?

  1. Install mxnet v0.11
  2. Create new venv and run it there

zhreshold commented 6 years ago

You need to provide the data shape if you are trying to run infer_shape. Docs: https://mxnet.incubator.apache.org/api/python/symbol/symbol.html?highlight=infer_shape#mxnet.symbol.Symbol.infer_shape
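For example (just a minimal sketch; the (1, 3, 224, 224) shape below is the one hardcoded in the tutorial, not something stored in the checkpoint):

```python
import mxnet as mx

sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152', 0)
# Provide the shape of the 'data' input; infer_shape then propagates it
# through the graph and returns argument, output and auxiliary shapes.
arg_shapes, out_shapes, aux_shapes = sym.infer_shape(data=(1, 3, 224, 224))
print(out_shapes)
```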

KineticCookie commented 6 years ago

@zhreshold thanks for pointing it out. But the thing is: what if I don't know what the data shape is for the current model? In this tutorial the shape is basically hardcoded: mod.bind(for_training=False, data_shapes=[('data', (1,3,224,224))], label_shapes=mod._label_shapes). I'm trying to abstract over that and can't find a proper solution.

KineticCookie commented 6 years ago

The broader question: is there a way to get information (names, data types, shapes) about a model's inputs and outputs using only the model persistence mechanism? (like TensorFlow model signatures)

zhreshold commented 6 years ago

You can't get any output shape without an input shape; that's true for every single library. If you would like the attributes of the layers, Gluon does have them:

from mxnet import gluon
net = gluon.model_zoo.vision.get_model('resnet152_v2')
print(net)
KineticCookie commented 6 years ago

I executed your snippet, and I can't find any information about the shapes the model was trained with.

ResNetV2(
  (features): HybridSequential( /* layer attributes */)
  (output): Dense(2048 -> 1000, linear)
)

The layer attributes don't contain the information needed to get the model's input and output shapes. Moreover, I can't find the data types for each layer.

I did some additional searching and found issue #7641, where @jeremiedb explains the shape implementation:

There's typically no need to specify the shape of the data input when building the symbolic network. This will typically be set at training time, when the model is bound and the shapes are inferred from what the iterator provides as input data. This allows the same network to be trained with different batch sizes.

It seems like mxnet infers information about shapes and data types at training time, but doesn't store it in the model files. The infer_shape/infer_type methods are the only way to get this info, but they require hardcoded variables.
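The best I've found so far is that the input names (though not their shapes or types) can be recovered from the checkpoint by subtracting the saved parameters from the symbol's arguments; a minimal sketch of that idea:

```python
import mxnet as mx

sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152', 0)

# Arguments that are neither trained weights nor auxiliary states are the
# free inputs of the graph (e.g. 'data', plus label variables if present).
input_names = [name for name in sym.list_arguments()
               if name not in arg_params and name not in aux_params]
print(input_names)
print(sym.list_outputs())
```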

I'm trying to implement a TensorFlow Serving-like server that could handle any exported mxnet model and serve it via an HTTP API. But to do this I need to know:

  1. What data is the model waiting for? (name, shape and data type of each input) So the server can prepare user data to be passed to the inference method.
  2. What will the model return after inference? (name, shape and data type of each output) So the clients of my server know what values to expect after they send a request.

Putting my serving case aside: say I got an mxnet model from a data scientist to use in my app. The model ships with no documentation, and I can't contact the data scientist either. Is there any way to use this mxnet model, considering I have no clue what data was used to train it?

zhreshold commented 6 years ago

You might want to check out mxnet model serving @kevinthesun

kevinthesun commented 6 years ago

@KineticCookie If you don't know exactly what a model does and what kind of inputs it accepts, it might not be a good idea to directly use it in your app. For model inputs, there can be many variables you need to know to use the model for inference. For example, what type of input does this model expect: image, text, or a combination? What are the input data shapes? Are they fixed or variable?

A possible way to manage your own model zoo for serving is to have signature files that record all the necessary information to use each model. However, you need to know the input information in the first place.
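For example, a minimal sketch of such a signature file, written next to the checkpoint at export time while the shapes are still known (the file name and fields here are just illustrative, not a fixed schema):

```python
import json

# Hypothetical signature recorded at export time; all values are illustrative.
signature = {
    "inputs":  [{"name": "data", "shape": [1, 3, 224, 224], "dtype": "float32"}],
    "outputs": [{"name": "softmax_output", "shape": [1, 1000], "dtype": "float32"}],
}
with open("resnet-152-signature.json", "w") as f:
    json.dump(signature, f, indent=2)
```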

KineticCookie commented 6 years ago

@kevinthesun

If you don't know exactly what a model does and what kind of inputs it accepts, it might not be a good idea to directly use it in your app.

I understand your concerns, but the sole purpose of my app is to provide a simple way to expose models as a web service. If I train a model with a dataset of int32 data with a 300x300 shape, I think it's obvious that I intend to use it with the same shape and data type. The point is: why doesn't mxnet provide information about data types and shapes, which are already inferred and known during training, when I export the model?

For instance, in TensorFlow:

  1. I create a Signature for the model with Tensor information (as described in https://www.tensorflow.org/serving/serving_basic).
  2. I specify which tensors are inputs and which ones are outputs. Shapes and types, defined statically or inferred, are written to the signature.
  3. The signature information is exported with the model itself.

When I import the model, I know about its inputs/outputs, with their types and shapes, at runtime. This helps me validate data and pass it to the model. It also gives me an opportunity to create documentation or an interface, so external users are able to use the model.
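
Roughly, with the TF 1.x SavedModel API it looks like this (simplified sketch; the tensor names and the trivial stand-in "model" are placeholders):

```python
import tensorflow as tf

# Stand-in graph: a placeholder input and a trivial 'output' tensor.
x = tf.placeholder(tf.float32, [None, 3, 224, 224], name='input')
y = tf.reduce_mean(x, axis=[1, 2, 3], name='output')

# The signature records the input/output tensors with their dtypes and shapes.
signature = tf.saved_model.signature_def_utils.predict_signature_def(
    inputs={'input': x}, outputs={'output': y})

builder = tf.saved_model.builder.SavedModelBuilder('export_dir')
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    builder.add_meta_graph_and_variables(
        sess, [tf.saved_model.tag_constants.SERVING],
        signature_def_map={'predict': signature})
builder.save()
```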

In conclusion: mxnet doesn't have a signature mechanism, and it infers shapes and data types during training. So why doesn't it save this information along with the model?

kevinthesun commented 6 years ago

Take a look at https://github.com/awslabs/mxnet-model-server. To serve and manage models, you may need a lot of information, such as the input data shape and the input data type (it can be a numerical type or a MIME type). You may also want to record the model version and the dataset used for training. I think what you want here is something similar to the model signature file in mxnet model server. The design is to decouple all this model-related information from mxnet's model saving system. With the save_checkpoint function in mxnet, you just save the model graph definition and parameters. Then, in the serving framework, you provide all the information you need and save it as a signature file. In this way you can store and manage it for serving. If all this information were added to mxnet's save_checkpoint, it might be too heavyweight for the standard mxnet model file.
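
To illustrate the split, a rough sketch of the serve-time side under that design (the signature file and its fields are hypothetical, matching the example earlier in this thread rather than any fixed mxnet-model-server schema):

```python
import json
import mxnet as mx

# save_checkpoint only produces 'resnet-152-symbol.json' and 'resnet-152-0000.params';
# everything else needed for serving comes from the separate signature file.
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152', 0)
with open('resnet-152-signature.json') as f:
    signature = json.load(f)

data_shapes = [(inp['name'], tuple(inp['shape'])) for inp in signature['inputs']]
mod = mx.mod.Module(symbol=sym,
                    data_names=[name for name, _ in data_shapes],
                    label_names=None)
mod.bind(for_training=False, data_shapes=data_shapes)
mod.set_params(arg_params, aux_params, allow_missing=True)
```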

Currently this framework is not part of mxnet. But you can use a similar approach to manage your models for a web service; it can be just a signature JSON file. In the future, mxnet will add a serving framework, similar to TensorFlow Serving. At that point, a similar function to create signature files will be introduced in that serving framework.

KineticCookie commented 6 years ago

@kevinthesun thanks for the detailed explanation. 👍