apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

ValueError: The argument structure of HybridBlock does not match the cached version. Stored format = [0], input format = [0, 0, 0] #19635

edsn60 closed 3 years ago

edsn60 commented 3 years ago

Description

I was trying to train Mask R-CNN with mxnet and gluoncv on CPU, using the script "train_mask_rcnn.py" provided by gluoncv (see https://cv.gluon.ai/build/examples_instance/train_mask_rcnn_coco.html). The training script does not raise any error. However, when I tried to load my pretrained model and test an image, I got this error. I don't know what's happening.

Error Message

Traceback (most recent call last):
  File "/home/shelvin_yuan/Desktop/gluoncv_test/pre_mask_rcnn.py", line 88, in <module>
    ids, scores, bboxes, masks = [xx[0].asnumpy() for xx in net(x)]
  File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/block.py", line 683, in __call__
    out = self.forward(*args)
  File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/block.py", line 1430, in forward
    return self._call_cached_op(x, *args)
  File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/block.py", line 1022, in _call_cached_op
    raise ValueError("The argument structure of HybridBlock does not match"
ValueError: The argument structure of HybridBlock does not match the cached version. Stored format = [0], input format = [0, 0, 0]

To Reproduce

This is "pre_mask_rcnn.py":

from matplotlib import pyplot as plt
from gluoncv import model_zoo, data, utils
from mxnet import gluon
import mxnet as mx

net = gluon.SymbolBlock.imports(
    './mask_rcnn_resnet50_v1b_coco_0000_0.0000-symbol.json',
    ['data0', 'data1', 'data2'],
    './mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params',
    ctx=mx.cpu())  # this is where the error happens

In train_mask_rcnn.py, I used the following two lines to save the model and parameters:

net.save_parameters('{:s}_{:04d}_{:.4f}.params'.format(prefix, epoch, current_map))
net.export('{:s}_{:04d}_{:.4f}'.format(prefix, epoch, current_map), epoch=0)
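For reference, the file names these two calls would write can be worked out from the format strings. This is a minimal sketch, assuming that `export` appends `-symbol.json` and `-<4-digit epoch>.params` to the path it is given and that the format string is `'{:s}_{:04d}_{:.4f}'` (both inferred from the file names used in pre_mask_rcnn.py). The helper name `checkpoint_files` is hypothetical and not part of gluoncv or mxnet.

```python
def checkpoint_files(prefix, epoch, current_map, export_epoch=0):
    """Return the file names the two save calls would write (hypothetical helper)."""
    # Same stem for both calls; format string inferred from the observed file names.
    stem = '{:s}_{:04d}_{:.4f}'.format(prefix, epoch, current_map)
    return {
        # net.save_parameters(stem + '.params')
        'save_parameters': stem + '.params',
        # net.export(stem, epoch=export_epoch) is assumed to write these two files
        'export_symbol': stem + '-symbol.json',
        'export_params': '{:s}-{:04d}.params'.format(stem, export_epoch),
    }


files = checkpoint_files('mask_rcnn_resnet50_v1b_coco', 0, 0.0)
for kind, name in files.items():
    print(kind, '->', name)
```

Under those assumptions, the `-0000.params` file that pre_mask_rcnn.py loads is the one written by `export`, not the one written by `save_parameters`.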

Steps to reproduce

Run the "pre_mask_rcnn.py" in pycharm

What have you tried to solve it?

Load from model_zoo

I tried another way: building the model from model_zoo and loading my pretrained parameters into it:

param = "./mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params"
net = model_zoo.get_model('mask_rcnn_resnet50_v1b_coco', pretrained=False)
net.initialize(ctx=mx.cpu())
net.reset_class(['insulator'])
net.load_parameters(param.strip())

but I got another error in "net.load_parameters(param.strip())":

Traceback (most recent call last):
  File "/home/shelvin_yuan/Desktop/gluoncv_test/pre_mask_rcnn.py", line 35, in <module>
    net.load_parameters(param.strip())
  File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/block.py", line 530, in load_parameters
    self.collect_params().load(
  File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/parameter.py", line 1022, in load
    self.load_dict(ndarray_load, ctx, allow_missing,
  File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/parameter.py", line 1055, in load_dict
    assert name in arg_dict, \
AssertionError: Parameter 'conv0_weight' is missing in file: ./mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params, which contains parameters: 'maskrcnn0_resnetv1b_conv0_weight', 'maskrcnn0_resnetv1b_batchnorm0_gamma', 'maskrcnn0_resnetv1b_batchnorm0_beta', ..., 'maskrcnn0_maskrcnn0_mask0_conv0_weight', 'maskrcnn0_maskrcnn0_mask0_conv0_bias', 'maskrcnn0_maskrcnn0_mask0_conv1_weight', 'maskrcnn0_maskrcnn0_mask0_conv1_bias'. Please make sure source and target networks have the same prefix.For more info on naming, please see https://mxnet.io/api/python/docs/tutorials/packages/gluon/blocks/naming.html

It seems that the parameter names in the loaded model do not have the prefix "maskrcnn0_resnetv1b_", which appears in my saved parameters. I went back to "train_mask_rcnn.py" and found an argument "save_prefix", but it only affects the file names of the ".params" and ".json" files, not the names of the parameters themselves.
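The mismatch described above can be checked mechanically by comparing the name the loader asked for with the names stored in the file. This is a minimal diagnostic sketch in plain Python, using only names taken from the error message. The helper `find_prefixed_matches` is hypothetical and does not call any mxnet API; it only shows which saved names end with an expected name.

```python
def find_prefixed_matches(expected_names, saved_names):
    """For each expected name that has no exact match, list saved names that
    look like the same parameter with an extra prefix (hypothetical helper)."""
    matches = {}
    for expected in expected_names:
        if expected in saved_names:
            continue  # exact match, nothing to diagnose
        # A saved name is a candidate if it ends with '_' + the expected name.
        matches[expected] = [s for s in saved_names if s.endswith('_' + expected)]
    return matches


# Names copied from the AssertionError above.
expected = ['conv0_weight']
saved = [
    'maskrcnn0_resnetv1b_conv0_weight',
    'maskrcnn0_resnetv1b_batchnorm0_gamma',
    'maskrcnn0_maskrcnn0_mask0_conv0_weight',
]

candidates = find_prefixed_matches(expected, saved)
for name, found in candidates.items():
    print(name, '->', found)
```

Note that a suffix match can be ambiguous: here both the backbone and the mask head contain a `conv0_weight`, so the prefix is needed to tell them apart.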

By the way, in "pre_mask_rcnn.py", if I change

net = gluon.SymbolBlock.imports(
    './mask_rcnn_resnet50_v1b_coco_0000_0.0000-symbol.json',
    ['data0', 'data1', 'data2'],
    './mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params',
    ctx=mx.cpu())

into

net = gluon.SymbolBlock.imports(
    './mask_rcnn_resnet50_v1b_coco_0000_0.0000-symbol.json',
    ['data0'],
    './mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params',
    ctx=mx.cpu())

then I got another error:

Traceback (most recent call last):
  File "/home/shelvin_yuan/Desktop/gluoncv_test/pre_mask_rcnn.py", line 37, in <module>
    net = gluon.SymbolBlock.imports('./mask_rcnn_resnet50_v1b_coco_0000_0.0000-symbol.json', ['data0'], './mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params', ctx=mx.cpu())
  File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/block.py", line 1366, in imports
    ret.collect_params().load(param_file, ctx=ctx, cast_dtype=True, dtype_source='saved')
  File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/parameter.py", line 1022, in load
    self.load_dict(ndarray_load, ctx, allow_missing,
  File "/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/gluon/parameter.py", line 1055, in load_dict
    assert name in arg_dict, \
AssertionError: Parameter 'data1' is missing in file: ./mask_rcnn_resnet50_v1b_coco_0000_0.0000-0000.params, which contains parameters: 'resnetv1b_conv0_weight', 'resnetv1b_batchnorm0_gamma', 'resnetv1b_batchnorm0_beta', ..., 'maskrcnn0_mask0_conv0_weight', 'maskrcnn0_mask0_conv0_bias', 'maskrcnn0_mask0_conv1_weight', 'maskrcnn0_mask0_conv1_bias'. Please make sure source and target networks have the same prefix.For more info on naming, please see https://mxnet.io/api/python/docs/tutorials/packages/gluon/blocks/naming.html
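One way to see which names the exported graph expects is to list its variable nodes and check which of them are covered neither by the input names nor by the parameter file. This is a sketch under assumptions: it assumes the `-symbol.json` file has a top-level `nodes` list in which variables carry `"op": "null"`, and it runs on a small made-up graph rather than the real export. The helpers `variable_names` and `unresolved` are hypothetical. A variable reported as unresolved here would be consistent with the "Parameter 'data1' is missing" error above.

```python
import json


def variable_names(symbol_json_text):
    """Names of all variable nodes (op == 'null') in a symbol JSON string."""
    graph = json.loads(symbol_json_text)
    return [node['name'] for node in graph['nodes'] if node['op'] == 'null']


def unresolved(symbol_json_text, input_names, param_names):
    """Variables that are neither declared as inputs nor present as parameters."""
    known = set(input_names) | set(param_names)
    return [n for n in variable_names(symbol_json_text) if n not in known]


# Toy graph for illustration only; the real export has many more nodes.
toy_symbol = json.dumps({
    'nodes': [
        {'op': 'null', 'name': 'data0', 'inputs': []},
        {'op': 'null', 'name': 'data1', 'inputs': []},
        {'op': 'null', 'name': 'data2', 'inputs': []},
        {'op': 'null', 'name': 'resnetv1b_conv0_weight', 'inputs': []},
        {'op': 'Convolution', 'name': 'resnetv1b_conv0_fwd',
         'inputs': [[0, 0, 0], [3, 0, 0]]},
    ]
})

params_in_file = ['resnetv1b_conv0_weight']
print(unresolved(toy_symbol, ['data0'], params_in_file))
print(unresolved(toy_symbol, ['data0', 'data1', 'data2'], params_in_file))
```

With only `data0` declared as an input, `data1` and `data2` are left over; with all three declared, nothing is.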

Environment

----------Python Info----------
Version : 3.8.5
Compiler : GCC 7.3.0
Build : ('default', 'Sep 4 2020 07:30:14')
Arch : ('64bit', 'ELF')
------------Pip Info-----------
Version : 20.2.4
Directory : /home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/pip
----------MXNet Info-----------
Version : 1.7.0
Directory : /home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet
Commit Hash : 64f737cdd59fe88d2c5b479f25d011c5156b6a8a
Library : ['/home/shelvin_yuan/anaconda3/envs/gluoncv/lib/python3.8/site-packages/mxnet/libmxnet.so']
Build features:
✖ CUDA ✖ CUDNN ✖ NCCL ✖ CUDA_RTC ✖ TENSORRT ✔ CPU_SSE ✔ CPU_SSE2 ✔ CPU_SSE3 ✔ CPU_SSE4_1 ✔ CPU_SSE4_2 ✖ CPU_SSE4A ✔ CPU_AVX ✖ CPU_AVX2 ✔ OPENMP ✖ SSE ✔ F16C ✖ JEMALLOC ✔ BLAS_OPEN ✖ BLAS_ATLAS ✖ BLAS_MKL ✖ BLAS_APPLE ✔ LAPACK ✔ MKLDNN ✔ OPENCV ✖ CAFFE ✖ PROFILER ✔ DIST_KVSTORE ✖ CXX14 ✖ INT64_TENSOR_SIZE ✔ SIGNAL_HANDLER ✖ DEBUG ✖ TVM_OP
----------System Info----------
Platform : Linux-5.4.0-56-generic-x86_64-with-glibc2.10
system : Linux
node : s-ubuntu
release : 5.4.0-56-generic
version : #62~18.04.1-Ubuntu SMP Tue Nov 24 10:07:50 UTC 2020
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 158
Model name: Intel(R) Core(TM) i7-9700 CPU @ 3.00GHz
Stepping: 13
CPU MHz: 906.093
CPU max MHz: 4700.0000
CPU min MHz: 800.0000
BogoMIPS: 6000.00
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 12288K
NUMA node0 CPU(s): 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d arch_capabilities
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0181 sec, LOAD: 16.4462 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.8030 sec, LOAD: 3.1562 sec.
Error open Gluon Tutorial(cn): https://zh.gluon.ai, <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1123)>, DNS finished in 0.39185047149658203 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.8720 sec, LOAD: 2.1558 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0062 sec, LOAD: 6.1062 sec.
Error open Conda: https://repo.continuum.io/pkgs/free/, HTTP Error 403: Forbidden, DNS finished in 0.5848102569580078 sec.
----------Environment----------

github-actions[bot] commented 3 years ago

Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue. Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly. If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on contributing to MXNet and our development guides wiki.

szha commented 3 years ago

To use the imports interface, you need to save both the graph and the parameters with block.export.