apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

retinaface model to onnx #15892

Open Zheweiqiu opened 5 years ago

Zheweiqiu commented 5 years ago

Description

I got an error when exporting the RetinaFace model to ONNX, but the export worked when I tried the InsightFace model.

Environment info (Required)

----------Python Info----------
Version      : 3.7.3
Compiler     : GCC 7.3.0
Build        : ('default', 'Mar 27 2019 22:11:17')
Arch         : ('64bit', '')
------------Pip Info-----------
Version      : 19.1.1
Directory    : /home/qiuzhewei/anaconda3/lib/python3.7/site-packages/pip
----------MXNet Info-----------
Version      : 1.5.0
Directory    : /home/qiuzhewei/anaconda3/lib/python3.7/site-packages/mxnet
Commit Hash  : 75a9e187d00a8b7ebc71412a02ed0e3ae489d91f
Library      : ['/home/qiuzhewei/anaconda3/lib/python3.7/site-packages/mxnet/libmxnet.so']
Build features:
✖ CUDA ✖ CUDNN ✖ NCCL ✖ CUDA_RTC ✖ TENSORRT ✔ CPU_SSE ✔ CPU_SSE2 ✔ CPU_SSE3 ✔ CPU_SSE4_1 ✔ CPU_SSE4_2 ✖ CPU_SSE4A ✔ CPU_AVX ✖ CPU_AVX2 ✖ OPENMP ✖ SSE ✔ F16C ✖ JEMALLOC ✖ BLAS_OPEN ✖ BLAS_ATLAS ✖ BLAS_MKL ✖ BLAS_APPLE ✔ LAPACK ✖ MKLDNN ✔ OPENCV ✖ CAFFE ✖ PROFILER ✔ DIST_KVSTORE ✖ CXX14 ✖ INT64_TENSOR_SIZE ✔ SIGNAL_HANDLER ✖ DEBUG
----------System Info----------
Platform     : Linux-4.15.0-55-generic-x86_64-with-debian-stretch-sid
system       : Linux
node         : chaowei-SYS-7048GR-TR
release      : 4.15.0-55-generic
version      : #60~16.04.2-Ubuntu SMP Thu Jul 4 09:03:09 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 1
Core(s) per socket: 6
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
Stepping: 1
CPU MHz: 1316.511
CPU max MHz: 1700.0000
CPU min MHz: 1200.0000
BogoMIPS: 3403.17
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 15360K
NUMA node0 CPU(s): 0-5
NUMA node1 CPU(s): 6-11
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm arat pln pts md_clear flush_l1d
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0016 sec, LOAD: 1.2854 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.3885 sec, LOAD: 1.5392 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0077 sec, LOAD: 2.0701 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 9.3760 sec, LOAD: 0.8177 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0007 sec, LOAD: 0.9090 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.1989 sec, LOAD: 0.4561 sec.

Package used (Python/R/Scala/Julia): I'm using Python 3.7

Build info (Required if built from source)

Compiler (gcc/clang/mingw/visual studio):

MXNet commit hash: fatal: Not a git repository (or any of the parent directories): .git

Build config: (Paste the content of config.mk, or the build command.)

Error Message:

[19:34:40] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v1.3.0. Attempting to upgrade...
Traceback (most recent call last):
  File "mxnet2onnx.py", line 10, in <module>
    converted_model_path = onnx_mxnet.export_model(sym, params, [input_shape], np.float32, onnx_file)
  File "/home/qiuzhewei/anaconda3/lib/python3.7/site-packages/mxnet/contrib/onnx/mx2onnx/export_model.py", line 80, in export_model
    sym_obj, params_obj = load_module(sym, params)
  File "/home/qiuzhewei/anaconda3/lib/python3.7/site-packages/mxnet/contrib/onnx/mx2onnx/_export_helper.py", line 58, in load_module
    sym, arg_params, aux_params = mx.model.load_checkpoint(model_name, num_epochs)
  File "/home/qiuzhewei/anaconda3/lib/python3.7/site-packages/mxnet/model.py", line 450, in load_checkpoint
    symbol = sym.load('%s-symbol.json' % prefix)
  File "/home/qiuzhewei/anaconda3/lib/python3.7/site-packages/mxnet/symbol/symbol.py", line 2728, in load
    check_call(_LIB.MXSymbolCreateFromFile(c_str(fname), ctypes.byref(handle)))
  File "/home/qiuzhewei/anaconda3/lib/python3.7/site-packages/mxnet/base.py", line 253, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Cannot find argument 'mode', Possible Arguments:

axis : int, optional, default='-1'
    The axis along which to compute softmax.
temperature : double or None, optional, default=None
    Temperature parameter in softmax
dtype : {None, 'float16', 'float32', 'float64'},optional, default='None'
    DType of the output in case this can't be inferred. Defaults to the same as input's dtype if not defined (dtype=None).
, in operator softmax(name="face_rpn_cls_prob_stride32", mode="channel")

Minimum reproducible example

I am using the code from http://mxnet.incubator.apache.org/versions/master/tutorials/onnx/export_mxnet_to_onnx.html

Steps to reproduce

1. Run "python mxnet2onnx.py", where mxnet2onnx.py is the script from the tutorial linked above.

What have you tried to solve it?

  1. Tried another MXNet model (ResNet-50) to see if it works.

mxnet-label-bot commented 5 years ago

Hey, this is the MXNet Label Bot. Thank you for submitting the issue! I will try and suggest some labels so that the appropriate MXNet community members can help resolve it. Here are my recommended labels: ONNX, Bug

vandanavk commented 5 years ago

I don't see this issue using MXNet master and ONNX v1.3.0. Which ONNX version are you using? Could you share the symbol and params file of your model?

Zheweiqiu commented 5 years ago

> I don't see this issue using MXNet master and ONNX v1.3.0. Which ONNX version are you using? Could you share the symbol and params file of your model?

ONNX version: 1.21. MXNet version: 1.5.0. The model was downloaded from https://github.com/deepinsight/insightface/tree/master/RetinaFace under the "RetinaFace Pretrained Models" section. There is a Dropbox link there from which you can download the model directly. Thanks!

vandanavk commented 5 years ago

Thanks @Zheweiqiu. With the model you shared, on the latest MXNet and ONNX v1.3.0, I hit this error: AttributeError: No conversion function registered for op type SoftmaxActivation yet. There is no export support for this operator yet. Please open a feature request.

Zheweiqiu commented 5 years ago

Thanks @vandanavk for trying it out. I got this error at first. After googling, I followed this answer and replaced every "SoftmaxActivation" with "softmax" in the *.json file. Then I got the error stated in the question. Do I need to retrain the model using the op "softmax" instead of "SoftmaxActivation"? Thanks!

vandanavk commented 5 years ago

SoftmaxActivation has an attribute mode. Replacing SoftmaxActivation with softmax alone may not solve the issue; equivalent attributes for softmax may have to be specified. To start with, along with replacing SoftmaxActivation with softmax in the json, try removing the attribute mode. The softmax operator will then use the default values for its attributes. ONNX export of the softmax operator is available.
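A minimal sketch of that JSON edit, using only the standard library. It assumes the symbol file keeps node attributes under "attrs" (older files may use "attr" or "param"). It also goes one step beyond removing mode: for mode="channel" it sets axis to "1", on the assumption that channel-wise softmax over an NCHW tensor corresponds to softmax along axis 1. The function names and file paths are hypothetical, and the patched model's outputs should be compared against the original before relying on it.

```python
import json

# Attribute containers seen in MXNet symbol files (assumption: one of these is used).
ATTR_KEYS = ("attrs", "attr", "param")


def replace_softmax_activation(graph):
    """Rewrite SoftmaxActivation nodes to softmax, in place.

    Returns the number of nodes that were changed.
    """
    changed = 0
    for node in graph.get("nodes", []):
        if node.get("op") != "SoftmaxActivation":
            continue
        node["op"] = "softmax"
        for key in ATTR_KEYS:
            attrs = node.get(key)
            if not isinstance(attrs, dict):
                continue
            # softmax has no 'mode' argument, so drop it.
            mode = attrs.pop("mode", None)
            # Assumption: mode="channel" means softmax over the channel axis.
            if mode == "channel":
                attrs["axis"] = "1"
        changed += 1
    return changed


def patch_symbol_file(src_path, dst_path):
    """Load a symbol JSON file, patch it, and write the result to dst_path."""
    with open(src_path) as f:
        graph = json.load(f)
    changed = replace_softmax_activation(graph)
    with open(dst_path, "w") as f:
        json.dump(graph, f, indent=2)
    return changed
```

Usage would look like patch_symbol_file("R50-symbol.json", "R50-patched-symbol.json"). Writing to a new file keeps the original symbol available for comparison.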

vandanavk commented 5 years ago

> Thanks @vandanavk for trying out. I got this error at first. After I googled, according to this answer I replace all "SoftmaxActivation" with "softmax" in the *.json file. Then I got the error stated in the question. Do I need to re-train the model using op "softmax" instead of "SoftmaxActivation"? Thanks!

Retraining with softmax would be the best solution, since SoftmaxActivation has been deprecated.

Zheweiqiu commented 5 years ago

> Retraining with softmax would be the best solution, since SoftmaxActivation has been deprecated.

I tried removing the attribute mode from the softmax operator but got the following error: AttributeError: No conversion function registered for op type UpSampling yet. I believe the same problem would occur even if I retrained the model, and I see that this issue is still a work in progress. Thanks for your reply!

yumaofan commented 5 years ago

I have the same problem and hope it can be solved soon. Thanks a lot.

vandanavk commented 5 years ago

> I tried remove the attribute mode for softmax operator but got the following error: AttributeError: No conversion function registered for op type UpSampling yet. I believe same problem will be encountered even if I retrain the model and I see this issue is still working in progress. Thanks for your reply!

Support for the UpSampling operator is currently in review: operator changes are in https://github.com/apache/incubator-mxnet/pull/15811 and ONNX support is in https://github.com/apache/incubator-mxnet/pull/15994. You could pull in these changes and build locally to try them immediately. Otherwise, watch for these two PRs to be merged.

Zheweiqiu commented 5 years ago

> Support for Upsampling operator is currently in review. Operator changes in #15811 and ONNX support in #15994. You could pull in these changes and build locally to try immediately. Else, you could watch out for these 2 PRs getting merged.

My MXNet was installed using pip. Do I need to uninstall it and build from source to pick up those changes?

luan1412167 commented 5 years ago

@vandanavk I made the changes described in your comment https://github.com/apache/incubator-mxnet/issues/15892#issuecomment-524952771. However, I got this error:

  File "mxnet_to_onnx_converter.py", line 36, in <module>
    converted_model_path = onnx_mxnet.export_model(sym, params, [input_shape], np.float32, onnx_file)
  File "/home/luandd/miniconda3/envs/luandao/lib/python3.7/site-packages/mxnet/contrib/onnx/mx2onnx/export_model.py", line 83, in export_model
    verbose=verbose)
  File "/home/luandd/miniconda3/envs/luandao/lib/python3.7/site-packages/mxnet/contrib/onnx/mx2onnx/export_onnx.py", line 211, in create_onnx_graph_proto
    graph_outputs = MXNetGraph.get_outputs(sym, params, in_shape, output_label)
  File "/home/luandd/miniconda3/envs/luandao/lib/python3.7/site-packages/mxnet/contrib/onnx/mx2onnx/export_onnx.py", line 142, in get_outputs
    _, out_shapes, _ = sym.infer_shape(**inputs)
  File "/home/luandd/miniconda3/envs/luandao/lib/python3.7/site-packages/mxnet/symbol/symbol.py", line 1076, in infer_shape
    res = self._infer_shape_impl(False, *args, **kwargs)
  File "/home/luandd/miniconda3/envs/luandao/lib/python3.7/site-packages/mxnet/symbol/symbol.py", line 1210, in _infer_shape_impl
    ctypes.byref(complete)))
  File "/home/luandd/miniconda3/envs/luandao/lib/python3.7/site-packages/mxnet/base.py", line 253, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Error in operator face_rpn_cls_prob_stride32: [12:34:58] src/operator/softmax_output.cc:86: Check failed: in_shape->size() == 2U (1 vs. 2) : Input:[data, label]

Zheweiqiu commented 5 years ago

@luan1412167 Did you replace SoftmaxActivation with softmax and remove every mode attribute in your .json file?
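One way to check is to look at what the edited JSON actually says for the node named in the error. The failed check is in softmax_output.cc and expects [data, label], which may mean the node ended up with an operator name other than lowercase "softmax". A small sketch for printing a node's op and attributes follows. It assumes the usual symbol JSON layout with a top-level "nodes" list; the helper name is hypothetical.

```python
import json


def describe_node(graph, name):
    """Return (op, attrs) for the node called `name`, or None if it is absent."""
    for node in graph.get("nodes", []):
        if node.get("name") != name:
            continue
        attrs = {}
        # Attributes may live under different keys depending on the MXNet version.
        for key in ("attrs", "attr", "param"):
            if isinstance(node.get(key), dict):
                attrs.update(node[key])
        return node.get("op"), attrs
    return None


def describe_node_in_file(path, name):
    """Same as describe_node, but reads the symbol JSON from `path`."""
    with open(path) as f:
        return describe_node(json.load(f), name)
```

For example, describe_node_in_file("R50-symbol.json", "face_rpn_cls_prob_stride32") should report the op as "softmax" with no mode attribute if the edit was applied as described.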

luan1412167 commented 5 years ago

@Zheweiqiu I converted the model successfully, but I don't know which output to read the result from. The bounding-box values I get are very small:

[[[[5.3017639e-04 5.4040336e-04 5.1597238e-04 ... 4.6603641e-04 5.1257212e-04 4.5140341e-04]
   [5.4795144e-04 5.1839353e-04 4.8552474e-04 ... 4.8594383e-04 5.6435599e-04 5.0048833e-04]
   [5.1728956e-04 5.0016592e-04 4.6242378e-04 ... 5.1690591e-04 5.4673775e-04 5.2698085e-04]

Can you point me to the right output?

yumaofan commented 5 years ago

@luan1412167 You should read the RetinaFace paper carefully. Haha.

Zheweiqiu commented 5 years ago

@luan1412167 Your output seems to be a feature vector or some intermediate output rather than bounding boxes. The name of the output layer is "output" in MXNet, and I don't think it changes during model conversion.

luan1412167 commented 5 years ago

@Zheweiqiu, @AaronFan1992

`import numpy as np
from mxnet.contrib import onnx as onnx_mxnet
# Paths to the MXNet checkpoint and the ONNX file to write
sym = '/home/luandd/Downloads/R50-symbol.json'
params = '/home/luandd/Downloads/R50-0000.params'
input_shape = (1,3,1920,1080)
onnx_file = '/home/luandd/CLionProjects/untitled/retinaface.onnx'
converted_model_path = onnx_mxnet.export_model(sym, params, [input_shape], np.float32, onnx_file, verbose=False)`

My conversion code is above. What is wrong? Sorry, this is my first time doing this.

Zheweiqiu commented 5 years ago

@luan1412167 I am using the same script as yours for the conversion, but I am busy with other work and have not had time to verify the correctness of the converted model.

luan1412167 commented 5 years ago

@Zheweiqiu can you share your model with me?

Zheweiqiu commented 5 years ago

@luan1412167 I'm afraid not. The company is pretty strict about this, and I am not allowed to upload anything to the cloud or send any files via email. Sorry about that.

luan1412167 commented 5 years ago

@Zheweiqiu thank you. Can you tell me whether the architecture's output is right?

luan1412167 commented 5 years ago

@Zheweiqiu where did you download the RetinaFace MXNet model?

Chenyangzh commented 4 years ago

Hello everyone

I am using mxnet==1.5.0 and onnx==1.6.0 and want to convert the RetinaFace MobileNet model to ONNX.

I also ran into the issues mentioned above with the SoftmaxActivation and UpSampling ops. Thanks to the suggestions above, I solved those two problems but hit a new one in the Crop op.

The Crop node has no 'out_shape' keyword. I found a similar issue in #14881, but it is not easy to get the specific h-w size for that Crop node, because the MXNet Crop op allows the name of a previous output to be used as an input of the present node.

There are also some discussions in #9885 but no conclusion.
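To see which earlier outputs each Crop node depends on, the symbol JSON can be inspected directly. This is only a diagnostic sketch and does not fix the export. It assumes the usual symbol JSON layout, where each entry in a node's "inputs" list starts with the index of the producing node; the helper name is hypothetical.

```python
import json


def crop_inputs(graph):
    """Map each Crop node's name to the names of the nodes feeding it."""
    nodes = graph.get("nodes", [])
    result = {}
    for node in nodes:
        if node.get("op") != "Crop":
            continue
        # Each input entry looks like [node_index, output_index, ...].
        result[node["name"]] = [nodes[entry[0]]["name"] for entry in node.get("inputs", [])]
    return result


def crop_inputs_in_file(path):
    """Same as crop_inputs, but reads the symbol JSON from `path`."""
    with open(path) as f:
        return crop_inputs(json.load(f))
```

Knowing the reference node for each Crop makes it possible to look up that node's inferred shape for a fixed input size, which is the h-w information the Crop node itself does not carry.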

Does anyone have suggestions? Thanks a lot.

Chenyangzh commented 4 years ago

I finally solved the issue by following https://github.com/cholihao/Retinaface-caffe/issues/4

yangshuailc commented 4 years ago

@Chenyangzh, I also ran into the SoftmaxActivation and UpSampling op issues. Can you share your solution? Thank you.

mohamadHN93 commented 3 years ago

> @Zheweiqiu can you share me your model?

Hi, can you help me convert this model to ONNX with a dynamic batch size and input shape? https://drive.google.com/file/d/13_fmDpkD7IyZAP5HWbwAdmAsonFQ34vm/view?usp=sharing