ARM-software / armnn

Arm NN ML Software. The code here is a read-only mirror of https://review.mlplatform.org/admin/repos/ml/armnn
https://developer.arm.com/products/processors/machine-learning/arm-nn
MIT License

Unsupported Operation "Transpose" in armNN::OnnxParser while loading the onnx model file (in the goal to run inference) #763

Closed pandianAK closed 6 days ago

pandianAK commented 3 months ago

Hi, there are two operators of main concern that I am raising as an issue here: "Transpose" and "ConvTranspose". Currently I am using an ONNX file to parse the input, output and inner layers. However, these two operators are not present in the source code, so I believe they are not supported. I have attached the error and a snapshot of the model here.

""" terminate called after throwing an instance of 'armnn::ParseException' what(): Unsupported operation Transpose for node 'batch_normalization_5/FusedBatchNorm__60' at function LoadGraph

src/armnnOnnxParser/OnnxParser.cpp:1076] [1] 155966 abort (core dumped) ./aa.out """ error
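
For reference, a minimal sketch of the loading path that hits the exception (the file name and backend are placeholders; the throw happens inside CreateNetworkFromBinaryFile while LoadGraph walks the nodes):

#include <armnn/ArmNN.hpp>
#include <armnnOnnxParser/IOnnxParser.hpp>

// Parse the ONNX file into an Arm NN network.
armnnOnnxParser::IOnnxParserPtr parser = armnnOnnxParser::IOnnxParser::Create();
armnn::INetworkPtr network = parser->CreateNetworkFromBinaryFile("model.onnx"); // throws ParseException on the Transpose node

// Optimize for a backend and prepare to load into the runtime.
armnn::IRuntimePtr runtime = armnn::IRuntime::Create(armnn::IRuntime::CreationOptions());
armnn::IOptimizedNetworkPtr optNet =
    armnn::Optimize(*network, {armnn::Compute::CpuAcc}, runtime->GetDeviceSpec());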

catcor01 commented 3 months ago

Hello,

Your issue is similar to the following one: https://github.com/ARM-software/armnn/issues/761.

Transpose and ConvTranspose support has not been added to the ONNX parser. The work to add this support is on our radar but has not been prioritized for the near future. I can make 2 suggestions to get things running on your side:

Kind Regards, Cathal.

catcor01 commented 3 months ago

Hello again,

I just wanted to suggest that you can also use ONNX Runtime with ACL as an option to accelerate your model. See here: https://onnxruntime.ai/docs/execution-providers/community-maintained/ACL-ExecutionProvider.html.
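
For illustration, a rough sketch of what that looks like on the application side (based on the linked ACL EP page; it assumes an ONNX Runtime build from source with the ACL execution provider enabled, and the exact provider-factory header depends on that build):

#include <onnxruntime_cxx_api.h>
// OrtSessionOptionsAppendExecutionProvider_ACL is only available when
// ONNX Runtime has been built with the ACL execution provider.

Ort::Env env{ORT_LOGGING_LEVEL_ERROR, "acl-example"};
Ort::SessionOptions sessionOptions;
bool enableCpuMemArena = true;
Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_ACL(sessionOptions, enableCpuMemArena));
Ort::Session session(env, "model.onnx", sessionOptions); // placeholder model path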

Kind Regards, Cathal.

pandianAK commented 2 months ago

Thank you for confirming the issue. I have tried converting ONNX to TFLite, but I am facing issues with the onnx_tf library as it depends on older versions of TensorFlow and tensorflow-addons. https://stackoverflow.com/questions/53182177/how-do-you-convert-a-onnx-to-tflite Would you recommend any other way?

I believe the contribution of adding "Transpose" to OnnxParser would take some time.

pandianAK commented 2 months ago

Hi, I have tried ONNX Runtime with Arm NN using the "ExecutionProviderArmNN" API. I believe it is built through Bazel and has to be built from source. Would there be any precompiled binary/library available as part of an ONNX Runtime release with support for Arm NN, so that the API could be linked from the header ("OrtSessionOptionsAppendExecutionProvider_ACL" or "OrtSessionOptionsAppendExecutionProvider_ArmNN")?

Thanks Pandian AK

Colm-in-Arm commented 2 months ago

Hello Pandian AK,

There is no publicly available prebuilt ONNX Runtime package that includes the ACL EP.

Just to be clear, it is the ACL execution provider you should be trying. There is an older Arm NN execution provider, but it's probably too old for your purposes.

Colm.

pandianAK commented 2 months ago

Hi, thanks for the info so far and for suggesting the execution provider using ACL. I believe the options are not very configurable in that case.

And I would need some info on one of the configurations in Arm NN standalone itself. For the backend options, if I use "CpuAcc", what are all the available optimize options to improve inference time? An example:

armnn::OptimizerOptions optimizerOptions;
optimizerOptions.m_OptimizeForFastMath = true; // Enable FastMath
optimizerOptions.m_ReduceFp32ToFp16 = true;    // Enable FP16
armnn::IOptimizedNetworkPtr optimizedNetwork =
    armnn::Optimize(*network, {armnn::Compute::CpuAcc}, optimizerOptions);
// Then the code for LoadNetwork and EnqueueWorkload
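
As far as I can tell, FastMath is exposed as a CpuAcc backend "model option" rather than a direct member of armnn::OptimizerOptions; would something like the following be the intended way (a sketch based on the runtime options documentation, so option names and availability may differ by Arm NN version)?

armnn::OptimizerOptions optimizerOptions;
optimizerOptions.m_ReduceFp32ToFp16 = true; // run supported layers in FP16

// CpuAcc-specific options go in via m_ModelOptions
// ("NumberOfThreads" is also listed in the runtime options documentation).
armnn::BackendOptions cpuAccOptions("CpuAcc",
{
    { "FastMathEnabled", true } // e.g. allow Winograd convolutions
});
optimizerOptions.m_ModelOptions.push_back(cpuAccOptions);

armnn::IOptimizedNetworkPtr optimizedNetwork =
    armnn::Optimize(*network, {armnn::Compute::CpuAcc},
                    runtime->GetDeviceSpec(), optimizerOptions);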

Would you advise creating dynamic backends for the inner layers? (Any example?)

Thanks

pandianAK commented 2 months ago

Hi, as per the previous comment, I was able to get "Transpose" working in the OnnxParser by comparing it with the TfLiteParser. I will definitely contribute a PR here, if it is allowed, once it passes all unit tests.

However, I don't see any improvement in time yet and still need help getting Arm NN to run my model with very low inference time. I was able to get the backend options for CpuAcc from this URL: https://arm-software.github.io/armnn/latest/runtimeoptions.html

In case of GpuAcc, could you please provide all the available options and how to configure them to get the fastest time? Here are some of the options I could find; please add more if there are any:

1) OpenCL tuning (with a tuning file)
2) FP16, and running 2 FP16 instructions in 1 FP32 cycle (not sure how to enable this)
3) Thread options
4) Any other cache-based options

I am not sure how to specify these options in Arm NN for the GpuAcc backend. Please point out anything I am missing beyond these options.
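
This is roughly what I have pieced together so far; could you confirm whether these are the right knobs (a sketch, so the option names and values are my reading of the documentation and may not match the current release)?

// Runtime-level GpuAcc options: OpenCL tuning and kernel profiling.
armnn::IRuntime::CreationOptions runtimeOptions;
runtimeOptions.m_BackendOptions.emplace_back(
    armnn::BackendOptions{"GpuAcc",
    {
        { "TuningLevel", 2 },               // 0 = none, 1 = rapid, 2 = normal, 3 = exhaustive
        { "TuningFile", "gpu_tuning.bin" }, // hypothetical path to load/save tuning data
        { "KernelProfilingEnabled", false }
    }});
armnn::IRuntimePtr runtime = armnn::IRuntime::Create(runtimeOptions);

// Network-level options: FP16 plus fast math for GpuAcc.
armnn::OptimizerOptions optimizerOptions;
optimizerOptions.m_ReduceFp32ToFp16 = true; // run supported layers in FP16
optimizerOptions.m_ModelOptions.push_back(
    armnn::BackendOptions{"GpuAcc", {{ "FastMathEnabled", true }}});

armnn::IOptimizedNetworkPtr optimizedNetwork =
    armnn::Optimize(*network, {armnn::Compute::GpuAcc},
                    runtime->GetDeviceSpec(), optimizerOptions);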

Thanks Pandian AK

Colm-in-Arm commented 2 months ago

Hello,

The GpuAcc tuning parameters are described here. These are also available through various command line options in ExecuteNetwork.

Colm.