PINTO0309 / onnx2tf

Self-Created Tools to convert ONNX files (NCHW) to TensorFlow/TFLite/Keras format (NHWC). The purpose of this tool is to solve the massive Transpose extrapolation problem in onnx-tensorflow (onnx-tf). I don't need a Star, but give me a pull request.
MIT License

matmul uint8 converts to flexbatchmatmul #648

Closed: tensorbuffer closed this issue 2 months ago

tensorbuffer commented 3 months ago

Issue Type

Others

OS

Linux

onnx2tf version number

1.22.4

onnx version number

1.15.0

onnxruntime version number

1.17.1

onnxsim (onnx_simplifier) version number

0.4.33

tensorflow version number

2.16.1

Download URL for ONNX

matmul.zip

Parameter Replacement JSON

Not needed, since I used your Docker image.

Description

  1. Purpose: trying to get a TFLite model with a single FullyConnected layer, converted from an ONNX MatMul (a minimal sketch of such a model is shown after this list).
  2. What: I tried the Docker image (docker.io/pinto0309/onnx2tf:1.22.4) and added the parameter -rtpo MatMulInteger, but I still got a TFLite model containing FlexBatchMatMul.
  3. How: this is how I run the command inside Docker: user@b47d54f3bb8e:/workdir$ sudo onnx2tf -i matmul_model.uint8.onnx -o output.uint8 -rtpo MatMulInteger (the model is uploaded as a zip file).
  4. Why: because I need such a model to feed to our hardware accelerator.
  5. Resource: none
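
For reference, a minimal single-MatMul ONNX model can be built along the following lines. This is only a float32 sketch with illustrative tensor names and shapes, not necessarily those in matmul.zip; my actual model uses uint8 tensors instead.

```python
# Hypothetical sketch: build a single-MatMul ONNX model with onnx.helper.
# Names and shapes are illustrative, not taken from matmul.zip.
import numpy as np
import onnx
from onnx import helper, numpy_helper, TensorProto

# Constant weight initializer for the MatMul.
weight = numpy_helper.from_array(
    np.random.rand(64, 32).astype(np.float32), name="weight"
)
node = helper.make_node("MatMul", inputs=["x", "weight"], outputs=["y"])
graph = helper.make_graph(
    [node],
    "single_matmul",
    inputs=[helper.make_tensor_value_info("x", TensorProto.FLOAT, [1, 64])],
    outputs=[helper.make_tensor_value_info("y", TensorProto.FLOAT, [1, 32])],
    initializer=[weight],
)
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 13)])
onnx.checker.check_model(model)
onnx.save(model, "matmul_model_float32.onnx")
```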
PINTO0309 commented 3 months ago

because I need such a model to feed to our hardware accelerator

onnx2tf (TensorFlow Lite) can only generate Float32 MatMul / BatchMatMul. Is that OK?

To begin with, a UINT8 input tensor for MatMul is not in the ONNX specification. Are you doing any unusual processing? It is already strange that the model was not exported with MatMulInteger.


https://github.com/onnx/onnx/blob/main/docs/Changelog.md#MatMul-13


The model cannot be loaded for inference by onnxruntime:

[ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Type Error: Type 'tensor(uint8)' of input parameter (onnx____MatMul_0) of operator (MatMul) in node (/MatMul) is invalid.
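
A sketch of how the error above can be reproduced (the filename is the one used in the command in the issue description):

```python
# Session creation fails during graph validation because the MatMul
# operator does not accept tensor(uint8) inputs.
import onnxruntime as ort

try:
    ort.InferenceSession("matmul_model.uint8.onnx")
except Exception as e:
    print(e)  # [ONNXRuntimeError] : 10 : INVALID_GRAPH : ...
```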


matmul_model.uint8_float32.tflite.zip

It would be easy for onnx2tf to support this, but I don't want to implement anything outside the standard spec.

tensorbuffer commented 3 months ago

Thanks for the quick response! My end goal is to have a MatMul with uint8 (or int8) inputs to test our hardware accelerator.

I first tried int8 and got this error when running onnx2tf:

Value passed to parameter 'x' has DataType int8 not in list of allowed values: bfloat16, float16, float32, float64, int16, int32, int64, uint8, uint16, uint32, uint64, complex64, complex128

Since uint8 appears in that list, I thought it was a valid data type. I can use float and generate the FullyConnected layer fine, but our hardware doesn't support float yet. I could probably try float with quantization, but then the results won't match exactly, since floats are represented differently on the hardware and the host...

PINTO0309 commented 3 months ago

Use a Float32 ONNX model. You can then quantize to int8/uint8 with just two options: -oiqt and -qt uint8 (or -qt int8).

onnx2tf was initially designed with a strong emphasis on optimizing for hardware accelerators dedicated to quantized models, such as EdgeTPU and Hailo-8.

Instead of trying to quantize to int8 or uint8 on the ONNX side, the model should be quantized directly in onnx2tf or TensorFlow (see the sketch below).
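
A sketch of the recommended flow. The filenames are assumptions; only the -i, -o, -oiqt, and -qt flags are the ones described above:

```python
# Convert a Float32 ONNX model and also emit integer-quantized TFLite files.
import subprocess

subprocess.run(
    [
        "onnx2tf",
        "-i", "matmul_model_float32.onnx",  # a Float32 ONNX model, not uint8
        "-o", "saved_model_uint8",
        "-oiqt",                            # also output integer-quantized tflite
        "-qt", "uint8",                     # quantization type: uint8 (or int8)
    ],
    check=True,
)
# The output folder should then contain quantized .tflite files
# (e.g. *_integer_quant.tflite) alongside the float32 model.
```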

You are overcomplicating this.

github-actions[bot] commented 2 months ago

If there is no activity within the next two days, this issue will be closed automatically.