apache / tvm

Open deep learning compiler stack for CPU, GPU and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[ONNX][Relay][QNN] tvm doesn't support mix-precision inputs for qnn matmul #13466

Open vvchernov opened 1 year ago

vvchernov commented 1 year ago

Expected behavior

No type-matching check should be applied to QNN matmul inputs, since the ONNX documentation places no such constraint on the QLinearMatMul operation.
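As a sketch of why mixed-precision inputs are well defined, here is a hypothetical NumPy-only reference implementation of QLinearMatMul semantics (the function name and signature are illustrative assumptions, not TVM or ONNX API): each input is dequantized with its own scale and zero point, so the two quantized dtypes never need to match.

```python
import numpy as np

def qlinear_matmul(a, a_scale, a_zp, b, b_scale, b_zp,
                   y_scale, y_zp, y_dtype=np.uint8):
    # Dequantize each input independently; the input dtypes may differ.
    a_fp = (a.astype(np.int32) - int(a_zp)) * a_scale
    b_fp = (b.astype(np.int32) - int(b_zp)) * b_scale
    # Float matmul on the dequantized values.
    y_fp = a_fp @ b_fp
    # Requantize to the requested output type.
    q = np.round(y_fp / y_scale) + int(y_zp)
    info = np.iinfo(y_dtype)
    return np.clip(q, info.min, info.max).astype(y_dtype)

# Mixed-precision inputs: uint8 activations, int8 weights.
a = np.array([[130, 2], [7, 255]], dtype=np.uint8)
b = np.array([[-3, 5], [1, -7]], dtype=np.int8)
y = qlinear_matmul(a, 0.5, 128, b, 0.25, 0, 1.0, 0, np.uint8)
```

The computation goes through without any requirement that `a` and `b` share a dtype, which matches the ONNX spec's lack of a matching constraint.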

Actual behavior

A quantized model from Hugging Face fails during compilation because TVM checks that the input tensor types match.

Environment

Ubuntu 20.04 LTS

Steps to reproduce

The usual steps for compiling and running an ONNX model with the VirtualMachine through the Python front end

Triage

Notes

There is a similar issue and discussion for QNN conv2d; unfortunately it contains no solution and only minimal discussion. The problem looks more general: TVM Relay requires matching input types for QNN operations, even though the ONNX op descriptions do not assume matching types.

cc @KJlaccHoeUM9l @ehsanmok

vvchernov commented 1 year ago

cc @masahi