Closed yurkovak closed 1 year ago
I compiled the provided edgeai-yolov5 model and I get the same runtime from:

- `TIDLExecutionProvider` after compilation to int8
- `TIDLExecutionProvider` after compilation to fp16
- `CPUExecutionProvider` without using artifacts

The same goes for a few models from model_zoo with pre-compiled models. Is this expected?
The issue was with the order of providers: `TIDLExecutionProvider` has to be first in the list. Closing.
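For reference, a minimal sketch of what the resolution above implies when building the session. The helper names, paths, and the `artifacts_folder` option key are assumptions for illustration, not taken from this thread; the TIDL execution provider itself is only available in TI's edgeai-tidl-tools build of onnxruntime.

```python
def ordered_providers(use_tidl: bool) -> list:
    # ONNX Runtime walks the providers list in order and assigns each
    # graph node to the first provider that supports it. If
    # CPUExecutionProvider comes first, it claims every node and the
    # compiled TIDL artifacts are silently never used -- which is why
    # all three configurations above ran at the same (CPU) speed.
    if use_tidl:
        return ["TIDLExecutionProvider", "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]


def make_session(model_path: str, artifacts_dir: str):
    # Requires TI's onnxruntime build; option names are assumptions.
    import onnxruntime as rt

    return rt.InferenceSession(
        model_path,
        providers=ordered_providers(use_tidl=True),
        # One options dict per provider, aligned with the list order.
        provider_options=[{"artifacts_folder": artifacts_dir}, {}],
    )
```

With the providers in this order, nodes that TIDL cannot offload still fall back to the CPU provider, so the session works either way; only the ordering determines whether acceleration actually happens.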
custom_model_evaluation.md shows that both a default RT session and an RT session with TIDL acceleration are supported. However, the latter is a lot of extra work, and the expected benefit is unclear; I wasn't able to find any benchmark table showing the gain from using the TI providers. Does such a table exist?
If not, roughly how much speedup should I expect from using the providers compared to a plain quantized ONNX model? E.g.