What?
When a user loads a quantized model, the user can set float-type input and output buffers and data. The runtime then quantizes the float data when reading from the input buffer, and dequantizes the model's output data when writing to the user's output buffer.
Why?
When a user loads an on-device quantized model, the user cannot know the exact quantization parameters (such as scale and zero point) of the quantized input and output tensors, so the user cannot prepare correctly quantized buffers. Therefore we need to support type-aware input/output buffer setting.