Open CIPop opened 11 months ago
Hi, if I visualize the model with Netron I get the following. As you can see, the quantization section of the input indicates that the original distribution of your data is [-1, 1) (float/double). Because the model was trained with that range of values, it expects values in [-1, 1) so that the inferencer can quantize them, that is, convert them from [-1, 1) to [0, 255] (uint8):
-1 × (1/0.0078125) + 128 = 0
0.9921875 × (1/0.0078125) + 128 = 255
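The arithmetic above can be reproduced with a few lines of NumPy. This is only a sketch: `quantize` is a hypothetical helper, and the rounding and clipping are my assumptions about how a runtime would handle values that do not land exactly on an integer. The scale and zero point are the ones quoted above.

```python
import numpy as np

# Input quantization parameters quoted in this thread.
SCALE = 0.0078125  # 1/128
ZERO_POINT = 128


def quantize(x, scale=SCALE, zero_point=ZERO_POINT):
    """Hypothetical helper: affine quantization of real values to uint8."""
    q = np.round(np.asarray(x, dtype=np.float64) / scale + zero_point)
    # Clip so that values outside [-1, 1) cannot wrap around in uint8.
    return np.clip(q, 0, 255).astype(np.uint8)


# Both ends of the training range, plus the midpoint.
print(quantize([-1.0, 0.0, 0.9921875]))  # values 0, 128, 255
```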
In the Python inferencer the quantization is not done automatically, so the user is expected to do it. As you noted in https://github.com/joonb14/TFLiteDetection/issues/1, you are right: the processor lacks the preprocessing step that quantizes the input tensor. However, because you loaded the images as uint8 rather than in the range of values the model was trained with, the quantization happened implicitly, by coincidence.
On the other hand, in wasi-nn the quantization is done internally, so it expects the original range of values. In your solution you transformed [0, 255], a data distribution that does not correspond to the one the model was trained on, to [-1, 1), which is the valid one. That way, the model can perform the transformation to the correct range of values by itself.
Note that wasi-nn expects values in the range with which the model has been trained. Any other assumption is (in most cases) wrong, since the distribution of the data at inference time should match the one used for training (there are always exceptions).
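The coincidence mentioned above (raw uint8 pixels already being close to the quantized values) can be checked numerically. A minimal sketch, assuming the scale and zero point quoted earlier and assuming the runtime rounds and clips when it quantizes:

```python
import numpy as np

SCALE, ZERO_POINT = 0.0078125, 128  # input quantization parameters from this thread

pixels = np.arange(256, dtype=np.uint8)  # every possible raw pixel value

# Scale raw pixels to the training range.
normalized = (pixels / 255.0) * 2.0 - 1.0

# Quantize the way a runtime might (rounding and clipping are assumptions).
requantized = np.clip(np.round(normalized / SCALE + ZERO_POINT), 0, 255).astype(np.uint8)

# Compare in a wider integer type so the subtraction cannot wrap around.
diff = np.abs(requantized.astype(np.int16) - pixels.astype(np.int16))
max_diff = int(diff.max())
print("largest difference from the raw pixel value:", max_diff)
```

For this particular model the round trip is close to the identity, which is why feeding raw uint8 images to the Python interpreter happened to work.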
Note also that if we wanted the users themselves to quantize the values we have 2 options:
@tonibofarull, I just skimmed this issue but perhaps the information that you're looking to pass on could be done if we added a new metadata feature to wasi-nn? I've been looking for examples where this would be useful. Would the metadata need to be attached to the tensor or the graph or the context?
The metadata is already in the model, at least in the case of TFLite. The problem reported by @CIPop is that the input range expected by wasi-nn is the one used for training, which from my point of view is correct, instead of directly asking for the quantized version. In this case, the quantized type happened to be uint8, the same as the image format, but it could have been uint16 or any other type, so those images would have to be scaled no matter what.
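To illustrate why the match with the image format is specific to this model, here is a sketch of the same affine quantization for an arbitrary integer type. `quantize_to` is a hypothetical helper, and the int8 parameters are made up for illustration; they are not taken from any model in this thread.

```python
import numpy as np


def quantize_to(x, scale, zero_point, dtype):
    """Hypothetical helper: affine-quantize floats to any integer dtype."""
    info = np.iinfo(dtype)
    q = np.round(np.asarray(x, dtype=np.float64) / scale + zero_point)
    return np.clip(q, info.min, info.max).astype(dtype)


x = np.array([-1.0, 0.0, 0.5])

# uint8 parameters from this thread.
as_uint8 = quantize_to(x, 0.0078125, 128, np.uint8)
# Illustrative int8 parameters (an assumption, not from a real model).
as_int8 = quantize_to(x, 0.0078125, 0, np.int8)

print(as_uint8, as_int8)
```

With the int8 parameters the quantized values no longer resemble raw image bytes, so the image has to be scaled to the training range first.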
Perhaps what we can do is allow users to decide whether to quantize manually or let the runtime assume the input is that of the training.
@tonibofarull I just verified that with the expected [-1..1] input range, WASI-NN performs as expected. Thank you for the in-depth explanation!
In WAMR's WASI-NN wasi_nn.h we could add documentation
The pre-processing should be:
1. Start from the image as uint8_t 300x300x3, values [0..255].
2. Convert it to float 300x300x3, values [-1..1].
3. If the model is quantized (input type uint8_t), apply quantization parameters. This transforms the input back to uint8_t 300x300x3, values [0..255].

The important part is that, while the tensors in steps 1 and 3 have the same shape and type, the values are clearly different.
Tested in Python:
import numpy as np

# Assumes `im` is a PIL image and `input_details = interpreter.get_input_details()[0]`.
res_im = im.resize((300, 300))
np_res_im = np.array(res_im)
# Transform from input RGB [0..255] to [-1, 1]
np_res_im = (np_res_im / 255) * 2 - 1
# From https://www.tensorflow.org/lite/performance/post_training_integer_quant#run_the_tensorflow_lite_models
# Check if the input type is quantized, then rescale input data to uint8
if input_details['dtype'] == np.uint8:
    input_scale, input_zero_point = input_details["quantization"]
    np_res_im = np_res_im / input_scale + input_zero_point
    # Round and clip: an input of exactly 1.0 maps to 256, which does not fit in uint8.
    np_res_im = np.clip(np.round(np_res_im), 0, 255)
np_res_im = np.expand_dims(np_res_im, axis=0).astype(input_details["dtype"])
# Quantized input [0..255].
print(np_res_im)
Tested in WASI-NN / C:
for (int i = 0; i < input.elements; ++i)
{
    // WASI-NN expects non-quantized RGB data (-1..1)
    input.input_tensor[i] = ((float)data[i] / 255) * 2 - 1;
}
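One detail about the scaling above: `/ 255 * 2 - 1` maps a pixel value of 255 to exactly 1.0, which is just outside [-1, 1). A possible variant, sketched here in Python, inverts the quantization parameters directly so that re-quantizing returns the original bytes. The helper names are hypothetical, and the rounding and clipping in `quantize` are assumptions about the runtime rather than a description of the WAMR code.

```python
import numpy as np

SCALE, ZERO_POINT = 0.0078125, 128  # input quantization parameters from this thread


def dequantize_pixels(pixels, scale=SCALE, zero_point=ZERO_POINT):
    """Hypothetical helper: map uint8 pixels to the float range the model was trained on."""
    return (pixels.astype(np.float32) - zero_point) * scale


def quantize(values, scale=SCALE, zero_point=ZERO_POINT):
    """Hypothetical stand-in for the quantization a runtime applies internally."""
    q = np.round(values / scale + zero_point)
    return np.clip(q, 0, 255).astype(np.uint8)


pixels = np.arange(256, dtype=np.uint8)
as_float = dequantize_pixels(pixels)

print(as_float.min(), as_float.max())                 # ends of the float range
print(np.array_equal(quantize(as_float), pixels))     # does the round trip preserve the bytes?
```

Because the scale is a power of two, every intermediate value here is exactly representable, so the result should not depend on whether the runtime rounds or truncates.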
Given @tonibofarull's explanation and the official TFLite quantization documentation, I am now convinced this isn't a WASI-NN / TFLite implementation bug.
This explanation is a bit ambiguous:
Lets assume the expected image is 300x300 pixels, with three channels (red, blue, and green) per pixel. This should be fed to the model as a flattened buffer of 270,000 byte values (300x300x3). If the model is quantized, each value should be a single byte representing a value between 0 and 255.
The second sentence is true only if the model is indeed quantized. I would expect that non-quantized models would accept a flattened buffer of 270000 float values.
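To make the difference concrete, here is a small sketch of the two flattened buffers for a 300x300x3 input. It only uses NumPy to count elements and bytes, and float32 is my assumption for the non-quantized element type.

```python
import numpy as np

HEIGHT, WIDTH, CHANNELS = 300, 300, 3

# Quantized model: one byte per value.
quantized_input = np.zeros((HEIGHT, WIDTH, CHANNELS), dtype=np.uint8).ravel()
# Non-quantized model: one float32 per value (float32 is an assumption).
float_input = np.zeros((HEIGHT, WIDTH, CHANNELS), dtype=np.float32).ravel()

print(quantized_input.size, quantized_input.nbytes)
print(float_input.size, float_input.nbytes)
```

Both buffers hold 270000 values, but the float buffer takes four times as many bytes.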
Feel free to close unless you'd like to keep open to add the extra meta-information API that allows external quantization.
Currently, the TFLite wasi-nn implementation performs quantization if the quantization scale and zero-point exist (https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/core/iwasm/libraries/wasi-nn/src/wasi_nn_tensorflowlite.cpp#L323).
This results in poor performance with ssd_mobilenet_v1_1_metadata_1.tflite (direct download link).

The SSD mobilenet v1.1 model has the following input details:
The model works well without the RGB input (300x300x3 uint8_t) being quantized. (See my bug at https://github.com/joonb14/TFLiteDetection/issues/1 for a full Jupyter Notebook example.) When I try to apply quantization (either in Python or by running the input through wasi-nn) I get very poor results.
To work around this issue, I had to apply an inverse function when creating the input tensor:
With the above workaround, I get the exact same (good) results in both Python and when running with iwasm (wasi-nn enabled).

I'm confused by https://www.tensorflow.org/lite/performance/post_training_integer_quant#run_the_tensorflow_lite_models which states that if input_details['dtype'] == np.uint8: quantization should be applied to the input (what wasi-nn does)...