AxisCommunications / acap-native-sdk-examples

Example code for APIs and features in AXIS Camera Application Platform (ACAP) Native SDK
Apache License 2.0

object-detection example - quantization issue #183

Closed Equidem closed 7 months ago

Equidem commented 8 months ago

While creating an object detection app based on the object-detection example here, I noticed that the example does not explicitly quantize the model input anywhere in the preprocessing setup, even though the model provided with the example has a quantized input. I assumed the quantization was handled somewhere inside the Larod library, but testing with my own quantized model suggests it is not: I had to quantize the input manually before passing it to the model to make it work. I have two questions about this:

  1. Is this a bug, or is the model provided with the example somehow special and doesn't require quantizing the input for it to work? Examining the model in Netron suggests otherwise, but maybe I am missing something.
  2. For models that require quantization, what is the best way to perform it using the C++ SDK libraries (VDO, Larod, etc.)? I went through the available documentation but couldn't find anything.

pataxis commented 7 months ago

Hi @Equidem, thank you for your questions. We'll get back to you when we have had time to look into it.

Corallo commented 7 months ago

@Equidem I'm not sure I understand what you mean. The model in the example expects a quantized input, i.e. an image in int8 format. The output of VDO is already in a "quantized" state, just in YUV format instead of RGB. In the example, we use a preprocessing step to convert it from YUV to RGB. The RGB data you get is already in int8 format and doesn't need quantization before being sent to the network.

Let me know if I misunderstood your question.
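
To illustrate the point that a YUV to RGB conversion keeps the data in 8-bit form, here is a minimal per-pixel sketch. It assumes the full-range BT.601 (JPEG-style) matrix; the thread does not say which matrix the preprocessing in the example actually uses, and `clamp_u8` and `yuv_to_rgb` are hypothetical helper names, not part of VDO or Larod.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstdint>

// Hypothetical helper: clamp to [0, 255] and round to the nearest uint8.
static std::uint8_t clamp_u8(double v) {
    return static_cast<std::uint8_t>(std::lround(std::clamp(v, 0.0, 255.0)));
}

// Convert one pixel from YUV to RGB.
// Assumption: full-range BT.601 coefficients, chosen only for illustration.
std::array<std::uint8_t, 3> yuv_to_rgb(std::uint8_t y, std::uint8_t u, std::uint8_t v) {
    const double cb = u - 128.0;  // chroma is centered on 128
    const double cr = v - 128.0;
    return {clamp_u8(y + 1.402 * cr),
            clamp_u8(y - 0.344136 * cb - 0.714136 * cr),
            clamp_u8(y + 1.772 * cb)};
}
```

Both the input and the output are 8-bit values in the 0 to 255 range, so no float stage is involved in this step.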

Equidem commented 7 months ago

Hi @Corallo, the issue here is not whether the input is in int8 (or more precisely uint8, since that's what VDO provides in the example). As you can see in the attached screenshot from netron.app, the model used in the object-detection example in this repository (https://github.com/google-coral/test_data/raw/master/ssd_mobilenet_v2_coco_quant_postprocess.tflite) has quantization parameters of 128 as zero point and 0.0078125 as scale. From our testing, these are not applied to the input image anywhere in the example, since we had to manually use them to quantize the input image before passing it into our model to make it work. However, this seems like something that should be part of the preprocessing job, which is why I am asking whether there is some way to do it using the Larod or VDO libraries.

[Image: Screenshot from 2024-01-15 11-46-46]
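
For reference, a minimal sketch of what a zero point of 128 and a scale of 0.0078125 mean under the usual affine quantization convention, `real = scale * (q - zero_point)`. The function names are hypothetical; this is not code from the example or from Larod.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Quantization parameters quoted in the thread for the model input.
constexpr double kScale = 0.0078125;  // equals 1 / 128
constexpr int kZeroPoint = 128;

// Map a stored uint8 value to the real number it represents.
double dequantize(std::uint8_t q) {
    return kScale * (static_cast<int>(q) - kZeroPoint);
}

// Map a real number to its uint8 representation, clamped to [0, 255].
std::uint8_t quantize(double real) {
    const long q = std::lround(real / kScale) + kZeroPoint;
    return static_cast<std::uint8_t>(std::clamp(q, 0L, 255L));
}
```

With these parameters, the uint8 values 0, 128 and 255 represent -1.0, 0.0 and 0.9921875 respectively, so the 8-bit range covers roughly [-1, 1).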

Corallo commented 7 months ago

Hi @Equidem

In most cases, including our example, you don't need to apply quantization parameters to the input obtained from VDO, as it is already in the correct scale. I can't follow what you are trying to do with those quantization parameters. They describe how to map between the float value and the integer format. If you apply them to the VDO image, which is already in uint8 format, you will go back to a float value (roughly between -1 and 1 for these parameters), which is not what you want.
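
One way to see why the raw uint8 image can already be "in the correct scale" is a round-trip check. The sketch below assumes the float-side preprocessing the model was built around is `(pixel - 128) / 128`; that normalization is an assumption, not something stated in the thread, and the function names are hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Assumed float preprocessing: map a uint8 pixel to [-1, 1).
double normalize(std::uint8_t pixel) {
    return (pixel - 128.0) / 128.0;
}

// Quantize with the parameters quoted in the thread
// (scale 0.0078125, zero point 128), clamped to [0, 255].
std::uint8_t quantize_input(double real) {
    long q = std::lround(real / 0.0078125) + 128;
    if (q < 0) q = 0;
    if (q > 255) q = 255;
    return static_cast<std::uint8_t>(q);
}

// Returns true when normalizing and then quantizing gives back
// the original pixel for every possible uint8 value.
bool round_trip_is_identity() {
    for (int p = 0; p <= 255; ++p) {
        if (quantize_input(normalize(static_cast<std::uint8_t>(p))) != p) {
            return false;
        }
    }
    return true;
}
```

Under that assumption, normalizing and quantizing cancel out, so feeding the uint8 pixels directly is equivalent. A model trained with a different normalization would not have this property, which may explain why a custom model needed manual handling.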

pataxis commented 7 months ago

Closed due to inactivity