Closed ardeal closed 4 days ago
Hi @ardeal , No, we run the int8 neural network model on the NPU. Yes, you can use TensorFlow to quantize your own model to int8 and then use the vela compiler so that the int8 model can run on the U55 NPU.
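For reference, a minimal sketch of the quantize-then-compile flow described above, using the standard TFLite converter for full-integer post-training quantization. The names `model` and `representative_images` are placeholders for your own Keras model and calibration-data generator; the resulting `.tflite` bytes would then be fed to the vela compiler:

```python
import numpy as np
import tensorflow as tf

def quantize_to_int8(model, representative_images):
    """Convert a Keras model to a fully int8-quantized TFLite model."""
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # Calibration data: a generator yielding [input_array] samples.
    converter.representative_dataset = representative_images
    # Force full int8 ops so the whole graph can be offloaded to the NPU.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()  # returns the .tflite model as bytes
```

The returned bytes can be written to e.g. `model_int8.tflite` and then compiled for the U55 with vela.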
@kris-himax @WSA1k, Thank you! A few more questions: where can I download the datasheet of the chip? What is the RAM size of the chip? Does RAM here denote internal RAM (internal SRAM or DRAM), rather than SPI RAM?
@ardeal
@kris-himax Thank you for your reply! I saw the demo board (WiseEye AI module V2) based on the chip; it can run face detection/recognition, etc. I have a few more questions:
1) What is the input image size of the face detection neural network?
2) How many parameters does the face detection neural network have?
3) What is the frame rate of the face detection neural network on the chip?
4) When it is running face detection, what is the current draw of the board or the chip?
5) What interface should be used if I want to send the image out from the chip?
6) What is the maximum parameter count of a neural network supported on the chip?
7) Do you have more detailed materials about the board that can be shared?
@ardeal You can reference this application: tflm_fd_fm
@kris-himax Thank you for your reply!
If we run int8 inference on the chip, the hardware needs to save intermediate results in RAM. The question is: should I reserve that RAM for the hardware, or does the hardware have dedicated RAM of its own?
In the above post, you gave me the link to the TensorFlow quantization document: https://www.tensorflow.org/lite/performance/post_training_quantization I haven't used TF to do PTQ. The question is: if I feed calibration/training images to PTQ, will the accuracy decrease after PTQ quantization?
@ardeal
If we run int8 inference on the chip, the hardware needs to save intermediate results in RAM. The question is: should I reserve that RAM for the hardware, or does the hardware have dedicated RAM of its own?
If you run int8 NN tflite model inference, it will run in a memory region called the tensor arena, which you should reserve yourself. You can reference here.
When you run the vela compiler, the vela report will tell you how much SRAM is used. You can reference the picture here and look for the keyword Total SRAM used.
In the above post, you gave me the link to the TensorFlow quantization document: https://www.tensorflow.org/lite/performance/post_training_quantization I haven't used TF to do PTQ. The question is: if I feed calibration/training images to PTQ, will the accuracy decrease after PTQ quantization?
Yes, it will decrease. Alternatively, you can use QAT, which lets you fine-tune the model after quantization. You can reference the QAT doc and the example about MOT (Model Optimization) here.
Hi,
are you running a float32 neural network on the chip? Does your chip support int8 on the NPU? Do you have a quantization tool to quantize a model and then run it on the chip?