Closed ardeal closed 4 days ago
Hi @ardeal , No, we run the int8 neural network model on the NPU. Yes, you can use TensorFlow to quantize your own model to int8 and then use the vela compiler so that the int8 model can run on the U55 NPU.
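For reference, a minimal sketch of the quantize-then-compile flow described above, using the standard TFLite converter for full-integer post-training quantization. The names `model` and `representative_images` are placeholders for your own Keras model and calibration-data generator; the resulting `.tflite` bytes would then be fed to the vela compiler:

```python
import numpy as np
import tensorflow as tf

def quantize_to_int8(model, representative_images):
    """Convert a Keras model to a fully int8-quantized TFLite model."""
    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    # Calibration data: a generator yielding [input_array] samples.
    converter.representative_dataset = representative_images
    # Force full int8 ops so the whole graph can be offloaded to the NPU.
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.int8
    converter.inference_output_type = tf.int8
    return converter.convert()  # returns the .tflite model as bytes
```

The returned bytes can be written to e.g. `model_int8.tflite` and then compiled for the U55 with vela.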
@kris-himax @WSA1k, Thank you! A few more questions: where can I download the datasheet of the chip? What is the RAM size of the chip? Does RAM here denote internal RAM (internal SRAM or DRAM), rather than SPI RAM?
@ardeal
@kris-himax Thank you for your reply! I saw the demo board (WiseEye AI module V2) based on the chip; it can run face detection/recognition, etc. I have a few more questions:
1) What is the input image size of the face detection neural network?
2) How many parameters does the face detection neural network have?
3) What is the frame rate of the face detection neural network on the chip?
4) When it is running face detection, what is the current draw of the board or the chip?
5) What interface should be used if I want to send the image out from the chip?
6) What is the maximum parameter count of a neural network supported on the chip?
7) Do you have more detailed materials about the board that can be shared?
@ardeal You can reference this application: tflm_fd_fm
@kris-himax Thank you for your reply!
If we run int8 inference on the chip, the hardware needs to save intermediate results in RAM. The question is: should I reserve that RAM for the hardware, or does the hardware have dedicated RAM of its own?
In the above post, you gave me the link to the TensorFlow quantization document: https://www.tensorflow.org/lite/performance/post_training_quantization I haven't used TF to do PTQ. The question is: if I feed calibration/training images to PTQ, will the accuracy decrease after PTQ quantization?
@ardeal
If we run int8 inference on the chip, the hardware needs to save intermediate results in RAM. The question is: should I reserve that RAM for the hardware, or does the hardware have dedicated RAM of its own?
If you run int8 NN tflite model inference, it will run in a memory region called the tensor arena, which you should reserve yourself. You can reference here.
When you run the vela compiler, the vela report will tell you how much SRAM is used. You can reference the picture here and look for the keyword Total SRAM used.
In the above post, you gave me the link to the TensorFlow quantization document: https://www.tensorflow.org/lite/performance/post_training_quantization I haven't used TF to do PTQ. The question is: if I feed calibration/training images to PTQ, will the accuracy decrease after PTQ quantization?
Yes, it will decrease. Alternatively, you can use QAT, which lets you fine-tune the model after quantization. You can reference the QAT doc and the example about MOT (Model Optimization) here.
Hi,
are you running a float32 neural network on the chip? Does your chip support int8 on the NPU? Do you have a quantization tool to quantize a model and then run it on the chip?