yoppy-tjhin opened this issue 1 month ago
Hello,
We have already successfully run YOLO models on our NPU, including v4, v5, v8, and Tiny YOLO v2.
To run a model on the NPU, you need to convert it to the NBG format; you will find more information in these wiki articles:
The generated model will then be able to run on the NPU. However, you need to develop your own application for your specific use case, using our demo application as an example. If your use case is object detection, you will probably need to change the post-processing file.
Regards, ABR
Hello,
Thank you for your prompt response. Do you have a working example Python script for running YOLOv5 on the NPU of the STM32MP257?
Thank you.
Hello,
We have a Semantic Segmentation demo application that is based on the YoloV8-pose model.
Currently, we do not have an example of a Yolo model for object detection, but you can look at the Semantic Segmentation post-processing file to reproduce the NMS function in an object detection use case. You can also use resources from the Ultralytics GitHub to reproduce the YoloV5 post-processing in your application.
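For illustration, here is a rough NumPy sketch of the kind of decode + NMS post-processing meant here. It assumes the standard Ultralytics YOLOv5 output layout of shape (1, N, 5 + num_classes), with rows [cx, cy, w, h, objectness, class scores...], and will need to be adapted to your exact export:

```python
import numpy as np

def yolov5_postprocess(pred, conf_thres=0.25, iou_thres=0.45):
    """Decode a raw YOLOv5 output tensor and apply greedy per-class NMS.

    `pred` is assumed to have shape (1, N, 5 + num_classes) with rows
    [cx, cy, w, h, objectness, class scores...], as produced by a
    typical Ultralytics YOLOv5 TFLite export.
    """
    pred = pred[0]                                   # (N, 5 + num_classes)
    scores = pred[:, 4:5] * pred[:, 5:]              # objectness * class prob
    class_ids = scores.argmax(axis=1)
    confidences = scores.max(axis=1)
    keep = confidences > conf_thres
    boxes, confidences, class_ids = pred[keep, :4], confidences[keep], class_ids[keep]

    # Convert cx, cy, w, h -> x1, y1, x2, y2
    xyxy = np.empty_like(boxes)
    xyxy[:, 0] = boxes[:, 0] - boxes[:, 2] / 2
    xyxy[:, 1] = boxes[:, 1] - boxes[:, 3] / 2
    xyxy[:, 2] = boxes[:, 0] + boxes[:, 2] / 2
    xyxy[:, 3] = boxes[:, 1] + boxes[:, 3] / 2

    # Greedy NMS: suppress lower-scoring boxes of the same class with high IoU
    order = confidences.argsort()[::-1]
    selected = []
    while order.size > 0:
        i = order[0]
        selected.append(i)
        rest = order[1:]
        same = class_ids[rest] == class_ids[i]
        xx1 = np.maximum(xyxy[i, 0], xyxy[rest, 0])
        yy1 = np.maximum(xyxy[i, 1], xyxy[rest, 1])
        xx2 = np.minimum(xyxy[i, 2], xyxy[rest, 2])
        yy2 = np.minimum(xyxy[i, 3], xyxy[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (xyxy[i, 2] - xyxy[i, 0]) * (xyxy[i, 3] - xyxy[i, 1])
        area_r = (xyxy[rest, 2] - xyxy[rest, 0]) * (xyxy[rest, 3] - xyxy[rest, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = rest[~(same & (iou > iou_thres))]
    return xyxy[selected], confidences[selected], class_ids[selected]
```

The returned boxes stay in the coordinate space of the network output (normalized for a typical TFLite export), so they still have to be scaled back to the original image size.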
Regards, ABR
Hi,
I tried to run the ST yolov8n OD model on the STM32MP2, and I see that the model has a UINT8 input and a FLOAT output.
For a quick test, we wanted a .tflite model derived from the Ultralytics yolov8n.pt model zoo. We converted the .pt model to .tflite using the Ultralytics conversion tool with the INT8 option. The resulting model runs on the STM32MP2, but only at 0.5 fps, which is too slow. Despite the INT8 conversion option, the resulting .tflite model has FLOAT input and output data types, which may explain why the inference speed is only 0.5 fps. My question is: how do we get a converted model with a UINT8 input and a FLOAT output, like the ST .tflite model?
I checked that the ST model zoo includes a tool for quantization and for setting the input/output data types, but it only accepts .h5 models, and we do not have a .h5 model.
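For reference, with the plain TensorFlow Lite converter the input/output data types are set explicitly during full-integer quantization. A minimal sketch, assuming a SavedModel export in a directory named yolov5_saved_model, a 640x640 input, and a placeholder calibration generator that should be replaced with real preprocessed images:

```python
import numpy as np
import tensorflow as tf

def representative_dataset():
    # Placeholder calibration data: replace with ~100 real, preprocessed images.
    for _ in range(100):
        yield [np.random.rand(1, 640, 640, 3).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("yolov5_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8   # quantized UINT8 input
# inference_output_type is left at its float32 default

with open("yolov5_int8.tflite", "wb") as f:
    f.write(converter.convert())
```

Leaving inference_output_type at its float32 default reproduces the UINT8-in / FLOAT-out layout of the ST model zoo .tflite files.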
Hello,
I think that the problem is not related to the input/output type but to the neural network model format. To understand how to deploy a model on our hardware, please follow this wiki article: https://wiki.st.com/stm32mpu/wiki/How_to_deploy_your_NN_model_on_STM32MPU
To be able to run a neural network model with hardware acceleration on STM32MP2, you need to convert your TFLite or ONNX model to the Network Binary Graph (.nb) format. To convert it, please follow this wiki article: https://wiki.st.com/stm32mpu/wiki/ST_Edge_AI:_Guide_for_MPU
Once the .nb file is generated, you will be able to benchmark both the .nb model and the .tflite model with the x-linux-ai-benchmark tool to check if the model is running on the CPU, GPU, or NPU and at what framerate. Please check this wiki article to benchmark your model: https://wiki.st.com/stm32mpu/wiki/How_to_benchmark_your_NN_model_on_STM32MPU
Finally, you will be able to use the .nb model in your application.
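As a side note, independent of the x-linux-ai-benchmark tool, a generic tflite_runtime timing loop can give a quick sanity check of how fast the .tflite model runs. A minimal sketch, assuming a hypothetical model file named yolov5_int8.tflite:

```python
import time
import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="yolov5_int8.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]

# Dummy input matching the model's expected shape and dtype (UINT8 or FLOAT32).
dummy = np.zeros(inp["shape"], dtype=inp["dtype"])

interpreter.set_tensor(inp["index"], dummy)
interpreter.invoke()                      # warm-up run

runs = 20
start = time.perf_counter()
for _ in range(runs):
    interpreter.set_tensor(inp["index"], dummy)
    interpreter.invoke()
elapsed = time.perf_counter() - start
print(f"average inference time: {elapsed / runs * 1000:.1f} ms "
      f"({runs / elapsed:.1f} fps)")
```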
Regards, ABR
Hi,
Previously, we successfully ran our custom YOLOv5 model on the NPU of the Rockchip 3588 platform. Now we want to port our platform to the STM32MP2. The wiki guide does not mention object detection with YOLOv5 or newer versions.
We would like our custom YOLOv5 model to run on the NPU of the STM32MP2. If this is not supported yet, are there any plans to move in that direction?
We have trained the ssd_mobilenet_v2_fpn model on the same dataset used for YOLOv5 (we are still trying to run that model, see #49), and the mAP reported in the training logs for ssd_mobilenet_v2_fpn is much lower than for YOLOv5.
Thank you.