AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Implementation of YoloV3 on Bare metal #6070

Open JonZubieta opened 4 years ago

JonZubieta commented 4 years ago

Hi,

We intend to implement YOLOv3 on an embedded system (specifically the Zynq UltraScale+). As an initial step, we plan to implement it as a bare-metal application on the on-board real-time cores. Do you think this would be feasible?

We know that performance would be significantly lower this way than on a GPU. Nonetheless, we are trying to modify YOLOv3 to adhere to functional safety standards, and using a GPU would hinder that process.

Additionally, have you ever tried this kind of implementation? If so, could you please give us any suggestions?

Thank you, Jon

AlexeyAB commented 4 years ago

Did you try? https://developer.xilinx.com/en/articles/ssd-customization-deployment-by-ai-sdk.html

Yes, all YOLOv3 models have been implemented on FPGAs, ULAs, ASICs, neurochips, ... For FPGA / ULA / ASIC, INT8 quantization is usually used, as in the article below.

Some time ago I quantized YOLOv2 (and, as I remember, YOLOv3): https://github.com/AlexeyAB/yolo2_light

You can also try other frameworks to quantize YOLO: https://github.com/hunglc007/tensorflow-yolov4-tflite

python convert_tflite.py --weights ./data/yolov4.weights --output ./data/yolov4-fp16.tflite --quantize_mode full_int8 --dataset ./coco_dataset/coco/val207.txt

Or: https://github.com/AlexeyAB/darknet#yolo-v4-in-other-frameworks

Or implement it yourself: http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf
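
To make the INT8 quantization mentioned in this comment concrete, here is a minimal sketch in C of the simplest variant: symmetric, per-tensor quantization with a max-abs scale. The function names are hypothetical and are not taken from Darknet, yolo2_light, or TensorRT. Calibration-based schemes pick the clipping threshold from sample data rather than from the raw maximum, so treat this only as an illustration of the basic idea.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: symmetric per-tensor scale that maps the largest
   absolute value in x to 127. Returns 1.0f for an all-zero tensor so the
   caller never divides by zero. */
static float int8_scale(const float *x, size_t n)
{
    float max_abs = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float a = (x[i] < 0.0f) ? -x[i] : x[i];
        if (a > max_abs) {
            max_abs = a;
        }
    }
    return (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
}

/* Quantize one value: divide by the scale, round half away from zero,
   and saturate to [-127, 127]. */
static int8_t quantize_one(float v, float scale)
{
    float r = v / scale;
    r += (r >= 0.0f) ? 0.5f : -0.5f;
    if (r > 127.0f) {
        r = 127.0f;
    }
    if (r < -127.0f) {
        r = -127.0f;
    }
    return (int8_t)r; /* truncation toward zero completes the rounding */
}

/* Map a quantized value back to an approximate float. */
static float dequantize_one(int8_t q, float scale)
{
    return (float)q * scale;
}
```

The round trip through `quantize_one` and `dequantize_one` loses at most about half a scale step per value for inputs inside the calibrated range; values outside it are clipped.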

JonZubieta commented 4 years ago

Hi,

First of all, thanks for your support, we appreciate it.

We are working with 32-bit floating point, so quantization is not an issue for us.

Also, because of the safety requirements of our project, we intend to implement YOLOv3 on the R5 core (used for real-time applications) rather than on an FPGA, ULA, ASIC or any other device. The problem is that we don't know whether this has been done before or whether our approach is feasible.

Do you know of any cases where YOLOv3 has been implemented on a CPU rather than on an FPGA?

Do you think it would be viable despite the decrease in performance?

Thanks for your support,

Jon
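
For the FP32-on-CPU route described in this comment, most of the work is in the convolution layers. The following is a minimal sketch in C of a direct convolution, assuming CHW layout, stride 1, no padding, and caller-provided buffers (no heap allocation, which tends to suit bare-metal targets). The function name and memory layout are illustrative assumptions and do not reproduce Darknet's own CPU code path.

```c
#include <stddef.h>

/* Hypothetical direct FP32 convolution.
   in:      c_in x h x w input, channel-major (CHW)
   weights: c_out x c_in x k x k filters
   bias:    c_out values
   out:     c_out x (h - k + 1) x (w - k + 1) output, CHW
   Stride is 1 and there is no padding. */
static void conv2d_f32(const float *in, int c_in, int h, int w,
                       const float *weights, const float *bias,
                       int c_out, int k, float *out)
{
    int h_out = h - k + 1;
    int w_out = w - k + 1;

    for (int oc = 0; oc < c_out; ++oc) {
        for (int oy = 0; oy < h_out; ++oy) {
            for (int ox = 0; ox < w_out; ++ox) {
                float acc = bias[oc];
                for (int ic = 0; ic < c_in; ++ic) {
                    for (int ky = 0; ky < k; ++ky) {
                        for (int kx = 0; kx < k; ++kx) {
                            float x = in[(ic * h + oy + ky) * w + ox + kx];
                            float f = weights[((oc * c_in + ic) * k + ky) * k + kx];
                            acc += x * f;
                        }
                    }
                }
                out[(oc * h_out + oy) * w_out + ox] = acc;
            }
        }
    }
}
```

Every loop bound here is fixed by the layer dimensions, so the iteration count is known ahead of time, which may help when reasoning about worst-case execution time.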

AlexeyAB commented 4 years ago

Hi, do you mean ARM Cortex-R5 MPCore processors with built-in FPGA and DSP (Zynq / Zynq UltraScale+ MPSoC / MicroBlaze)?
https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18841745/Baremetal+Drivers+and+Libraries
https://www.xilinx.com/support/documentation/data_sheets/ds891-zynq-ultrascale-plus-overview.pdf

And do you want to run YOLOv3 on the R5 without an OS, as a bare-metal application?

No, I don't know of any such projects. I also don't know whether development for a standalone FPGA differs greatly from development for the FPGA fabric on the same chip as the R5.

Why do you want to implement YOLOv3 rather than YOLOv4 (with activation=leaky)? YOLOv4-leaky doesn't use the Mish activation and uses the same layers as v3: https://github.com/AlexeyAB/darknet/wiki/YOLOv4-model-zoo
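
One reason the leaky variant may be attractive on a bare-metal core: the leaky activation needs only a comparison and a multiply, whereas Mish is generally defined as x * tanh(ln(1 + e^x)) and therefore needs exponential, logarithm and hyperbolic tangent routines. Below is a minimal sketch in C, assuming the 0.1 negative slope that Darknet uses for `activation=leaky`; the function names are hypothetical.

```c
#include <stddef.h>

/* Leaky ReLU with a 0.1 slope for negative inputs.
   No math-library calls are needed. */
static float leaky_activate(float x)
{
    return (x > 0.0f) ? x : 0.1f * x;
}

/* Apply the activation in place to a buffer of n values. */
static void leaky_inplace(float *x, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        x[i] = leaky_activate(x[i]);
    }
}
```

Positive values pass through unchanged and negative values are scaled by 0.1, so the per-element cost is constant and branch-simple.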