fastmachinelearning / hls4ml

Machine learning on FPGAs using HLS
https://fastmachinelearning.org/hls4ml
Apache License 2.0

Is hls4ml capable of converting a yolov3, yolov4 model? #198

Closed kutenai closed 2 years ago

kutenai commented 4 years ago

I've looked through the examples, and mostly it looks like 2 and 3 layer models. Has anyone tried to convert a yolov3 or yolov4 model? Is that too large of a task for the converter?

That is ultimately my goal, and I was hoping that hls4ml would help get there. I'd love to hear some thoughts on that from someone more knowledgeable about its capabilities, or someone who has explored this possibility.

pierinim commented 4 years ago

hls4ml works with small convolutional layers, dense layers, and recurrent layers, mainly on Xilinx FPGAs for the moment. Most of this functionality is still to be merged into the main branch.

We have a roadmap in place for converting large computer-vision models. For now, we have an implementation of ResNet50 spread across multiple FPGAs and a few R&D ideas for supporting bigger and more complex architectures. On your side, you should work on a compressed version of the model, possibly down to binary or ternary precision (e.g. using QKeras). We are doing that with a custom (different input) version of the SSD model, for instance. Then one needs to assess the amount of required resources and compare that to what the FPGA allows. This is fully theoretical and doesn't require hls4ml. Once that is done, one should see how to make it work with hls4ml. For our SSD project, for instance, this will require an extra 6 months, maybe one year.
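As a rough illustration of that resource assessment, here is a minimal back-of-the-envelope sketch in plain Python. It is not part of hls4ml. The `ConvLayer` and `estimate` names are hypothetical, and the multiplier count assumes that each layer instantiates roughly `weights / reuse_factor` multipliers, which is a simplification. The two example layers are the first two convolutions of Darknet-53 at a 256×256 input, as listed in the YOLOv3 paper.

```python
from dataclasses import dataclass


@dataclass
class ConvLayer:
    """Shape of one convolutional layer (hypothetical helper, not an hls4ml type)."""
    out_h: int    # output feature-map height
    out_w: int    # output feature-map width
    kernel: int   # square kernel size
    c_in: int     # input channels
    c_out: int    # output channels (filters)

    @property
    def weights(self) -> int:
        # Number of weights, ignoring biases.
        return self.kernel * self.kernel * self.c_in * self.c_out

    @property
    def macs(self) -> int:
        # One multiply-accumulate per weight per output pixel.
        return self.out_h * self.out_w * self.weights


def estimate(layers, weight_bits, reuse_factor):
    """Rough totals to compare against an FPGA's memory and multiplier budget."""
    total_weights = sum(layer.weights for layer in layers)
    return {
        "weights": total_weights,
        "macs_per_inference": sum(layer.macs for layer in layers),
        "weight_storage_bits": total_weights * weight_bits,
        # Assumption: ceil(weights / reuse_factor) multiplier instances per layer.
        "multipliers": sum(-(-layer.weights // reuse_factor) for layer in layers),
    }


# First two convolutions of Darknet-53 for a 256x256 RGB input.
DARKNET53_STEM = [
    ConvLayer(out_h=256, out_w=256, kernel=3, c_in=3, c_out=32),
    ConvLayer(out_h=128, out_w=128, kernel=3, c_in=32, c_out=64),
]

# 2-bit weights as a stand-in for ternary precision; reuse factor chosen arbitrarily.
report = estimate(DARKNET53_STEM, weight_bits=2, reuse_factor=64)
print(report)
```

Summing such numbers over all layers of the compressed model and comparing them with the target device's on-chip memory and arithmetic resources gives a first indication of whether the design can fit. At binary or ternary precision the multiplications are typically much cheaper than full-width ones, so the multiplier count should be read as an upper bound on instances rather than as a DSP count.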

jmduarte commented 4 years ago

Hi @kutenai,

We are looking into converting/running bigger convolutional models, like different versions of ResNet(-18/-50) @violatingcp @vloncar.

From my understanding of YOLOv3 and YOLOv4, one of the main things that would need to be implemented is the backbone Darknet-53 (for YOLOv3):

[Screenshot of the Darknet-53 architecture]

or CSPDarknet-53 for YOLOv4.

In addition, the treatment of the bounding boxes would be a novel feature to consider implementing.
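For context on what the bounding-box treatment involves, the sketch below decodes a single raw YOLOv3 prediction into a box, following the equations in the YOLOv3 paper: b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w·exp(t_w), b_h = p_h·exp(t_h). The `decode_box` name and the convention of normalizing everything to the image size are assumptions made for illustration. This is plain Python, not hls4ml code.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def decode_box(raw, cell, anchor, grid_size):
    """Decode one YOLOv3 box prediction (hypothetical helper).

    raw:       (t_x, t_y, t_w, t_h) network outputs for one anchor in one cell
    cell:      (c_x, c_y) column and row index of the grid cell
    anchor:    (p_w, p_h) anchor size, assumed normalized to the image size
    grid_size: number of cells per side of the prediction grid
    Returns the box centre and size as fractions of the image size.
    """
    t_x, t_y, t_w, t_h = raw
    c_x, c_y = cell
    p_w, p_h = anchor
    b_x = (sigmoid(t_x) + c_x) / grid_size  # centre offset within the cell
    b_y = (sigmoid(t_y) + c_y) / grid_size
    b_w = p_w * math.exp(t_w)               # anchor scaled by the predicted factor
    b_h = p_h * math.exp(t_h)
    return b_x, b_y, b_w, b_h


# Zero outputs in the central cell of a 13x13 grid give a centred, anchor-sized box.
box = decode_box((0.0, 0.0, 0.0, 0.0), cell=(6, 6), anchor=(0.2, 0.3), grid_size=13)
print(box)
```

The sigmoid and exponential are nonlinear functions, which on an FPGA are often approximated (for example with lookup tables), and a full detector would additionally need confidence thresholding and non-maximum suppression after this step.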

I see there has been some work by others on implementing YOLO-like architectures on an FPGA:

which might be good starting points. We would certainly welcome further discussion or contributions from those interested in implementing some of these needed features.

kutenai commented 4 years ago

Thanks for the updates, @pierinim, and the references, @jmduarte. There is a lot to be done. Hopefully, there is money in our engineering budget to pursue this. We spent a lot of time going down the OpenCL path, and then found it won't work for the application we are targeting, which is an embedded core with an FPGA. I'm looking at HLS now, but might run out of time/budget first.