Idein / qmkl6

BLAS library for VideoCore VI QPU (Raspberry Pi 4)
BSD 3-Clause "New" or "Revised" License

Details on how QMKL6 can be used for running pretrained ONNX models #12

Closed 16jiu7 closed 2 years ago

16jiu7 commented 3 years ago

Hi! I have been figuring out how to run CNN models on mobile devices, and I have watched the impressive demos that you presented at the tinyML Summit. I wonder what workflow I should follow to get a self-defined ONNX model running on a Raspberry Pi's VideoCore GPU. Should I start by implementing operations like conv using this library?

Many thanks for all your work on making ML on the Raspberry Pi GPU available to the public!

Terminus-IMRC commented 3 years ago

Thank you for the interest.

We are Idein, not Edge Impulse, who presented at the summit.

While both of us offer a platform for edge ML, one of the biggest differences between us is that we use Raspberry Pi's integrated GPU (VideoCore IV/VI QPU) instead of the CPU with TensorFlow Lite. According to this page, a single run of MobileNet v2 takes 122 milliseconds on Raspberry Pi 4's CPU with TensorFlow Lite, while we achieve comparable performance even with Raspberry Pi Zero's QPU, and it takes about 60 to 70 milliseconds with Raspberry Pi 4's QPU.

While some of the layers can be implemented with vector or matrix operations (e.g. convolution), others cannot, and we have not yet made our ML code public (we offer the functionality as a platform named Actcast). So, currently, if you want to run your model on Raspberry Pi's GPU, you need to write dedicated QPU code, or compute those layers on the CPU instead. If you want this done within Actcast, please feel free to contact us via this form: https://idein.jp/en/contact
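To illustrate the point above about convolution being expressible as a matrix operation: a conv layer can be lowered to a single matrix multiply via the standard im2col trick, and that multiply is exactly the kind of sgemm a BLAS library like qmkl6 can offload to the QPU. This is only a NumPy sketch of the general technique, not code from Idein's (unreleased) ML stack; all names here are illustrative.

```python
import numpy as np

def im2col(x, kh, kw):
    # x: (C, H, W) input. Returns a (C*kh*kw, out_h*out_w) matrix whose
    # columns are the flattened receptive-field patches (stride 1, no pad).
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    row = 0
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                cols[row] = x[ci, i:i + out_h, j:j + out_w].ravel()
                row += 1
    return cols

def conv2d_gemm(x, weights):
    # weights: (F, C, kh, kw). Lowers the convolution (as ML frameworks
    # define it, i.e. cross-correlation) to one GEMM; on a Pi this matrix
    # multiply is what a QPU-accelerated sgemm would take over.
    f, c, kh, kw = weights.shape
    out_h = x.shape[1] - kh + 1
    out_w = x.shape[2] - kw + 1
    cols = im2col(x, kh, kw)            # (C*kh*kw, out_h*out_w)
    w_mat = weights.reshape(f, -1)      # (F, C*kh*kw)
    return (w_mat @ cols).reshape(f, out_h, out_w)
```

The design trade-off is the usual one: im2col duplicates input pixels (each pixel appears in up to kh*kw patches), spending memory to turn many small dot products into one large, BLAS-friendly multiply.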

Terminus-IMRC commented 2 years ago

Closing due to inactivity.