Closed by jayalberts 4 months ago
Giving Beatmup a shot: https://github.com/lnstadrum/beatmup GPU acceleration for image processing; it has its own way to run inference: https://github.com/lnstadrum/beatmup/blob/master/python/examples/cifar100.py
GPU inference is slow. Beatmup is one way to do inference; benchmarking today. Requires support from Nick.
Running out of space on system libraries - consulted with Chris and can have more room. TensorFlow needs lots of space and dependencies. Refactored Beatmup to make room, figured out serialization. Will try it on the dashcam today to see how fast it is (benchmarking). If the numbers are promising, then we have to figure out how to train models.
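For the benchmarking step, a simple timing harness is enough to compare backends. A minimal sketch, assuming a generic `run_inference` callable standing in for the actual Beatmup (or later NCNN/TFLite) call; the function name, frame, and iteration counts are hypothetical:

```python
import time
import statistics

def benchmark(run_inference, frame, warmup=3, iters=20):
    """Time repeated inference calls; report median latency and FPS.

    Warm-up runs are discarded so one-time setup costs (shader
    compilation, model upload to the GPU) don't skew the numbers.
    """
    for _ in range(warmup):
        run_inference(frame)
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_inference(frame)
        times.append(time.perf_counter() - t0)
    median = statistics.median(times)
    return {"median_s": median, "fps": 1.0 / median}

# Dummy workload standing in for a real model call:
stats = benchmark(lambda f: sum(f), list(range(1000)))
print(stats)
```

Using the median rather than the mean keeps one slow outlier run (e.g. the camera grabbing the bus) from distorting the comparison.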
Blocked: waiting for Nick's input. Not able to convert models yet.
Beatmup: unblocked today. Found different configs for different libs. Still tricky to set up. Will have a sense of next steps later today.
Beatmup also has a hand-written converter from TF Keras to GPU-compatible models, but it turned out there's no easy way to convert our existing model to Keras.
`KeyError: 'split'`
Any existing solution will require lots of additional work to add support for a variety of operations, first to convert to Keras and then to Beatmup.
Decided to give ncnn a shot to overcome this complexity: https://qengineering.eu/install-ncnn-on-raspberry-pi-4.html
The main pro of NCNN here is that @nhyman will be able to convert directly to the NCNN format without any additional effort to rewrite or enhance the existing converters. I'm working on installing and benchmarking ncnn execution over the GPU.
The combination of NCNN and the Vulkan API for the GPU turned out to be slow for inference.
https://qengineering.eu/install-vulkan-on-raspberry-pi.html https://github.com/Tencent/ncnn/issues/2435
ArmNN in combination with XNNPACK is my next attempt to install and benchmark.
The hope is that this tooling is optimised for the Arm64 architecture and can give us the necessary speed boost: https://github.com/ARM-software/armnn https://github.com/google/XNNPACK
As an interesting finding, there are existing solutions like QMKL6, written specifically for VideoCore VI (the Pi 4 GPU) to be used as a BLAS library and function as a computational backend for other C++ libraries. However, I'm struggling to find any usage of it, or any examples of building inference libs with the computational backend swapped out for it: https://github.com/Idein/qmkl6
Nick wasn't able to convert the models either, so Beatmup may not be an option. Exploring other options; focused today on understanding the rest of them.
Caffe by Berkeley AI and TFLite for RPi 64-bit feel like a last resort to try: https://caffe.berkeleyvision.org/ https://github.com/Qengineering/TensorFlow_Lite_SSD_RPi_64-bits?tab=readme-ov-file Hoping to get good numbers on CPU only.
The TensorFlow Lite solution seems to be faster, and still works on the CPU while the camera is busy. Combining images into a grid and running ML over all of them at the same time. Have a quasi-working solution ready for testing.
The faster you move, the more images in the grid.
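The grid idea (packing several frames into one image so a single inference call covers all of them) can be sketched with NumPy. A minimal illustration, assuming equally sized grayscale crops; the function name and shapes are made up for the example:

```python
import numpy as np

def tile_into_grid(frames, cols):
    """Pack N equally sized frames into one grid image, row-major.

    Pads with blank frames so the grid is rectangular; the model then
    sees one large image instead of N separate inputs, amortizing
    per-call overhead.
    """
    rows = -(-len(frames) // cols)  # ceiling division
    blank = np.zeros_like(frames[0])
    padded = list(frames) + [blank] * (rows * cols - len(frames))
    grid_rows = [np.concatenate(padded[r * cols:(r + 1) * cols], axis=1)
                 for r in range(rows)]
    return np.concatenate(grid_rows, axis=0)

# Five 32x32 frames in a 3-wide grid -> one 64x96 image, one inference call.
frames = [np.full((32, 32), i, dtype=np.uint8) for i in range(5)]
grid = tile_into_grid(frames, cols=3)
print(grid.shape)  # (64, 96)
```

This matches the "faster you move, more in the grid" note: the number of frames per grid can grow with speed, and detections are mapped back to source frames from their row/column position in the grid.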
The GPU could maybe be better leveraged for the camera bridge.
OpenCL is not directly supported on the RPi 4 (only custom clvk modules via the Vulkan API), which is very complicated to install on an offline device.