Closed by jayalberts 4 months ago
Giving Beatmup a shot: https://github.com/lnstadrum/beatmup GPU acceleration for image processing; it has its own way to run inference: https://github.com/lnstadrum/beatmup/blob/master/python/examples/cifar100.py
GPU inference is slow. Beatmup is one way to do inference; benchmarking today. Requires support from Nick.
Running out of space on system libraries - consulted with Chris and can have more room. TensorFlow needs lots of space and dependencies. Refactored Beatmup to make room, figured out serialization. Will try it on the dashcam today to see how fast it is (benchmarking). If the numbers are promising, then we have to figure out how to train models.
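For the benchmarking step, a simple timing harness is enough to compare backends. A minimal sketch, assuming a generic `run_inference` callable standing in for the actual Beatmup (or later NCNN/TFLite) call; the function name, frame, and iteration counts are hypothetical:

```python
import time
import statistics

def benchmark(run_inference, frame, warmup=3, iters=20):
    """Time repeated inference calls; report median latency and FPS.

    Warm-up runs are discarded so one-time setup costs (shader
    compilation, model upload to the GPU) don't skew the numbers.
    """
    for _ in range(warmup):
        run_inference(frame)
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        run_inference(frame)
        times.append(time.perf_counter() - t0)
    median = statistics.median(times)
    return {"median_s": median, "fps": 1.0 / median}

# Dummy workload standing in for a real model call:
stats = benchmark(lambda f: sum(f), list(range(1000)))
print(stats)
```

Using the median rather than the mean keeps one slow outlier run (e.g. the camera grabbing the bus) from distorting the comparison.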
Blocked: waiting for Nick's input. Not able to convert models yet.
Beatmup: unblocked today. Found different configs for different libs. Still tricky to set up. Will have a sense of next steps later today.
Beatmup also has a hand-written converter from TF Keras to GPU-compatible models, but it turned out there's no easy way to convert our existing model to Keras.
`KeyError: 'split'`
Any existing solution will require lots of additional work to add support for a variety of operations, first to convert to Keras and then to Beatmup.
Decided to give ncnn a shot to overcome this complexity: https://qengineering.eu/install-ncnn-on-raspberry-pi-4.html
The main pro of NCNN here is that @nhyman will be able to convert directly to the NCNN format without any additional effort to rewrite or enhance the existing converters. I'm working on installing and benchmarking ncnn execution over the GPU.
The combination of NCNN and the Vulkan API for the GPU turned out to be slow for inference.
https://qengineering.eu/install-vulkan-on-raspberry-pi.html https://github.com/Tencent/ncnn/issues/2435
ArmNN in combination with XNNPACK is my next attempt to install and benchmark.
The hope is that this tooling is optimised for the Arm64 architecture and can give us the necessary speed boost: https://github.com/ARM-software/armnn https://github.com/google/XNNPACK
As an interesting finding, there are existing solutions like QMKL6, written specifically for VideoCore VI (the Pi 4 GPU) to be used as a BLAS library and function as a computational backend for other C++ libraries. However, I'm struggling to find any usage of it, or any examples of building inference libs with the computational backend swapped out for it: https://github.com/Idein/qmkl6
Nick wasn't able to convert the models either, so Beatmup may not be an option. Exploring other options; focused today on understanding the rest of them.
Caffe by Berkeley AI and TFLite for RPi 64-bit feel like a last resort to try: https://caffe.berkeleyvision.org/ https://github.com/Qengineering/TensorFlow_Lite_SSD_RPi_64-bits?tab=readme-ov-file Hoping to get good numbers on CPU only.
The TensorFlow Lite solution seems to be faster, and still works on the CPU while the camera is busy. Combining images into a grid and running ML over all of them at the same time. Have a quasi-working solution ready for testing.
The faster you move, the more images in the grid.
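The grid idea (packing several frames into one image so a single inference call covers all of them) can be sketched with NumPy. A minimal illustration, assuming equally sized grayscale crops; the function name and shapes are made up for the example:

```python
import numpy as np

def tile_into_grid(frames, cols):
    """Pack N equally sized frames into one grid image, row-major.

    Pads with blank frames so the grid is rectangular; the model then
    sees one large image instead of N separate inputs, amortizing
    per-call overhead.
    """
    rows = -(-len(frames) // cols)  # ceiling division
    blank = np.zeros_like(frames[0])
    padded = list(frames) + [blank] * (rows * cols - len(frames))
    grid_rows = [np.concatenate(padded[r * cols:(r + 1) * cols], axis=1)
                 for r in range(rows)]
    return np.concatenate(grid_rows, axis=0)

# Five 32x32 frames in a 3-wide grid -> one 64x96 image, one inference call.
frames = [np.full((32, 32), i, dtype=np.uint8) for i in range(5)]
grid = tile_into_grid(frames, cols=3)
print(grid.shape)  # (64, 96)
```

This matches the "faster you move, more in the grid" note: the number of frames per grid can grow with speed, and detections are mapped back to source frames from their row/column position in the grid.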
The GPU could maybe be better leveraged for the camera bridge.
OpenCL is not directly supported on the RPi 4 (only custom clvk modules via the Vulkan API), which is very complicated to install on an offline device.