blakeblackshear / frigate

NVR with realtime local object detection for IP cameras
https://frigate.video
MIT License

Support for Amlogic S905X3 and A311D with NN accelerator #2955

Closed · subiol closed this 2 years ago

subiol commented 2 years ago

Describe what you are trying to accomplish and why in non-technical terms

The Amlogic S905X3 is a variant of the S905X2 with an optionally added Neural Network accelerator rated at 1.2 TOPS; the A311D has 5 TOPS. Together with their low power consumption and the video hardware acceleration support in the Linux kernel, they seem like a really good option for a Frigate system. Amlogic's documentation says the NN accelerator supports TensorFlow and Caffe.

Ideally, the system would also be compatible with Coral so the two NN accelerators could run together. That would allow people to buy a cheap device to test Frigate and add a Coral later if more capacity is required. Maybe Frigate could even sell such a system to raise some money.

Describe the solution you'd like

First, has anyone already tried running Frigate on an S905X3 with the NN accelerator, or on an A311D? Second, do the Frigate developers see enough potential in these chipsets to support them? Would the S905X3/A311D NN accelerator and a Coral work at the same time?

EDIT: Not advocating for any particular brand, but as an example, a device like this would make a very good, quiet, and power-efficient Frigate system: https://www.khadas.com/vim3

blakeblackshear commented 2 years ago

I'm not opposed to looking into it, but I don't want to get stuck supporting a wide variety of hardware options that aren't popular. Can you research a guide for running tensorflow on these NPUs?

subiol commented 2 years ago

> I'm not opposed to looking into it, but I don't want to get stuck supporting a wide variety of hardware options that aren't popular. Can you research a guide for running tensorflow on these NPUs?

Here is a link to download the software for the device I linked: https://www.khadas.com/npu-toolkit-vim3 and here is a guide with examples: https://www.cnx-software.com/2020/01/13/getting-started-with-amlogic-npu-on-khadas-vim3-vim3l/

What kind of performance in Frigate do you think the 5 TOPS of the device will provide? If you think it is worth it, I would be willing to buy the device for testing.

blakeblackshear commented 2 years ago

The Coral has 4 TOPS. Probably best to look at a broader set of options to see what is the most widely used and available hardware platform. I have seen many different boards with integrated NPU/TPU/GPU acceleration.

subiol commented 2 years ago

That makes sense. What other chipsets do you think we should consider? I know mostly about Amlogic, Allwinner, and Rockchip from looking for a media center SBC, but I'll be happy to look around if you point me to other chipset makers.

blakeblackshear commented 2 years ago

I haven't looked at anything specific. I just know I have seen a few others, and I don't want to invest time into the wrong platform.

subiol commented 2 years ago

What standards does Frigate need, specifically?

Is it TensorFlow for the NN part and ffmpeg with hardware support (ideally) for the encoding/decoding? Anything else?

blakeblackshear commented 2 years ago

The NN part needs to support models generated with tensorflow. Any required libraries need to support running on Debian in docker.

For decoding/encoding, ffmpeg acceleration for h264 is ideal. Many chips say they support hwaccel, but not with a direct ffmpeg command.

subiol commented 2 years ago

The Amlogic A311D device can run Armbian (Debian for ARM) and Docker. According to the link I gave, it can run TensorFlow models. It has Linux video hardware acceleration, although I am unsure about ffmpeg hardware acceleration support. The CPU is also more powerful than the Raspberry Pi 4B's while consuming less power (the A311D is built on a 12 nm process).

I am going to ask the manufacturer and in user forums to confirm both TensorFlow model support on the 5 TOPS NPU and ffmpeg hardware acceleration support.

I am still open to suggestions for other chipsets and devices. But if nothing else is suggested and the previously mentioned capabilities are confirmed, I think I'll buy a VIM3 and give it a try as a Frigate server.

The device is inexpensive enough, and it is also a good moment for me to do this because I need to replace my aging media centre SBC. I was going to buy an Odroid C4, because it fits my media centre needs as well as the VIM3 does and is cheaper. But for the testing I'll get the VIM3, and if it turns out there are insurmountable issues with it acting as a Frigate server, I'll just use it as a slightly more expensive media centre.

I am an amateur programmer (including Python) and have used and tinkered with Linux for more than a decade. I have no experience at all programming the Linux kernel or drivers, but I can compile kernels and troubleshoot effectively. Hopefully that will be enough.

blakeblackshear commented 2 years ago

There are two things to focus on:

  1. Get a model from the tensorflow model zoo to work with whatever SDK they provide inside a docker container with python (see the sketch after this list for a minimal CPU-only starting point).
  2. Get ffmpeg hardware acceleration working inside the container. I use this command as a starting point and look at CPU usage with different options:
    ffmpeg -re -stream_loop -1 -i https://streams.videolan.org/ffmpeg/incoming/720p60.mp4 -f rawvideo -pix_fmt yuv420p pipe: > /dev/null
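
A minimal sketch of step 1, assuming the stock TensorFlow Lite CPU interpreter and a placeholder model filename; the vendor SDK or a delegate would replace the plain interpreter once this works in the container:

```python
# CPU-only baseline: load a quantized detection model with the stock TFLite
# interpreter and run a single inference on random data. Once this works
# inside the Debian container, the same pre/post-processing can be pointed
# at the vendor's NPU SDK or a TFLite delegate.
import numpy as np
import tensorflow as tf

# Placeholder path: any fully quantized SSD-style model exported for TFLite.
interpreter = tf.lite.Interpreter(model_path="detect_quant.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
_, height, width, _ = input_details["shape"]

# Random uint8 frame standing in for a decoded camera frame.
frame = np.random.randint(0, 255, (1, height, width, 3), dtype=np.uint8)
interpreter.set_tensor(input_details["index"], frame)
interpreter.invoke()

for out in interpreter.get_output_details():
    print(out["name"], interpreter.get_tensor(out["index"]).shape)
```
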
stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

jagheterfredrik commented 2 years ago

@subiol did you get anywhere with the VIM3?

tomeuv commented 11 months ago

Just wanted to mention that I am soon going to look at getting Frigate to work well on boards with that NPU. Those interested can follow this work at https://blog.tomeuvizoso.net/.

tomeuv commented 10 months ago

Does anybody have a tflite model suitable for testing Frigate on the A311D NPU? It should have real uint8 quantization.

alucryd commented 8 months ago

I've also got a VIM3 that I'd like to use as my dedicated Frigate device instead of my main server. I will keep using my Intel NCS2 with it for the time being, but I will be able to run some tests on the included NPU if this gets anywhere.

tomeuv commented 8 months ago

Just an update: https://blog.tomeuvizoso.net/2024/01/etnaviv-npu-update-14-object-detection.html

TL;DR: I got my driver to run SSDLite MobileDet at 56.149 ms per inference. The blob does it in half that time, so there is still quite some room for improvement. I know of some low-hanging fruit that should get us pretty close.

I will be focusing on getting the userspace merged into Mesa so this will be available in common Linux distros such as Debian. The last two remaining kernel patches are already queued for Linux 6.9.

After this first upstreaming round, I plan to work on S905X3 support, and performance improvements.

tomeuv commented 8 months ago

> Does anybody have a tflite model suitable for testing Frigate on the A311D NPU? It should have real uint8 quantization.

Just for the record, I ended up using ssdlite_mobiledet_coco_qat_postprocess.tflite from https://github.com/google-coral/test_data
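
For anyone checking a candidate model, a quick sketch (assuming the `tensorflow` package is available; `tflite_runtime` exposes the same interpreter API) to confirm the inputs really are uint8-quantized:

```python
# Print the input tensor details of a .tflite file; a fully quantized model
# should report dtype uint8 together with a non-trivial (scale, zero_point).
import tensorflow as tf

interpreter = tf.lite.Interpreter(
    model_path="ssdlite_mobiledet_coco_qat_postprocess.tflite"
)
interpreter.allocate_tensors()

for detail in interpreter.get_input_details():
    print(detail["name"], detail["dtype"], detail["quantization"])
```
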

kokroo commented 5 months ago

@tomeuv Any progress? Thanks for working on this, looks awesome!

tomeuv commented 5 months ago

> @tomeuv Any progress? Thanks for working on this, looks awesome!

This is working with decent performance on the A311D. Everything needed will be in Linux v6.9 and Mesa 24.1.1, so if your distro picks them up, you should be ready to go except for one thing: you need to make sure that the npu node in the device tree for your board is enabled; see e.g. https://github.com/mykhani/device-tree-guide/blob/master/README.md#enablingdisabling-a-device

I'm currently working on S905D3 support, but it's no small undertaking.

You can follow the progress at https://blog.tomeuvizoso.net/

sousmangoosta commented 4 months ago

Hi,

I created an interface for Frigate to communicate with libadla: https://github.com/sousmangoosta/libadla_interface and a detector plugin for Frigate: https://github.com/blakeblackshear/frigate/compare/dev...sousmangoosta:frigate:adla_detector

The provided Docker image contains an ssdlite_mobiledet model converted for the VIM4 ("PRODUCT_PID0XA003").

The inference time is currently about 20 ms.

I'm on Armbian with a 5.15.119 kernel.
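
For context, Frigate detector plugins follow a small interface; below is a rough, hypothetical outline of the shape of such a plugin, modeled on the built-in detectors (the `adla` key, the class names, and the libadla calls are illustrative assumptions, not the code from the branch linked above):

```python
# Hypothetical outline of a Frigate detector plugin for the Amlogic NPU.
# Module paths follow Frigate's built-in detectors; the libadla integration
# itself is only sketched as comments.
from typing import Literal

import numpy as np

from frigate.detectors.detection_api import DetectionApi
from frigate.detectors.detector_config import BaseDetectorConfig

DETECTOR_KEY = "adla"  # referenced from the `detectors:` section of the config


class AdlaDetectorConfig(BaseDetectorConfig):
    type: Literal[DETECTOR_KEY]


class AdlaDetector(DetectionApi):
    type_key = DETECTOR_KEY

    def __init__(self, detector_config: AdlaDetectorConfig):
        # A real plugin would load the converted ssdlite_mobiledet model
        # through the libadla interface for the board's product ID here.
        self.model = None  # placeholder

    def detect_raw(self, tensor_input: np.ndarray) -> np.ndarray:
        # Frigate expects a (20, 6) float32 array of
        # [class_id, score, y_min, x_min, y_max, x_max] rows.
        detections = np.zeros((20, 6), np.float32)
        # ... run inference via libadla and fill `detections` ...
        return detections
```
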