Bytelake / Coral-in-LXC

How to pass or share a Google Coral M.2 to an LXC container in Proxmox

getting ValueError: Failed to load delegate from libedgetpu.so.1.0 #1

Closed DushanthaS closed 1 year ago

DushanthaS commented 1 year ago

I followed your guide and installed everything, but I am getting the following error and Frigate is crashing.

2023-04-23 14:13:35.377352906 [2023-04-23 14:13:22] frigate.detectors.plugins.edgetpu_tfl INFO : Attempting to load TPU as pci:0
2023-04-23 14:13:35.377590706 Process detector:coral:
2023-04-23 14:13:35.377719405 [2023-04-23 14:13:35] frigate.detectors.plugins.edgetpu_tfl ERROR : No EdgeTPU was detected. If you do not have a Coral device yet, you must configure CPU detectors.
2023-04-23 14:13:35.403316706 Traceback (most recent call last):
2023-04-23 14:13:35.403321637   File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 160, in load_delegate
2023-04-23 14:13:35.403322841     delegate = Delegate(library, options)
2023-04-23 14:13:35.403349218   File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 119, in __init__
2023-04-23 14:13:35.403350884     raise ValueError(capture.message)
2023-04-23 14:13:35.403351895 ValueError
2023-04-23 14:13:35.403352820
2023-04-23 14:13:35.403354000 During handling of the above exception, another exception occurred:
2023-04-23 14:13:35.403357991
2023-04-23 14:13:35.403359072 Traceback (most recent call last):
2023-04-23 14:13:35.403506237   File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
2023-04-23 14:13:35.403507799     self.run()
2023-04-23 14:13:35.403509095   File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
2023-04-23 14:13:35.403510225     self._target(*self._args, **self._kwargs)
2023-04-23 14:13:35.403511389   File "/opt/frigate/frigate/object_detection.py", line 98, in run_detector
2023-04-23 14:13:35.403512632     object_detector = LocalObjectDetector(detector_config=detector_config)
2023-04-23 14:13:35.403514374   File "/opt/frigate/frigate/object_detection.py", line 52, in __init__
2023-04-23 14:13:35.403516471     self.detect_api = create_detector(detector_config)
2023-04-23 14:13:35.403517682   File "/opt/frigate/frigate/detectors/__init__.py", line 24, in create_detector
2023-04-23 14:13:35.403519189     return api(detector_config)
2023-04-23 14:13:35.403520434   File "/opt/frigate/frigate/detectors/plugins/edgetpu_tfl.py", line 37, in __init__
2023-04-23 14:13:35.403539562     edge_tpu_delegate = load_delegate("libedgetpu.so.1.0", device_config)
2023-04-23 14:13:35.403541011   File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
2023-04-23 14:13:35.403559932     raise ValueError('Failed to load delegate from {}\n{}'.format(
2023-04-23 14:13:35.403561265 ValueError: Failed to load delegate from libedgetpu.so.1.0
2023-04-23 14:13:35.403562309
2023-04-23 14:13:42.497007034 [2023-04-23 14:13:42] frigate.watchdog INFO : Detection appears to have stopped. Exiting Frigate...

LXC config


arch: amd64
cores: 4
features: nesting=1
hostname: frigate2
memory: 1024
mp0: STORE:vm-109-disk-0,mp=/mnt/STORE/,size=8G
net0: name=eth0,bridge=vmbr0,firewall=1,gw=10.0.18.1,hwaddr=AE:7F:40:F1:51:CA,ip=10.0.18.225/24,type=veth
ostype: debian
rootfs: VM:vm-109-disk-0,size=8G
swap: 512
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.cgroup2.devices.allow: c 29:0 rwm
lxc.cgroup2.devices.allow: c 189:* rwm
lxc.apparmor.profile: unconfined
lxc.cgroup2.devices.allow: a
lxc.mount.entry: /dev/dri/renderD128 dev/dri/renderD128 none bind,optional,create=file 0, 0
lxc.mount.entry: /dev/apex_0 dev/apex_0 none bind,optional,create=file 0, 0
lxc.cap.drop:
lxc.mount.auto: cgroup:rw

Bytelake commented 1 year ago

Interesting. I might've forgotten to add a step, but I'm not sure. Do the drivers show up in the LXC? So:

ls /dev/apex_0

I'll follow my own guide in a bit when I have the time to make sure I didn't miss anything, but I suspect the drivers didn't install correctly, or get passed to the Coral correctly.

Bytelake commented 1 year ago

Also, your LXC config is correct, so it shouldn't be that. Maybe try reinstalling the drivers and rebooting everything.

DushanthaS commented 1 year ago

Thanks for the response. Yes, I rebooted the Proxmox host and everything is showing up as it should:

root@pve:~# ls /dev/apex_0
/dev/apex_0
root@pve:~# lspci -nn | grep 089a
01:00.0 System peripheral [0880]: Global Unichip Corp. Coral Edge TPU [1ac1:089a]

Bytelake commented 1 year ago

Great! Not sure if I made it clear enough, but yes, the entire system (node) needs to be rebooted for it to load properly.

DushanthaS commented 1 year ago

Sorry for the confusion, actually it's giving the same error even after the reboot. I followed multiple guides but was not sure how to resolve this.

Bytelake commented 1 year ago

So to clarify, ls /dev/apex_0 and lspci return what they're supposed to within the LXC? But the error still happens?
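A successful `ls /dev/apex_0` only shows that the node exists; it does not show that a process inside the container can actually open it. As a rough sketch (the helper name `describe_device_node` is made up for illustration and is not part of the guide or of any Coral tooling), the check can be made a little stricter from inside the LXC:

```python
import os
import stat


def describe_device_node(path="/dev/apex_0"):
    """Report whether `path` exists, is a character device, and is accessible.

    Hypothetical helper for illustration only; /dev/apex_0 is the device
    node discussed in this thread.
    """
    info = {
        "path": path,
        "exists": False,
        "is_char_device": False,
        "readable": False,
        "writable": False,
    }
    try:
        st = os.stat(path)
    except OSError:
        # Node is missing or cannot be stat'ed at all.
        return info
    info["exists"] = True
    info["is_char_device"] = stat.S_ISCHR(st.st_mode)
    # os.access checks permissions for the current user only.
    info["readable"] = os.access(path, os.R_OK)
    info["writable"] = os.access(path, os.W_OK)
    return info


if __name__ == "__main__":
    print(describe_device_node())
```

If the node exists but is not readable or writable for the user running Frigate, that would point at a permissions problem inside the container rather than at the driver install.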

DushanthaS commented 1 year ago

Yes, that's correct. I am using a G650-04528-01 with an Ableconn PEX-MP117 Mini PCI-E to PCI-E adapter card. Everything shows up as it should, but I still get the libedgetpu.so.1.0 error. I tried both Frigate and the Coral sample project.
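To take Frigate out of the picture, the failing call from the traceback can be reproduced on its own. The sketch below assumes `tflite_runtime` is installed in the container; the function name `probe_edgetpu` is invented here, and the `{"device": "pci:0"}` options dict is an assumption based on the "Attempting to load TPU as pci:0" log line rather than something copied from Frigate's source.

```python
def probe_edgetpu(device="pci:0"):
    """Try to load the Edge TPU delegate and return (ok, message).

    Hypothetical helper; mirrors the load_delegate("libedgetpu.so.1.0", ...)
    call shown in the traceback.
    """
    try:
        from tflite_runtime.interpreter import load_delegate
    except ImportError as exc:
        return False, f"tflite_runtime is not importable: {exc}"
    try:
        # Assumed options: the device string taken from the Frigate log line.
        load_delegate("libedgetpu.so.1.0", {"device": device})
    except (ValueError, OSError) as exc:
        # ValueError is what the traceback shows; OSError covers a missing library.
        return False, f"delegate failed to load: {exc}"
    return True, "delegate loaded"


if __name__ == "__main__":
    ok, message = probe_edgetpu()
    print(ok, message)
```

If this fails in the container but the same script succeeds on the host, the problem is more likely in how the device is exposed to the LXC than in the TPU or the motherboard.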

Bytelake commented 1 year ago

That's pretty odd. The only thing I can think of is that maybe you missed a step somewhere along the way, but I'm not sure what that could've been. The drivers seem to be installed correctly, the card appears to be recognized, and the LXC config is correct, so I'm just not sure.

DushanthaS commented 1 year ago

Yes, thanks for the help. Maybe my motherboard is too old or unsupported, or I got a faulty TPU, or it's something else. I will try this on another PC when I have time.

Bytelake commented 1 year ago

I just followed the guide on a new system, and it worked first try, no issues.

scollk commented 1 year ago

I seem to be having the same issue. Coral M.2 passed through to a Debian LXC. I can see /dev/apex_0 from inside the LXC container.

The Coral works while on the host itself and the 'parrot' example code works as expected.

Thoughts - I originally had this as an unprivileged container, and only just changed it in the .conf to be privileged.

I also ran through the coral driver install on the LXC before I did it on the host, due to following some different instructions - not sure if that would matter.

scollk commented 1 year ago

Did some more digging and got this to work.

I had to back up my LXC and then restore it as a privileged container. After that, I removed some redundant permission entries from my config file that were previously used for a mount point. After this, it is working.
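Since the fix here came down to the container being privileged, a quick look at the container's config text can flag the two things discussed in this thread: whether `unprivileged: 1` is set, and whether a mount entry for `/dev/apex_0` is present. This is only a sketch; `check_lxc_config` is a made-up helper, and it does nothing beyond simple line matching.

```python
def check_lxc_config(text):
    """Return flags for an LXC config given as a string.

    Hypothetical helper for illustration: it only splits each line on the
    first colon and compares keys and values.
    """
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            entries.append((key.strip(), value.strip()))
    unprivileged = any(k == "unprivileged" and v == "1" for k, v in entries)
    apex_mount = any(
        k == "lxc.mount.entry" and v.startswith("/dev/apex_0 ")
        for k, v in entries
    )
    return {"unprivileged": unprivileged, "apex_mount_entry": apex_mount}


if __name__ == "__main__":
    # Two lines in the style of the config posted earlier in this thread,
    # plus an "unprivileged: 1" line added purely for the example.
    sample = (
        "arch: amd64\n"
        "unprivileged: 1\n"
        "lxc.mount.entry: /dev/apex_0 dev/apex_0 none bind,optional,create=file 0, 0\n"
    )
    print(check_lxc_config(sample))
```

A result with `unprivileged` set to true and the mount entry present would match the situation described above, where the device was visible but the delegate still failed to load.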

Thanks for the good write-up.