blakeblackshear / frigate

NVR with realtime local object detection for IP cameras
https://frigate.video
MIT License

Second PCI device of Dual Edge TPU m.2 card visible in container but not detected in Frigate #1428

Closed by pdecat 3 years ago

pdecat commented 3 years ago

Describe the bug: The second PCI device of a Dual Edge TPU M.2 card is visible in the container but not detected by Frigate.

Version of frigate: 0.8.4-5043040 (output from /api/version)

Config file

detectors:
  coral0:
    type: edgetpu
    device: pci:0
  coral1:
    type: edgetpu
    device: pci:1

Frigate container logs

 * Starting nginx nginx
   ...done.
frigate.app                    INFO    : Creating directory: /tmp/cache
Starting migrations
peewee_migrate                 INFO    : Starting migrations
There is nothing to migrate
peewee_migrate                 INFO    : There is nothing to migrate
frigate.mqtt                   INFO    : MQTT connected
frigate.app                    INFO    : Camera processor started for camera0: 43
frigate.app                    INFO    : Camera processor started for camera1: 44
frigate.app                    INFO    : Camera processor started for camera2: 47
frigate.app                    INFO    : Capture process started for camera0: 49
frigate.app                    INFO    : Capture process started for camera1: 51
frigate.app                    INFO    : Capture process started for camera2: 53
detector.coral1                INFO    : Starting detection process: 38
frigate.edgetpu                INFO    : Attempting to load TPU as pci:1
frigate.edgetpu                INFO    : No EdgeTPU detected.
Process detector:coral1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/tflite_runtime/interpreter.py", line 152, in load_delegate
    delegate = Delegate(library, options)
  File "/usr/local/lib/python3.8/dist-packages/tflite_runtime/interpreter.py", line 111, in __init__
    raise ValueError(capture.message)
ValueError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/frigate/frigate/edgetpu.py", line 124, in run_detector
    object_detector = LocalObjectDetector(tf_device=tf_device, num_threads=num_threads)
  File "/opt/frigate/frigate/edgetpu.py", line 63, in __init__
    edge_tpu_delegate = load_delegate('libedgetpu.so.1.0', device_config)
  File "/usr/local/lib/python3.8/dist-packages/tflite_runtime/interpreter.py", line 154, in load_delegate
    raise ValueError('Failed to load delegate from {}\n{}'.format(
ValueError: Failed to load delegate from libedgetpu.so.1.0
detector.coral0                INFO    : Starting detection process: 37
frigate.edgetpu                INFO    : Attempting to load TPU as pci:0
frigate.edgetpu                INFO    : TPU found
frigate.app                    INFO    : Stopping...
frigate.object_processing      INFO    : Exiting object processor...
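
The traceback above shows the detector process dying inside `load_delegate`. A minimal sketch for probing each device string separately, outside Frigate, might look like the following. The function name `probe_edgetpus` and the injectable `loader` parameter are hypothetical. The `{"device": ...}` option shape is an assumption, since the log only shows a `device_config` variable. The lazy import assumes `tflite_runtime` is installed wherever this runs.

```python
from typing import Callable, Dict, Iterable, Optional


def probe_edgetpus(
    devices: Iterable[str],
    loader: Optional[Callable[[str, Dict[str, str]], object]] = None,
    library: str = "libedgetpu.so.1.0",
) -> Dict[str, Optional[str]]:
    """Try to load the Edge TPU delegate once per device string.

    Returns a mapping of device -> None on success, or an error message on failure.
    """
    if loader is None:
        # Imported lazily so the function can be exercised without the runtime.
        from tflite_runtime.interpreter import load_delegate
        loader = load_delegate

    results: Dict[str, Optional[str]] = {}
    for device in devices:
        try:
            # Assumed option shape: {"device": "pci:0"}, {"device": "pci:1"}, ...
            loader(library, {"device": device})
            results[device] = None
        except ValueError as exc:
            # load_delegate raises ValueError, as seen in the traceback above.
            results[device] = str(exc) or "failed to load delegate"
    return results
```

Calling `probe_edgetpus(["pci:0", "pci:1"])` inside the container would report which of the two device strings fails without taking down the whole application.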

Frigate stats: N/A (app stops during startup)

FFprobe from your camera: N/A

Screenshots: N/A

Computer Hardware

Camera Info: N/A

Additional context

Both cores are visible in the Frigate container:

root@hassio:~# docker exec -ti addon_ccab4aaf_frigate sh -c "ls -l /sys/devices/pci0000\:00/0000\:00\:0*/apex/"
'/sys/devices/pci0000:00/0000:00:0c.0/apex/':
total 0
drwxr-xr-x 3 root root 0 Jul 23 19:05 apex_0

'/sys/devices/pci0000:00/0000:00:0d.0/apex/':
total 0
drwxr-xr-x 3 root root 0 Jul 23 19:05 apex_1
root@hassio:~# docker exec -ti addon_ccab4aaf_frigate sh -c "ls -l /dev/apex*"
crw-rw---- 1 root root 120, 0 Jul 23 18:29 /dev/apex_0
crw-rw---- 1 root root 120, 1 Jul 23 18:29 /dev/apex_1
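
Seeing a device node in a listing does not prove that a process in the container is allowed to open it. As a rough sketch, the check below tries to open every `apex_*` node and records the error, if any. The function name `check_apex_devices` is hypothetical. Treating a denied open as a sign of container device restrictions is an assumption about one possible cause, not something this issue confirms.

```python
import glob
import os
from typing import Dict, Optional


def check_apex_devices(dev_dir: str = "/dev") -> Dict[str, Optional[str]]:
    """Attempt to open each apex_* node under dev_dir for read/write.

    Returns a mapping of path -> None if the open succeeded, else the error text.
    """
    report: Dict[str, Optional[str]] = {}
    for path in sorted(glob.glob(os.path.join(dev_dir, "apex_*"))):
        try:
            fd = os.open(path, os.O_RDWR)
        except OSError as exc:
            # e.g. "Operation not permitted" or "Permission denied"
            report[path] = exc.strerror or str(exc)
        else:
            os.close(fd)
            report[path] = None
    return report
```

Run inside the container, a result with an error string for `/dev/apex_1` but `None` for `/dev/apex_0` would point at access to the second node rather than at Frigate itself.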

The first core works fine with just:

detectors:
  coral0:
    type: edgetpu
    device: pci:0
  # coral1:
  #   type: edgetpu
  #   device: pci:1

Note: after commenting out the second core, the integration repeatedly logs the following error in Home Assistant:

2021-07-23 19:08:59 ERROR (MainThread) [homeassistant] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 134, in _handle_refresh_interval
    await self._async_refresh(log_failures=True, scheduled=True)
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 265, in _async_refresh
    update_callback()
  File "/usr/src/homeassistant/homeassistant/helpers/update_coordinator.py", line 325, in _handle_coordinator_update
    self.async_write_ha_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 419, in async_write_ha_state
    self._async_write_ha_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 453, in _async_write_ha_state
    state = self._stringify_state()
  File "/usr/src/homeassistant/homeassistant/helpers/entity.py", line 425, in _stringify_state
    state = self.state
  File "/config/custom_components/frigate/sensor.py", line 149, in state
    self.coordinator.data["detectors"][self.detector_name][
KeyError: 'coral1'
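
The `KeyError` above comes from indexing the stats by a detector name that no longer exists in the config. A defensive lookup could be sketched as follows. This is not the integration's actual code or fix. The helper name `detector_stat` is hypothetical, and the `"inference_speed"` key is an assumption, since the traceback truncates the key being read.

```python
from typing import Any, Mapping, Optional


def detector_stat(
    stats: Optional[Mapping[str, Any]],
    detector_name: str,
    key: str = "inference_speed",  # assumed key; the traceback cuts it off
) -> Optional[Any]:
    """Return one stat for a detector, or None if the detector is absent."""
    detectors = (stats or {}).get("detectors") or {}
    detector = detectors.get(detector_name)
    if detector is None:
        # Detector was removed from the Frigate config (e.g. coral1 commented out).
        return None
    return detector.get(key)
```

With a lookup like this, a sensor for a removed detector would report no value instead of raising on every refresh.
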
pdecat commented 3 years ago

All right, disabling the Home Assistant Supervisor's Protection mode resolves the issue, as described in https://github.com/blakeblackshear/frigate-hass-addons/issues/23:

 * Starting nginx nginx
   ...done.
frigate.app                    INFO    : Creating directory: /tmp/cache
Starting migrations
peewee_migrate                 INFO    : Starting migrations
There is nothing to migrate
peewee_migrate                 INFO    : There is nothing to migrate
frigate.mqtt                   INFO    : MQTT connected
frigate.app                    INFO    : Camera processor started for camera0: 42
frigate.app                    INFO    : Camera processor started for camera1: 45
frigate.app                    INFO    : Camera processor started for camera2: 46
frigate.app                    INFO    : Capture process started for camera0: 48
frigate.app                    INFO    : Capture process started for camera1: 49
frigate.app                    INFO    : Capture process started for camera2: 51
detector.coral1                INFO    : Starting detection process: 39
frigate.edgetpu                INFO    : Attempting to load TPU as pci:1
detector.coral0                INFO    : Starting detection process: 38
frigate.edgetpu                INFO    : TPU found
frigate.edgetpu                INFO    : Attempting to load TPU as pci:0
frigate.edgetpu                INFO    : TPU found

PR https://github.com/blakeblackshear/frigate-hass-addons/pull/24 should also resolve the issue.

pdecat commented 3 years ago

FWIW, this is with:

See https://github.com/magic-blue-smoke/Dual-Edge-TPU-Adapter/issues/3#issuecomment-885755446 for other details.

KillahB33 commented 3 years ago

@pdecat can you link where you got the E-key to M.2 adapter? I am only seeing one of my two TPUs.

pdecat commented 3 years ago

> @pdecat can you link where you got the E-key to M.2 adapter? I am only seeing one of my two TPUs.

Got my prototype from https://github.com/magic-blue-smoke/Dual-Edge-TPU-Adapter/issues/2

KillahB33 commented 3 years ago

That's huge! Thanks so much, hopefully I can get one soon.

Garrobo01 commented 3 years ago

Cool to see it works with two cards! I am trying to add a PCIe device in addition to USB. How did you pass the PCIe device to Docker?

KillahB33 commented 3 years ago

> Cool to see it works with two cards! I am trying to add a PCIe device in addition to USB. How did you pass the PCIe device to Docker?

Two cards isn't the cool part here. It's that he got both TPUs working from a single card, and the project he linked is what makes that achievable, as it wasn't before. apex_0 (and any other numbered apex devices you have) is what you want to pass. I don't know the path off the top of my head, but it's mentioned in the setup guide.

Garrobo01 commented 3 years ago

Thanks for the tip! It works now after adding apex_0.
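
For anyone passing the devices by hand, a small sketch that builds one `--device` flag per apex node for `docker run` might look like this. The helper name `docker_device_flags` is hypothetical, and mapping each node to the same path inside the container is an assumption. The setup guide mentioned above remains the reference for the exact paths.

```python
import glob
import os
from typing import List


def docker_device_flags(dev_dir: str = "/dev") -> List[str]:
    """Build `--device host:container` arguments for every apex_* node found."""
    flags: List[str] = []
    for path in sorted(glob.glob(os.path.join(dev_dir, "apex_*"))):
        # Assumption: expose the node under the same path inside the container.
        flags += ["--device", f"{path}:{path}"]
    return flags
```

The returned list can be spliced into a `docker run` argument list. On a host with a dual Edge TPU card it would contain one pair per core.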

pdecat commented 3 years ago

Disabling Home Assistant supervisor's Protection mode is no longer needed to get two TPU devices working since: