blakeblackshear / frigate

NVR with realtime local object detection for IP cameras
https://frigate.video
MIT License

[Detector Support]: Coral USB on Ubuntu LTS 22.04.3 Server Frigate in Docker takes multiple restarts to detect TPU #9663

RutgerDiehard closed this issue 7 months ago

RutgerDiehard commented 8 months ago

Describe the problem you are having

Running Frigate 0.13.1 on Ubuntu Server 22.04.3 LTS in Docker, I've had an issue with a Coral USB TPU getting stuck multiple times a day. After some troubleshooting (moving USB ports etc.), I have found a powered USB hub to be the most reliable solution; it stays up for days at a time.

The issue I have now is that when I restart Frigate (after config changes), it does not find the TPU immediately; it takes 3 or 4 restarts before it finds the TPU and starts successfully.

I've tested this on a different PC running Frigate 0.12 in Docker on Ubuntu Server 20.04.6 LTS with the same results. When the Coral TPU is plugged directly into a USB port on the PC instead of the hub, Frigate finds the device after the first restart every time.

Is there something I'm missing, or is there something I can do (maybe add a delay) to help?

Hardware:

Dell Vostro 3470 Intel i5 8400 16GB RAM 240GB M.2 SSD

Version

0.13.1

Frigate config file

mqtt:
  # Required: host name
  host: HOSTIP
  # Optional: port (default: shown below)
  port: 1883
  # Optional: topic prefix (default: shown below)
  # WARNING: must be unique if you are running multiple instances
  topic_prefix: frigate
  # Optional: client id (default: shown below)
  # WARNING: must be unique if you are running multiple instances
  client_id: frigate
  # Optional: user
  user: mqtt
  # Optional: password
  # NOTE: Environment variables that begin with 'FRIGATE_' may be referenced in {}.
  #       eg. password: '{FRIGATE_MQTT_PASSWORD}'
  password: password
  # Optional: interval in seconds for publishing stats (default: shown below)
  stats_interval: 60
detectors:
  # Required: name of the detector
  coral:
    # Required: type of the detector
    # Valid values are 'edgetpu' (requires device property below) and 'cpu'.
    type: edgetpu
    # Optional: device name as defined here: https://coral.ai/docs/edgetpu/multiple-edgetpu/#using-the-tensorflow-lite-python-api
    device: usb
    # Optional: num_threads value passed to the tflite.Interpreter (default: shown below)
    # This value is only used for CPU types
    num_threads: 3
record:
  # Optional: Enable recording (default: shown below)
  enabled: True
  # Optional: Number of days to retain recordings regardless of events (default: shown below)
  # NOTE: This should be set to 0 and retention should be defined in events section below
  #       if you only want to retain recordings of events.
  retain:
    days: 0
  # Optional: Event recording settings
  events:
    # Optional: Maximum length of time to retain video during long events. (default: shown below)
    # NOTE: If an object is being tracked for longer than this amount of time, the retained recordings
    #       will be the last x seconds of the event unless retain_days under record is > 0.
    # max_seconds: 300
    # Optional: Number of seconds before the event to include (default: shown below)
    pre_capture: 5
    # Optional: Number of seconds after the event to include (default: shown below)
    post_capture: 5
    # Optional: Objects to save recordings for. (default: all tracked objects)
    objects:
      - person
    # Optional: Restrict recordings to objects that entered any of the listed zones (default: no required zones)
    required_zones: []
    # Optional: Retention settings for recordings of events
    retain:
      # Required: Default retention days (default: shown below)
      default: 10
      # Optional: Per object retention days
      objects:
        person: 15
go2rtc:
  streams:
#    username: "user"
#    password: "password"
    Drive:
      - "ffmpeg:rtsp://user:password@CAMERAIP:554/h265Preview_01_main"
    Front_Door:
      - "ffmpeg:rtsp://user:password@CAMERAIP:554/h265Preview_01_main"      
    Front_Door_Sub: 
      - http://CAMERAIP/flv?port=1935&app=bcs&stream=channel0_sub.bcs&user=user&password=password
cameras:
  Front_Door:
    ffmpeg:
      hwaccel_args: preset-vaapi
      inputs:
        - path: rtsp://127.0.0.1:8554/Front_Door
          input_args: preset-rtsp-restream
          roles:
            - record
        - path: http://CAMERAIP/flv?port=1935&app=bcs&stream=channel0_ext.bcs&user=user&password=password
          input_args: preset-http-reolink
          roles:
            - detect
    live:
      stream_name: Front_Door_Sub
    detect:
      width: 896
      height: 512
      fps: 5
    objects:
      track:
        - person
        - dog
        - car
      filters:
        dog:
          mask: 471,349,469,439,533,443,534,347
    snapshots:
      enabled: true
      timestamp: false
      bounding_box: true
      retain:
        default: 5
      required_zones:
        - drive_zone
        - drive_parking_zone
        - drive_path_zone
    record:
      events:
        required_zones:
          - drive_zone
          - drive_parking_zone
          - drive_path_zone
    zones:
      drive_zone:
        coordinates: 348,65,50,227,0,85,54,61,91,49,118,0,346,0
        objects:
          - person
          - car
          - dog
      drive_parking_zone:
        coordinates: 72,221,568,402,516,512,846,512,896,314,896,0,360,0,358,59
        objects:
          - person
          - dog
      drive_path_zone:
        coordinates: 0,512,461,512,521,388,46,211,0,107
        objects:
          - person
          - dog
      street_zone:
        coordinates: 2,1,87,0,98,35,50,57,0,78
    motion:
      mask:
        - 638,477,638,500,260,501,261,475
    mqtt:
      timestamp: False
      bounding_box: False
      crop: True
      quality: 100
      height: 500
  Drive:
    ffmpeg:
      hwaccel_args: preset-vaapi
      inputs:
        - path: rtsp://127.0.0.1:8554/Drive
          input_args: preset-rtsp-restream
          roles:
            - record
        - path: http://CAMERAIP/flv?port=1935&app=bcs&stream=channel0_ext.bcs&user=admin&password=password
          input_args: preset-http-reolink
          roles:
            - detect
    detect:
      width: 896
      height: 512
      fps: 5
    objects:
      track:
        - person
        - dog
        - car
    snapshots:
      enabled: true
      timestamp: false
      bounding_box: true
      retain:
        default: 5
      required_zones:
        - drive_zone
        - drive_parking_zone
        - drive_path_zone
    record:
      events:
        required_zones:
          - drive_zone
          - drive_parking_zone
          - drive_path_zone
    zones:
      drive_parking_zone:
        coordinates: 0,512,534,512,684,384,316,132,315,62,0,173
        objects:
          - person
          - dog
      drive_path_zone:
        coordinates: 537,512,800,512,827,450,683,383
        objects:
          - person
          - dog
      drive_zone:
        coordinates: 896,0,896,289,825,447,593,320,320,128,318,65
        objects:
          - person
          - car
          - dog
      close_zone:
        coordinates: 0,235,67,236,296,138,301,0,0,0
    motion:
      mask:
        - 896,0,896,154,659,96,328,53,304,0
        - 638,476,633,499,265,501,265,476
        - 162,117,324,70,314,0,164,0,141,31
        - 128,108,139,79,134,50,89,48,64,45,301,0,21,129,73,109
    mqtt:
      timestamp: False
      bounding_box: False
      crop: True
      quality: 100
      height: 500

docker-compose file or Docker CLI command

version: "3.9"
services:
  frigate:
    container_name: frigate
    privileged: true # this may not be necessary for all setups
    restart: unless-stopped
    image: ghcr.io/blakeblackshear/frigate:stable
    shm_size: "100mb" # update for your cameras based on calculation above
    devices:
      - /dev/bus/usb:/dev/bus/usb # passes the USB Coral, needs to be modified >
#      - /dev/apex_0:/dev/apex_0 # passes a PCIe Coral, follow driver instructi>
      - /dev/dri/renderD128 # for intel hwaccel, needs to be updated for your h>
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /home/bob/frigate/config:/config
      - /home/bob/go2rtc:/config/go2rtc
      - /home/bob/frigate:/media/frigate
      - type: tmpfs # Optional: 1GB of memory, reduces SSD/SD Card wear
        target: /tmp/cache
        tmpfs:
          size: 1000000000
    ports:
      - "1984:1984"
      - "5000:5000"
      - "8123:8123"
      - "8554:8554" # RTSP feeds
      - "8555:8555/tcp" # WebRTC over tcp
      - "8555:8555/udp" # WebRTC over udp
    environment:
      FRIGATE_RTSP_PASSWORD: "password"

Relevant log output

[2024-02-05 15:37:40] frigate.detectors.plugins.edgetpu_tfl INFO    : Attempting to load TPU as usb
2024-02-05 15:38:07.318585372  Process detector:coral:
2024-02-05 15:38:07.318635327  [2024-02-05 15:38:07] frigate.detectors.plugins.edgetpu_tfl ERROR   : No EdgeTPU was detected. If you do not have a Coral device yet, you must configure CPU detectors.
2024-02-05 15:38:07.321163604  Traceback (most recent call last):
2024-02-05 15:38:07.321172313    File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 160, in load_delegate
2024-02-05 15:38:07.321177057      delegate = Delegate(library, options)
2024-02-05 15:38:07.321213104    File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 119, in __init__
2024-02-05 15:38:07.321217245      raise ValueError(capture.message)
2024-02-05 15:38:07.321220896  ValueError
2024-02-05 15:38:07.321224311  
2024-02-05 15:38:07.321228703  During handling of the above exception, another exception occurred:
2024-02-05 15:38:07.321232104  
2024-02-05 15:38:07.321256319  Traceback (most recent call last):
2024-02-05 15:38:07.321280776    File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
2024-02-05 15:38:07.321284526      self.run()
2024-02-05 15:38:07.321288957    File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
2024-02-05 15:38:07.321293014      self._target(*self._args, **self._kwargs)
2024-02-05 15:38:07.321325952    File "/opt/frigate/frigate/object_detection.py", line 102, in run_detector
2024-02-05 15:38:07.321330507      object_detector = LocalObjectDetector(detector_config=detector_config)
2024-02-05 15:38:07.321334961    File "/opt/frigate/frigate/object_detection.py", line 53, in __init__
2024-02-05 15:38:07.321339001      self.detect_api = create_detector(detector_config)
2024-02-05 15:38:07.321343255    File "/opt/frigate/frigate/detectors/__init__.py", line 18, in create_detector
2024-02-05 15:38:07.321346892      return api(detector_config)
2024-02-05 15:38:07.321351169    File "/opt/frigate/frigate/detectors/plugins/edgetpu_tfl.py", line 41, in __init__
2024-02-05 15:38:07.321360880      edge_tpu_delegate = load_delegate("libedgetpu.so.1.0", device_config)
2024-02-05 15:38:07.321365908    File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
2024-02-05 15:38:07.321417327      raise ValueError('Failed to load delegate from {}\n{}'.format(
2024-02-05 15:38:07.321444242  ValueError: Failed to load delegate from libedgetpu.so.1.0

Operating system

Other Linux

Install method

Docker Compose

Coral version

USB

Any other information that may be helpful

No response

hawkeye217 commented 8 months ago

Back in the early days of Frigate, an underpowered Coral would lock up or crash my entire system. It wasn't until I used a powered USB hub that all of my problems went away.

The only time I've had a similar issue is in Frigate development, specifically when I'm working on some code and it crashes. Sometimes I have to completely unplug my USB Coral and plug it back in. It then takes a little time for the internal driver to load (until it does, lsusb shows "Global Unichip Corp" instead of "Google Inc"). I have never seen it on my production setup, though I've recently moved away from a USB-based Coral to an internal PCI-based one.
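
The lsusb check mentioned above can be sketched as a small helper. This is only an illustration: the function name `coral_state` is made up, and the two USB IDs are the ones commonly reported for the Coral USB Accelerator before and after its firmware loads, so treat them as assumptions and compare them with your own `lsusb` output.

```python
import re

# Assumed USB IDs for the Coral USB Accelerator; verify against your own lsusb output.
CORAL_UNINITIALIZED = "1a6e:089a"  # listed as "Global Unichip Corp."
CORAL_READY = "18d1:9302"          # listed as "Google Inc."


def coral_state(lsusb_output: str) -> str:
    """Classify lsusb text as 'ready', 'uninitialized', or 'missing'."""
    found = re.findall(
        r"ID ([0-9a-f]{4}:[0-9a-f]{4})", lsusb_output, flags=re.IGNORECASE
    )
    ids = {usb_id.lower() for usb_id in found}
    if CORAL_READY in ids:
        return "ready"
    if CORAL_UNINITIALIZED in ids:
        return "uninitialized"
    return "missing"


# Example with a hand-written lsusb line (not captured from the reporter's machine).
sample = "Bus 002 Device 003: ID 1a6e:089a Global Unichip Corp."
print(coral_state(sample))  # prints uninitialized
```

On a real host, the input could come from `subprocess.run(["lsusb"], capture_output=True, text=True).stdout`.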

So if your "sticking" Coral has disappeared with a powered USB hub, I bet the re-initialization delay probably has to do with the chipset in the USB hub you're using or one of the USB cables. I would first try swapping hubs and/or cables.

RutgerDiehard commented 8 months ago

Thanks @hawkeye217. I suspected the USB hub (TP-Link) may be the cause. I had initially tried a new USB cable to the Coral, which didn't really make any difference in the original PC, so I went back to the Coral-supplied cable plugged in to the hub. I've just removed the hub and plugged the Coral directly into the PC with the new USB cable to see if that makes a difference. To confirm, though: when not plugged in to the hub, Frigate finds the Coral on the first restart, every time.

diggity801 commented 8 months ago

I've also noticed this issue. Frigate fails to detect the Coral on the first go-around, crashes, and then detects the USB Coral after restarting. It's an issue with Frigate and Google Coral USB.

RutgerDiehard commented 8 months ago

Woke up this morning to this :-(

2024-02-05 17:35:47.504233344  [INFO] Preparing Frigate...
2024-02-05 17:35:47.522570262  [INFO] Starting Frigate...
2024-02-05 17:35:48.481902422  [2024-02-05 17:35:48] frigate.app                    INFO    : Starting Frigate (0.13.1-34fb1c2)
2024-02-05 17:35:48.525608702  [2024-02-05 17:35:48] peewee_migrate.logs            INFO    : Starting migrations
2024-02-05 17:35:48.529037877  [2024-02-05 17:35:48] peewee_migrate.logs            INFO    : There is nothing to migrate
2024-02-05 17:35:48.532026551  [2024-02-05 17:35:48] frigate.app                    INFO    : Recording process started: 368
2024-02-05 17:35:48.534044108  [2024-02-05 17:35:48] frigate.app                    INFO    : go2rtc process pid: 89
2024-02-05 17:35:48.547961389  [2024-02-05 17:35:48] detector.coral                 INFO    : Starting detection process: 378
2024-02-05 17:35:51.191840635  [2024-02-05 17:35:48] frigate.app                    INFO    : Output process started: 380
2024-02-05 17:35:51.197223890  [2024-02-05 17:35:48] frigate.detectors.plugins.edgetpu_tfl INFO    : Attempting to load TPU as usb
2024-02-05 17:35:51.197766842  [2024-02-05 17:35:48] frigate.app                    INFO    : Camera processor started for Front_Door: 387
2024-02-05 17:35:51.197861023  [2024-02-05 17:35:48] frigate.app                    INFO    : Camera processor started for Drive: 388
2024-02-05 17:35:51.197945376  [2024-02-05 17:35:51] frigate.detectors.plugins.edgetpu_tfl INFO    : TPU found
2024-02-05 17:35:51.197991389  [2024-02-05 17:35:48] frigate.app                    INFO    : Capture process started for Front_Door: 391
2024-02-05 17:35:51.198059465  [2024-02-05 17:35:48] frigate.app                    INFO    : Capture process started for Drive: 394
2024-02-06 02:58:20.075355223  [2024-02-06 02:58:20] frigate.watchdog               INFO    : Detection appears to be stuck. Restarting detection process...
2024-02-06 02:58:20.075650189  [2024-02-06 02:58:20] root                           INFO    : Waiting for detection process to exit gracefully...
2024-02-06 02:58:50.104470446  [2024-02-06 02:58:50] root                           INFO    : Detection process didnt exit. Force killing...
2024-02-06 02:58:50.116969921  [2024-02-06 02:58:50] root                           INFO    : Detection process has exited...
2024-02-06 02:58:50.141654868  [2024-02-06 02:58:50] detector.coral                 INFO    : Starting detection process: 73324
2024-02-06 02:58:52.791030746  [2024-02-06 02:58:50] frigate.detectors.plugins.edgetpu_tfl INFO    : Attempting to load TPU as usb
2024-02-06 02:58:52.804029925  [2024-02-06 02:58:52] frigate.detectors.plugins.edgetpu_tfl INFO    : TPU found

It lasted over nine hours before crashing. I've ordered a Sabrent powered USB hub which should be here today (good ol' Amazon). I hope this will solve the issue.
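
The "over nine hours" figure can be checked against the log timestamps. The helper below is a minimal sketch (the name `time_until_stuck` is hypothetical, not part of Frigate); it assumes each log line starts with a `YYYY-MM-DD HH:MM:SS` stamp, as in the output above.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def time_until_stuck(log_lines: Iterable[str]) -> Optional[timedelta]:
    """Time from the first 'Starting Frigate' line to the first watchdog 'stuck' line."""
    started = None
    for line in log_lines:
        if started is None and "Starting Frigate (" in line:
            # The first 19 characters hold the date and time, without fractions.
            started = datetime.strptime(line[:19], STAMP_FORMAT)
        elif started is not None and "Detection appears to be stuck" in line:
            return datetime.strptime(line[:19], STAMP_FORMAT) - started
    return None


# Two lines copied from the log above.
LOG = [
    "2024-02-05 17:35:48.481902422  [2024-02-05 17:35:48] frigate.app                    INFO    : Starting Frigate (0.13.1-34fb1c2)",
    "2024-02-06 02:58:20.075355223  [2024-02-06 02:58:20] frigate.watchdog               INFO    : Detection appears to be stuck. Restarting detection process...",
]

print(time_until_stuck(LOG))  # prints 9:22:32
```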

povlhp commented 8 months ago

A real USB 3.0 port should be able to supply 2A of power. If your ports are not up to spec, a powered USB 3.0 hub is the solution.

hawkeye217 commented 8 months ago

To be clear, the detection process being "stuck" is different from what I was referring to, which was the Failed to load delegate from libedgetpu.so.1.0 error on Frigate startup. If everything is passed through to Docker correctly, this initial error can sometimes appear because the Coral's internal driver is not yet loaded. When Frigate then auto-restarts, the driver is loaded and everything functions normally.
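
The pattern described here (the first load fails, a later attempt succeeds) is the classic case for retrying with a delay. The sketch below is a generic illustration, not Frigate's implementation: `load_with_retry` and its parameters are hypothetical, and the loader is passed in as a callable. In a real setup the callable would wrap something like the `load_delegate("libedgetpu.so.1.0", device_config)` call seen in the traceback, which raises `ValueError` on failure.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def load_with_retry(
    loader: Callable[[], T],
    attempts: int = 5,
    delay_s: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call loader() until it succeeds or the attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error: Optional[ValueError] = None
    for attempt in range(1, attempts + 1):
        try:
            return loader()
        except ValueError as err:  # the error type shown in the traceback
            last_error = err
            if attempt < attempts:
                sleep(delay_s)  # give the device time to finish initializing
    raise RuntimeError(f"device not found after {attempts} attempts") from last_error


# Stand-in loader that fails twice and then succeeds (purely illustrative).
calls = {"count": 0}


def fake_loader() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ValueError("Failed to load delegate from libedgetpu.so.1.0")
    return "tpu"


print(load_with_retry(fake_loader, delay_s=0))  # prints tpu
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without real waiting; whether a delay helps at all depends on the hardware, as the rest of this thread suggests.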

With that said, I think you probably still have a power-related issue that is hanging the Coral.

Frigate can't do anything to improve or fix this. If you continue to have problems, I'd suggest moving toward an internal PCIe or M.2 Coral.

RutgerDiehard commented 8 months ago

New Sabrent powered USB 3 hub installed with new cable to Coral.

I tested a simple Frigate restart from within Home Assistant after Frigate started successfully using the new hub. It took over 11 minutes of constant watchdog restarts for Frigate to finally detect the Coral and start successfully. It must have restarted 20 times with the message Failed to load delegate from libedgetpu.so.1.0

Is it normal for the Coral driver to take this long to load? It makes tweaking settings impractical unless the Coral is plugged directly into a USB port on the PC.

hawkeye217 commented 8 months ago

That's not something I have experienced. And if it was a bug in Frigate, we'd see many more reports of the issue.

So it must be something specific to your hardware and setup. Beyond buying a new Coral, I'm not sure what else it could be.

povlhp commented 8 months ago

New Sabrent powered USB 3 hub installed with new cable to Coral.

What power supply does it have? A minimum of 3-5A is recommended.

I know it is rated at 4 TOPS at 2 TOPS/W, which works out to about 2 W, and at 5 V that is less than 0.5 A. But people do have issues with power, so I would make sure the full 3 A is available on the USB port.
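
The arithmetic behind that estimate, written out as a small sketch. The function name is hypothetical, and the result is only the nominal figure implied by the numbers quoted in this comment, not a measured draw.

```python
def coral_current_estimate(tops: float = 4.0, tops_per_watt: float = 2.0, volts: float = 5.0):
    """Return (watts, amps) implied by a TOPS rating, an efficiency figure, and a supply voltage."""
    watts = tops / tops_per_watt  # 4 TOPS / 2 TOPS per W = 2 W
    amps = watts / volts          # 2 W / 5 V = 0.4 A
    return watts, amps


watts, amps = coral_current_estimate()
print(f"{watts:.1f} W, {amps:.1f} A")  # prints 2.0 W, 0.4 A
```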

RutgerDiehard commented 8 months ago

Power Supply states 12V 3.0A 36.0W. With the Coral connected to the new hub, it takes so long for Frigate to find the Coral and start successfully (much, much longer than the TP-Link Hub), I've given up and connected the Coral directly to the other USB 3.0 port on the motherboard with the new cable. At the moment, it's been running without issue for over 12 hours (compared to 9 hours yesterday in the other port before detection stuck).

During testing, after plugging the hub into the other USB 3.0 port (the one the Coral is in now), I did notice when checking the Frigate container logs that the status scrolled through a lot quicker. I wonder if it's a combination of port and cable that's the issue. Anyway, I'll keep an eye on it and see how long it runs for.

I had also removed and recreated the Frigate container, but I doubt that has had any significant impact.

RutgerDiehard commented 8 months ago

So far over 36 hours of uptime and no restarts! With a stable platform, what uptime figures are you seeing? Is an occasional container restart to be expected?

povlhp commented 8 months ago

Mine was installed on Monday. It has had one container restart, when I went from 4 CPUs to 2 and from 4 GB to 2 GB of RAM for the LXC container. No hangs or restarts apart from the one I initiated.

RutgerDiehard commented 8 months ago

It lasted way over 72 hours before detection got stuck, restarted and got stuck again several hours later. I've plugged the Coral back in to the TP-Link powered hub with the new cable and plugged the hub in to the other USB 3.0 port.

RutgerDiehard commented 8 months ago

The TP-Link has specific ports for high-power devices (up to 1.5A). I've plugged the TPU in to one of these and still get the "detection appears to be stuck" message some time later. So, I've removed the USB TPU completely and plugged a brand-new M.2 TPU into the Wi-Fi card socket on the motherboard. So far, inference time has dropped from 11-12ms to 8-9ms and it is running fine.

RutgerDiehard commented 7 months ago

This has been running without issue for quite a few days. Upon restarting Frigate, the M.2 Coral is detected instantly, every time. I think, therefore, this was an issue with either the USB port or the USB Coral, so I'm closing this. Many thanks for the input.