google-coral / edgetpu

Coral issue tracker (and legacy Edge TPU API source)
https://coral.ai
Apache License 2.0

USB Coral seems to lockup, requires unplug, replug #766

Open Whytey opened 1 year ago

Whytey commented 1 year ago

Description

I run Frigate NVR in Docker inside an LXC container. It has been running reliably for some time.

Over the last two weeks, Frigate has crashed, and the cause has been a lockup in the Coral. The only way to restore working behaviour is to unplug the Coral, reconnect it, and then restart Frigate (twice, because the Coral's ID in lsusb changes, as shown below).

Frigate had been up and running for multiple days, with no physical interaction with the Coral, when this appeared in the log:

2023-06-08 19:40:48.026239815  [2023-06-08 19:40:48] frigate.detectors.plugins.edgetpu_tfl ERROR   : No EdgeTPU was detected. If you do not have a Coral device yet, you must configure CPU detectors.
2023-06-08 19:40:48.157209504  Traceback (most recent call last):
2023-06-08 19:40:48.157217644    File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 160, in load_delegate
2023-06-08 19:40:48.157220398      delegate = Delegate(library, options)
2023-06-08 19:40:48.157223338    File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 119, in __init__
2023-06-08 19:40:48.157225708      raise ValueError(capture.message)
2023-06-08 19:40:48.157263474  ValueError
2023-06-08 19:40:48.157265868  
2023-06-08 19:40:48.157268442  During handling of the above exception, another exception occurred:
2023-06-08 19:40:48.157270468  
2023-06-08 19:40:48.157272742  Traceback (most recent call last):
2023-06-08 19:40:48.157324336    File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
2023-06-08 19:40:48.157329166      self.run()
2023-06-08 19:40:48.157333244    File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
2023-06-08 19:40:48.157336588      self._target(*self._args, **self._kwargs)
2023-06-08 19:40:48.157340312    File "/opt/frigate/frigate/object_detection.py", line 98, in run_detector
2023-06-08 19:40:48.157344056      object_detector = LocalObjectDetector(detector_config=detector_config)
2023-06-08 19:40:48.157383908    File "/opt/frigate/frigate/object_detection.py", line 52, in __init__
2023-06-08 19:40:48.157387088      self.detect_api = create_detector(detector_config)
2023-06-08 19:40:48.157389714    File "/opt/frigate/frigate/detectors/__init__.py", line 24, in create_detector
2023-06-08 19:40:48.157391910      return api(detector_config)
2023-06-08 19:40:48.157394584    File "/opt/frigate/frigate/detectors/plugins/edgetpu_tfl.py", line 37, in __init__
2023-06-08 19:40:48.157397236      edge_tpu_delegate = load_delegate("libedgetpu.so.1.0", device_config)
2023-06-08 19:40:48.157399986    File "/usr/lib/python3/dist-packages/tflite_runtime/interpreter.py", line 162, in load_delegate
2023-06-08 19:40:48.157402560      raise ValueError('Failed to load delegate from {}\n{}'.format(
2023-06-08 19:40:48.157405036  ValueError: Failed to load delegate from libedgetpu.so.1.0
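
Since the traceback shows `load_delegate` raising `ValueError`, and Frigate needs two restarts after a replug, one possible mitigation is to retry the delegate load a few times before giving up. The following is a minimal sketch, not Frigate's actual code: `load_with_retry`, `attempts`, and `delay_s` are hypothetical names, and a retry would only help if the failure is transient (for example while the device re-enumerates). It would not recover a Coral that has genuinely locked up.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def load_with_retry(loader: Callable[[], T], attempts: int = 3, delay_s: float = 2.0) -> T:
    """Call `loader` until it succeeds or `attempts` is exhausted.

    Only ValueError is retried, because that is what load_delegate
    raises in the traceback above. Names here are illustrative.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error: Optional[ValueError] = None
    for attempt in range(1, attempts + 1):
        try:
            return loader()
        except ValueError as err:
            last_error = err
            # Wait before the next try, but not after the final one.
            if attempt < attempts:
                time.sleep(delay_s)
    raise RuntimeError(f"delegate failed to load after {attempts} attempts") from last_error


# Usage (needs tflite_runtime and an attached Coral, so it is not run here):
# from tflite_runtime.interpreter import load_delegate
# delegate = load_with_retry(lambda: load_delegate("libedgetpu.so.1.0"))
```

Passing the loader as a callable keeps the retry logic independent of `tflite_runtime`, so it can be exercised without hardware.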

Repetitive output in syslog:

Jun  8 19:38:20 server1 kernel: [9336563.291864] usb 2-1.4: reset high-speed USB device number 89 using ehci-pci
Jun  8 19:38:20 server1 kernel: [9336563.387828] usb 2-1.4: device descriptor read/64, error -71
Jun  8 19:38:20 server1 kernel: [9336563.587849] usb 2-1.4: device descriptor read/64, error -71
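
To see how often these descriptor read failures occur, and on which port, the kernel lines can be counted with a short script. This is a rough sketch under assumptions: the regular expression is modelled only on the lines shown above, and `count_descriptor_errors` is a hypothetical helper name. On Linux, error -71 corresponds to `EPROTO` (protocol error), which would be consistent with a cable, power, or signal problem but does not prove one.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   "... usb 2-1.4: device descriptor read/64, error -71"
DESCRIPTOR_ERROR = re.compile(
    r"usb (?P<port>[\d.-]+): device descriptor read/\d+, error (?P<errno>-\d+)"
)


def count_descriptor_errors(lines):
    """Count descriptor read errors per (USB port, error code)."""
    counts = Counter()
    for line in lines:
        match = DESCRIPTOR_ERROR.search(line)
        if match:
            counts[(match.group("port"), int(match.group("errno")))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Jun  8 19:38:20 server1 kernel: [9336563.291864] usb 2-1.4: reset high-speed USB device number 89 using ehci-pci",
        "Jun  8 19:38:20 server1 kernel: [9336563.387828] usb 2-1.4: device descriptor read/64, error -71",
        "Jun  8 19:38:20 server1 kernel: [9336563.587849] usb 2-1.4: device descriptor read/64, error -71",
    ]
    for (port, code), n in count_descriptor_errors(sample).items():
        print(f"port {port}: error {code} seen {n} time(s)")
```

In practice the input could be the lines of the syslog file itself, read on the host rather than inside the container.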

lsusb on the guest before replugging the Coral:

root@frigate:~# lsusb
Bus 002 Device 088: ID 0764:0501 Cyber Power System, Inc. CP1500 AVR UPS
Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 006: ID 0624:0249 Avocent Corp. Virtual Keyboard/Mouse
Bus 001 Device 005: ID 413c:a001 Dell Computer Corp. Hub
Bus 001 Device 004: ID 10c4:ea60 Silicon Labs CP210x UART Bridge
Bus 001 Device 003: ID 0480:a00c Toshiba America Inc 
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

lsusb on the guest after replugging the Coral:

root@frigate:~# lsusb
Bus 002 Device 010: ID 1a6e:089a Global Unichip Corp. 
Bus 002 Device 088: ID 0764:0501 Cyber Power System, Inc. CP1500 AVR UPS
Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 006: ID 0624:0249 Avocent Corp. Virtual Keyboard/Mouse
Bus 001 Device 005: ID 413c:a001 Dell Computer Corp. Hub
Bus 001 Device 004: ID 10c4:ea60 Silicon Labs CP210x UART Bridge
Bus 001 Device 003: ID 0480:a00c Toshiba America Inc 
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

lsusb after restarting frigate:

root@frigate:~# lsusb
Bus 002 Device 011: ID 18d1:9302 Google Inc. 
Bus 002 Device 088: ID 0764:0501 Cyber Power System, Inc. CP1500 AVR UPS
Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 006: ID 0624:0249 Avocent Corp. Virtual Keyboard/Mouse
Bus 001 Device 005: ID 413c:a001 Dell Computer Corp. Hub
Bus 001 Device 004: ID 10c4:ea60 Silicon Labs CP210x UART Bridge
Bus 001 Device 003: ID 0480:a00c Toshiba America Inc 
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
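
The three listings above show the Coral absent, then present as 1a6e:089a (Global Unichip Corp.), then present as 18d1:9302 (Google Inc.) once Frigate has started. A watchdog could tell these states apart by parsing lsusb output. Below is a minimal sketch; `coral_state` and the state labels are hypothetical names chosen for illustration, and the two IDs are taken only from the listings in this report.

```python
import re

# IDs as they appear in the lsusb listings above.
CORAL_BEFORE_FIRST_USE = "1a6e:089a"  # shown right after replugging
CORAL_AFTER_FIRST_USE = "18d1:9302"   # shown after Frigate was restarted

ID_PATTERN = re.compile(r"\bID ([0-9a-fA-F]{4}:[0-9a-fA-F]{4})\b")


def coral_state(lsusb_output: str) -> str:
    """Classify the Coral as 'initialized', 'uninitialized', or 'absent'."""
    ids = {found.lower() for found in ID_PATTERN.findall(lsusb_output)}
    if CORAL_AFTER_FIRST_USE in ids:
        return "initialized"
    if CORAL_BEFORE_FIRST_USE in ids:
        return "uninitialized"
    return "absent"


# Usage on a live system (assumes the lsusb command is installed):
# import subprocess
# output = subprocess.run(["lsusb"], capture_output=True, text=True).stdout
# print(coral_state(output))
```

An "absent" result while Frigate is configured for the Coral would match the locked-up state described here, where only a physical replug brings the device back.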

Frigate will now run OK for a week or more...

Is there anything I can do to get Coral running reliably again?

### Issue Type

Bug

### Operating System

Ubuntu

### Coral Device

USB Accelerator

### Other Devices

_No response_

### Programming Language

Python 3.8

Whytey commented 1 year ago

FYI: I have a mild suspicion that the provided USB cable may have been causing this problem.

easy-and-simple commented 1 year ago

And I suspect the Coral is overheating.