blakeblackshear / frigate

NVR with realtime local object detection for IP cameras
https://frigate.video
MIT License

[HW Accel Support]: Using Coral TPU mPCIe Crashes my machine on start #6578

Closed: hmakmur closed this issue 1 year ago

hmakmur commented 1 year ago

Describe the problem you are having

I just installed a new Coral TPU mPCIe card on the motherboard in place of the WiFi card. My system runs Ubuntu 20.04. The Coral PCIe device passed the example code from Google. It is also visible in Frigate.

python3 examples/classify_image.py --model test_data/mobilenet_v2_1.0_224_inat_bird_quant_edgetpu.tflite --labels test_data/inat_bird_labels.txt --input test_data/parrot.jpg
----INFERENCE TIME----
Note: The first inference on Edge TPU is slow because it includes loading the model into Edge TPU memory.
12.8ms
2.8ms
2.9ms
2.8ms
2.9ms
-------RESULTS--------
Ara macao (Scarlet Macaw): 0.75781

When running with CPU as detector, everything is OK.

The problem starts when running the Coral PCIe device as the detector: the whole machine crashes when the Docker container starts. The crash only happens with detect: enabled: true. When detection is not enabled, the container works just fine but no detection is done.

I cannot get logs for this setting because the whole system crashes. The logs attached below are for enabled: false.

Why does my whole Ubuntu 20.04 system spontaneously crash when detection is enabled? How do I debug this to get you some more logs?

I waited years to get the Coral TPU.
I chose the mPCIe version because it seems to be documented now, and the USB version is just too expensive and has a long wait. I want to add more cameras and figured the mPCIe version would work just fine. Now that I have it, it seems to be crashing. Help!

Thanks

Version

0.12.0-DA3E197

Frigate config file

mqtt:
  host: 192.168.110.2
  port: 1883

detectors:
    coralpci:
      type: edgetpu
      device: pci

database:
   path: /media/frigate/frigate12.db

detect:
  enabled: false
  width: 1280                    
  height: 720                    
  fps: 5                         

cameras:
  FrontdoorCam:
    ffmpeg:
      inputs:
        - path: "rtsp://{FRIGATE_RTSP_USER}:{FRIGATE_RTSP_PASSWORD}@192.168.11.15/live"
          roles:
            - detect
    objects: 
       track:
         - person

docker-compose file or Docker CLI command

docker run -d \
  --name frigate12 \
  --shm-size=256m \
  --mount type=tmpfs,target=/tmp/cache,tmpfs-size=1000000000 \
  --device /dev/bus/usb:/dev/bus/usb \
  --device /dev/apex_0:/dev/apex_0 \
  --device /dev/dri/renderD128 \
  -v /home/temp/cameras/frigate:/media/frigate \
  -v /etc/frigate/config.yml:/config/config.yml:rw \
  -v /etc/localtime:/etc/localtime:ro \
  -e FRIGATE_RTSP_USER \
  -e FRIGATE_RTSP_PASSWORD \
  -p 8082:5000 \
  -p 1935:1935 \
  -p 8556:8555 \
  -p 8555:8555/tcp \
  -p 8555:8555/udp \
  ghcr.io/blakeblackshear/frigate:stable
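The command above passes host device nodes into the container with --device. A minimal Python sketch of a preflight check, run on the host, that those nodes exist as character devices before the container starts. The helper names `is_char_device` and `missing_devices` are hypothetical, and passing this check says nothing about why the host crashes; it only rules out a missing device node.

```python
import os
import stat


def is_char_device(path: str) -> bool:
    """Return True if path exists and is a character device node."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing path or no permission to stat it.
        return False
    return stat.S_ISCHR(mode)


def missing_devices(paths):
    """Return the subset of paths that are not character devices."""
    return [p for p in paths if not is_char_device(p)]
```

For the command above, `missing_devices(["/dev/apex_0", "/dev/dri/renderD128"])` should return an empty list on a host where both nodes are present.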

Relevant log output

2023-05-22 21:31:56.087249887  [INFO] Starting Frigate...
2023-05-22 21:31:57.261304175  [2023-05-22 21:31:57] frigate.app                    INFO    : Starting Frigate (0.12.0-da3e197)
2023-05-22 21:31:57.316469116  [2023-05-22 21:31:57] peewee_migrate                 INFO    : Starting migrations
2023-05-22 21:31:57.395624752  [2023-05-22 21:31:57] peewee_migrate                 INFO    : There is nothing to migrate
2023-05-22 21:31:57.429752583  [2023-05-22 21:31:57] detector.coralpci              INFO    : Starting detection process: 432
2023-05-22 21:31:57.637474852  [2023-05-22 21:31:57] frigate.app                    INFO    : Output process started: 434
2023-05-22 21:31:57.777099795  [2023-05-22 21:31:57] frigate.detectors.plugins.edgetpu_tfl INFO    : Attempting to load TPU as pci
2023-05-22 21:31:59.575347246  [2023-05-22 21:31:57] frigate.app                    INFO    : Camera processor started for FrontdoorCam: 437
2023-05-22 21:31:59.575559418  [2023-05-22 21:31:57] frigate.detectors.plugins.edgetpu_tfl INFO    : TPU found
2023-05-22 21:32:00.187332787  [2023-05-22 21:31:57] frigate.app                    INFO    : Capture process started for FrontdoorCam: 439

FFprobe output from your camera

[{"return_code":0,"stderr":"","stdout":{"programs":[],"streams":[{"avg_frame_rate":"20/1","codec_long_name":"H.264/AVC/MPEG-4AVC/MPEG-4part10","height":1080,"width":1920},{"avg_frame_rate":"0/0","bit_rate":"128000","codec_long_name":"PCMA-law/G.711A-law"}]}}]

Operating system

Other Linux

Install method

Docker CLI

Network connection

Wired

Camera make and model

WyzeCam v3 with RTSP

Any other information that may be helpful

Here is the System info when detection is not enabled.

[Screenshot: SYSTEM 0.12.0-DA3E197]

NickM-27 commented 1 year ago

Do you have any host logs for this? There's not a whole lot that can be recommended here based on the Frigate logs alone.

hmakmur commented 1 year ago

Unfortunately there is no log on the host, as the machine goes boom spontaneously as soon as Frigate starts to detect.

NickM-27 commented 1 year ago

Most systems have ways to log to a file so the logs can be viewed in the event of a host crash.
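If the host dies before anything reaches its disk, another option is to ship log lines to a second machine as they are produced. A minimal Python sketch, assuming a syslog listener on another host that accepts UDP; the function names and the default logger name "crashlog" are hypothetical, and UDP delivery of the very last lines before a crash is not guaranteed.

```python
import logging
import logging.handlers


def make_remote_logger(host: str, port: int = 514, name: str = "crashlog") -> logging.Logger:
    """Return a logger that sends each record to a remote syslog listener over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(logging.Formatter("%(name)s: %(message)s"))
    logger.addHandler(handler)
    return logger


def forward_lines(lines, logger: logging.Logger) -> int:
    """Forward every non-empty line to the logger; return how many were sent."""
    sent = 0
    for line in lines:
        line = line.rstrip("\n")
        if line:
            logger.info(line)
            sent += 1
    return sent
```

One way to use it is to pipe `dmesg --follow` into a script that calls `forward_lines(sys.stdin, make_remote_logger(host))`, where `host` is the address of the receiving machine.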

hmakmur commented 1 year ago

I decided to set up a syslog server and send a copy of all my syslog to another machine. This yielded one single line associated with the crash. I am not exactly sure why I get this error message.

May 24 21:32:02 server kernel: [174351.438649] x86/PAT: frigate.detecto:1723860 map pfn RAM range req uncached-minus for [mem 0x2d1480000-0x2d1483fff], got write-back

Apparently, this is not a new issue. I tried the solution from edgetpu 345 but still have the same issue. The machine still crashes when detection is enabled.
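For anyone comparing their own crash logs against this one, the x86/PAT line above can be split into its fields: the process name and PID, the memory type requested, the type actually granted, and the physical range. A minimal Python sketch, assuming the wrapped log line is joined back into a single line first; the regular expression is fitted to the one line quoted in this thread and may need adjusting for other kernels.

```python
import re

# Pattern fitted to the single x86/PAT line quoted in this thread.
PAT_LINE = re.compile(
    r"x86/PAT: (?P<comm>\S+):(?P<pid>\d+) map pfn RAM range req "
    r"(?P<requested>[\w-]+) for \[mem (?P<start>0x[0-9a-f]+)-(?P<end>0x[0-9a-f]+)\], "
    r"got (?P<actual>[\w-]+)"
)


def parse_pat_conflict(line: str):
    """Return the fields of an x86/PAT conflict line, or None if it does not match."""
    m = PAT_LINE.search(line)
    if m is None:
        return None
    start = int(m["start"], 16)
    end = int(m["end"], 16)
    return {
        "comm": m["comm"],            # process name as truncated by the kernel
        "pid": int(m["pid"]),
        "requested": m["requested"],  # memory type the mapping asked for
        "actual": m["actual"],        # memory type the range already had
        "start": start,
        "end": end,
        "size_bytes": end - start + 1,  # the printed range is inclusive
    }
```

Applied to the line above, this reports a request for uncached-minus on a 16384-byte range that the kernel already had mapped as write-back.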

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

markmghali commented 3 months ago

@hmakmur having a similar issue on my unraid machine. did you ever find a solution?

hmakmur commented 3 months ago

My machine can start now after cutting a pin on the card as described in 232, but Frigate crashes daily due to the Coral TPU error described in 345. For some time I was able to get it to work quite reliably, until a kernel 5.15.0-117 update on Jul 25, 2024, after which the EdgeTPU reports this error daily: RAM did not enable within timeout (12000 ms). I don't know if this is a kernel issue, but I intend to roll back and see.
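To check whether the timeout error really began with the kernel update, the saved syslog can be tallied per day. A minimal Python sketch, assuming each line starts with a "Mon DD" timestamp like the kernel line quoted earlier in this thread; `timeouts_per_day` is a hypothetical helper name, and the marker string is taken from the error text above.

```python
from collections import Counter

# Substring of the error message quoted in this thread.
TIMEOUT_MARKER = "RAM did not enable within timeout"


def timeouts_per_day(lines):
    """Count lines containing the timeout marker, keyed by their 'Mon DD' prefix."""
    counts = Counter()
    for line in lines:
        if TIMEOUT_MARKER not in line:
            continue
        parts = line.split()
        if len(parts) >= 2:
            # The first two whitespace-separated fields are month and day.
            counts[" ".join(parts[:2])] += 1
    return counts
```

Feeding it an open syslog file, for example `timeouts_per_day(open(path))` with `path` pointing at the saved log, gives a per-day count that can be lined up against the date of the kernel update.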