blakeblackshear / frigate

NVR with realtime local object detection for IP cameras
https://frigate.video
MIT License

[Detector Support]: v13, "Received nan values from distance function" #9742

Closed hamishfagg closed 7 months ago

hamishfagg commented 9 months ago

Describe the problem you are having

Upgraded from 0.12.x to 0.13.1 and followed the steps in the release notes. I'm using TensorRT (I recreated the model after upgrading) with a Quadro K620. I'm getting the errors below in the logs; cameras initially display an image, then drop off one by one and show "no frames have been received, check error logs".

Version

0.13.1-34fb1c2

Frigate config file

mqtt:
  host: mosquitto
  user: homeassistant
  password: REDACTED

ffmpeg:
  hwaccel_args: preset-nvidia-h264
  output_args:
    record: preset-record-generic-audio-copy

go2rtc:
  streams:
    printer_cam: ffmpeg:http://192.168.0.11:5050?action=stream#video=h264#hardware # <- use hardware acceleration to create an h264 stream usable for other components.

objects:
  track:
    - person
    - car
    - dog
    - cat

record:
  enabled: True
  retain:
    days: 10

snapshots:
  enabled: True
  bounding_box: True
  retain:
    default: 10

detect:
  width: 1280
  height: 720
  fps: 5

cameras:
  front_door:
    ffmpeg:
      inputs:
        - path: rtsp://admin:{FRIGATE_RTSP_PASSWORD}@192.168.0.205/cam/realmonitor?channel=1&subtype=0
          roles:
            - detect
            - record
    motion:
      mask:
        - 0,0,0,720,1280,720,1280,529,696,302,1102,42,1280,80,1280,0

  alley:
    ffmpeg:
      inputs:
        - path: rtsp://admin:{FRIGATE_RTSP_PASSWORD}@192.168.0.201/Streaming/Channels/101/
          roles:
            - detect
            - record
    motion:
      mask:
        - 601,0,1280,0,1280,207,607,138
        - 1016,720,1280,720,1280,0,1189,0,1194,213
        - 648,317,751,318,750,220,644,218
    zones:
      gate:
        coordinates: 1201,330,1210,142,977,112,955,329
        objects:
          - person

  back_yard:
    ffmpeg:
      inputs:
        - path: rtsp://admin:{FRIGATE_RTSP_PASSWORD}@192.168.0.200/Streaming/Channels/101/
          roles:
            - detect
            - record
    motion:
      mask:
        - 1280,0,1280,473,428,149,408,46,331,37,0,45,0,0

  garage:
    ffmpeg:
      inputs:
        - path: rtsp://admin:{FRIGATE_RTSP_PASSWORD}@192.168.0.204/cam/realmonitor?channel=1&subtype=0
          roles:
            - detect
            - record
    objects:
      filters:
        car:
          mask:
            - 1253,259,1280,0,0,0,0,217,638,191
        person:
          mask:
            - 1280,0,1280,310,565,236,0,194,0,0
    motion:
      mask:
        - 1253,259,1280,0,0,0,0,217,638,191
    zones:
      driveway:
        coordinates: 0,720,1280,720,1280,258,490,156,0,394
        objects:
          - person

  printer:
    ffmpeg:
      inputs:
        - path: rtsp://127.0.0.1:8554/printer_cam
          roles:
            - detect

detectors:
  tensorrt:
    type: tensorrt

model:
  path: /config/model_cache/tensorrt/yolov4-tiny-416.trt
  input_tensor: nchw
  input_pixel_format: rgb
  width: 416
  height: 416

docker-compose file or Docker CLI command

docker run
  -d
  --name='frigate'
  --net='home'
  -e TZ="REDACTED"
  -e HOST_OS="Unraid"
  -e HOST_HOSTNAME="REDACTED"
  -e HOST_CONTAINERNAME="frigate"
  -e 'FRIGATE_RTSP_PASSWORD'='REDACTED'
  -e 'PLUS_API_KEY'='REDACTED'
  -e 'NVIDIA_DRIVER_CAPABILITIES'='all'
  -e 'YOLO_MODELS'='yolov4-tiny-416'
  -l net.unraid.docker.managed=dockerman
  -l net.unraid.docker.webui='http://[IP]:5000'
  -l net.unraid.docker.icon='https://raw.githubusercontent.com/blakeblackshear/frigate/master/web/public/favicon.png'
  -p '8554:8554/tcp'
  -v '/mnt/disks/redacted/frigate/':'/media/frigate':'rw,slave'
  -v '/mnt/user/appdata/frigate/':'/config/':'rw'
  --shm-size=256m
  --gpus=all 'ghcr.io/blakeblackshear/frigate:stable-tensorrt'

Relevant log output

s6-rc: info: service s6rc-fdholder: starting
s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service s6rc-fdholder successfully started
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service trt-model-prepare: starting
s6-rc: info: service log-prepare: starting
No models to convert.
s6-rc: info: service log-prepare successfully started
s6-rc: info: service nginx-log: starting
s6-rc: info: service go2rtc-log: starting
s6-rc: info: service frigate-log: starting
s6-rc: info: service trt-model-prepare successfully started
s6-rc: info: service nginx-log successfully started
s6-rc: info: service frigate-log successfully started
s6-rc: info: service go2rtc-log successfully started
s6-rc: info: service go2rtc: starting
s6-rc: info: service go2rtc successfully started
s6-rc: info: service go2rtc-healthcheck: starting
s6-rc: info: service frigate: starting
s6-rc: info: service go2rtc-healthcheck successfully started
s6-rc: info: service frigate successfully started
s6-rc: info: service nginx: starting
s6-rc: info: service nginx successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
2024-02-08 04:25:00.037431973  [INFO] Starting NGINX...
2024-02-08 04:25:00.037420533  [INFO] Preparing Frigate...
2024-02-08 04:25:00.109804080  [INFO] Preparing new go2rtc config...
2024-02-08 04:25:00.109896303  [INFO] Starting Frigate...
2024-02-08 04:25:00.560675256  [INFO] Starting go2rtc...
2024-02-08 04:25:00.614820503  17:25:00.614 INF go2rtc version 1.8.4 linux/amd64
2024-02-08 04:25:00.615150611  17:25:00.615 INF [rtsp] listen addr=:8554
2024-02-08 04:25:00.615186702  17:25:00.615 INF [api] listen addr=:1984
2024-02-08 04:25:00.615220413  17:25:00.615 INF [webrtc] listen addr=:8555
2024-02-08 04:25:00.713083196  2024/02/08 17:25:00 [error] 153#153: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 172.22.0.28, server: , request: "GET /ws HTTP/1.1", upstream: "http://127.0.0.1:5002/", host: "cam.home.lan"
2024-02-08 04:25:00.713133237  172.22.0.28 - - [08/Feb/2024:17:25:00 +1300] "GET /ws HTTP/1.1" 502 157 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:122.0) Gecko/20100101 Firefox/122.0" "192.168.0.136"
2024-02-08 04:25:01.041420363  2024/02/08 17:25:01 [error] 154#154: *3 connect() failed (111: Connection refused) while connecting to upstream, client: 172.22.0.39, server: , request: "GET /api/stats HTTP/1.1", upstream: "http://127.0.0.1:5001stats", host: "frigate:5000"
2024-02-08 04:25:01.950561063  [2024-02-08 17:25:01] frigate.app                    INFO    : Starting Frigate (0.13.1-34fb1c2)
2024-02-08 04:25:01.950605534  [2024-02-08 17:25:01] frigate.app                    INFO    : Creating directory: /tmp/cache
2024-02-08 04:25:02.059778790  [2024-02-08 17:25:02] peewee_migrate.logs            INFO    : Starting migrations
2024-02-08 04:25:02.062840684  [2024-02-08 17:25:02] peewee_migrate.logs            INFO    : There is nothing to migrate
2024-02-08 04:25:02.066836550  [2024-02-08 17:25:02] frigate.app                    INFO    : Recording process started: 1028
2024-02-08 04:25:02.068055090  [2024-02-08 17:25:02] frigate.app                    INFO    : go2rtc process pid: 95
2024-02-08 04:25:02.091251810  [2024-02-08 17:25:02] detector.tensorrt              INFO    : Starting detection process: 1038
2024-02-08 04:25:02.340136489  [2024-02-08 17:25:02] frigate.app                    INFO    : Output process started: 1039
2024-02-08 04:25:02.340362764  [2024-02-08 17:25:02] frigate.app                    INFO    : Camera processor started for front_door: 1048
2024-02-08 04:25:02.340415295  [2024-02-08 17:25:02] frigate.app                    INFO    : Camera processor started for alley: 1050
2024-02-08 04:25:02.340464157  [2024-02-08 17:25:02] frigate.app                    INFO    : Camera processor started for back_yard: 1051
2024-02-08 04:25:02.340512178  [2024-02-08 17:25:02] frigate.app                    INFO    : Camera processor started for garage: 1066
2024-02-08 04:25:02.340559819  [2024-02-08 17:25:02] frigate.app                    INFO    : Camera processor started for printer: 1070
2024-02-08 04:25:02.340605430  [2024-02-08 17:25:02] frigate.app                    INFO    : Capture process started for front_door: 1073
2024-02-08 04:25:02.340651271  [2024-02-08 17:25:02] frigate.app                    INFO    : Capture process started for alley: 1076
2024-02-08 04:25:02.340701212  [2024-02-08 17:25:02] frigate.app                    INFO    : Capture process started for back_yard: 1080
2024-02-08 04:25:02.340742393  [2024-02-08 17:25:02] frigate.app                    INFO    : Capture process started for garage: 1085
2024-02-08 04:25:02.340790635  [2024-02-08 17:25:02] frigate.app                    INFO    : Capture process started for printer: 1090
2024-02-08 04:25:02.396965891  [2024-02-08 17:25:02] frigate.detectors.plugins.tensorrt INFO    : Loaded engine size: 35 MiB
2024-02-08 04:25:02.529360207  [2024-02-08 17:25:02] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +25, now: CPU 156, GPU 88 (MiB)
2024-02-08 04:25:02.536039789  [2024-02-08 17:25:02] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 158, GPU 98 (MiB)
2024-02-08 04:25:02.537502664  [2024-02-08 17:25:02] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +35, now: CPU 0, GPU 35 (MiB)
2024-02-08 04:25:02.632087507  [2024-02-08 17:25:02] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 122, GPU 90 (MiB)
2024-02-08 04:25:02.633683096  [2024-02-08 17:25:02] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 122, GPU 98 (MiB)
2024-02-08 04:25:02.633756848  [2024-02-08 17:25:02] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +15, now: CPU 0, GPU 50 (MiB)
2024-02-08 04:25:07.801313883  /opt/frigate/frigate/track/norfair_tracker.py:47: RuntimeWarning: divide by zero encountered in double_scalars
2024-02-08 04:25:07.801321374    width_ratio = widths[1] / widths[0] - 1.0
2024-02-08 04:25:07.826290526  /opt/frigate/frigate/track/norfair_tracker.py:39: RuntimeWarning: divide by zero encountered in double_scalars
2024-02-08 04:25:07.826292926    distance[0] /= estimate_dim[0]
2024-02-08 04:25:07.826321037  /opt/frigate/frigate/track/norfair_tracker.py:47: RuntimeWarning: divide by zero encountered in double_scalars
2024-02-08 04:25:07.826322187    width_ratio = widths[1] / widths[0] - 1.0
2024-02-08 04:25:07.898632433  /opt/frigate/frigate/track/norfair_tracker.py:41: RuntimeWarning: divide by zero encountered in double_scalars
2024-02-08 04:25:07.898636193    distance[1] /= estimate_dim[1]
2024-02-08 04:25:07.898665024  /opt/frigate/frigate/track/norfair_tracker.py:47: RuntimeWarning: divide by zero encountered in double_scalars
2024-02-08 04:25:07.898666444    width_ratio = widths[1] / widths[0] - 1.0
2024-02-08 04:25:07.898691365  /opt/frigate/frigate/track/norfair_tracker.py:48: RuntimeWarning: divide by zero encountered in double_scalars
2024-02-08 04:25:07.898692875    height_ratio = heights[1] / heights[0] - 1.0
2024-02-08 04:25:07.972857005  /opt/frigate/frigate/track/norfair_tracker.py:39: RuntimeWarning: divide by zero encountered in double_scalars
2024-02-08 04:25:07.972859625    distance[0] /= estimate_dim[0]
2024-02-08 04:25:08.080520405  /opt/frigate/frigate/track/norfair_tracker.py:39: RuntimeWarning: divide by zero encountered in double_scalars
2024-02-08 04:25:08.080523245    distance[0] /= estimate_dim[0]
2024-02-08 04:25:08.270611324  /opt/frigate/frigate/track/norfair_tracker.py:39: RuntimeWarning: invalid value encountered in double_scalars
2024-02-08 04:25:08.270613644    distance[0] /= estimate_dim[0]
2024-02-08 04:25:08.270630195  /opt/frigate/frigate/track/norfair_tracker.py:47: RuntimeWarning: invalid value encountered in double_scalars
2024-02-08 04:25:08.270631015    width_ratio = widths[1] / widths[0] - 1.0
2024-02-08 04:25:08.274653592  
2024-02-08 04:25:08.274655242  Received nan values from distance function, please check your distance function 
2024-02-08 04:25:08.274655722  for errors!
2024-02-08 04:25:09.696225955  /opt/frigate/frigate/track/norfair_tracker.py:47: RuntimeWarning: divide by zero encountered in double_scalars
2024-02-08 04:25:09.696228385    width_ratio = widths[1] / widths[0] - 1.0
2024-02-08 04:25:09.696238775  /opt/frigate/frigate/track/norfair_tracker.py:48: RuntimeWarning: divide by zero encountered in double_scalars
2024-02-08 04:25:09.696239505    height_ratio = heights[1] / heights[0] - 1.0
2024-02-08 04:25:09.910817676  /opt/frigate/frigate/track/norfair_tracker.py:39: RuntimeWarning: divide by zero encountered in double_scalars
2024-02-08 04:25:09.910821066    distance[0] /= estimate_dim[0]
2024-02-08 04:25:09.910822316  /opt/frigate/frigate/track/norfair_tracker.py:41: RuntimeWarning: divide by zero encountered in double_scalars
2024-02-08 04:25:09.910825796    distance[1] /= estimate_dim[1]
2024-02-08 04:25:10.023166419  [INFO] Starting go2rtc healthcheck service...
2024-02-08 04:25:13.310035309  /opt/frigate/frigate/track/norfair_tracker.py:47: RuntimeWarning: divide by zero encountered in double_scalars
2024-02-08 04:25:13.310037869    width_ratio = widths[1] / widths[0] - 1.0
2024-02-08 04:25:13.453032991  /opt/frigate/frigate/track/norfair_tracker.py:39: RuntimeWarning: invalid value encountered in double_scalars
2024-02-08 04:25:13.453035051    distance[0] /= estimate_dim[0]
2024-02-08 04:25:13.453045172  /opt/frigate/frigate/track/norfair_tracker.py:47: RuntimeWarning: invalid value encountered in double_scalars
2024-02-08 04:25:13.453045972    width_ratio = widths[1] / widths[0] - 1.0
2024-02-08 04:25:13.453061872  /opt/frigate/frigate/track/norfair_tracker.py:48: RuntimeWarning: divide by zero encountered in double_scalars
2024-02-08 04:25:13.453062812    height_ratio = heights[1] / heights[0] - 1.0
2024-02-08 04:25:13.456185327  
2024-02-08 04:25:13.456186577  Received nan values from distance function, please check your distance function 
2024-02-08 04:25:13.456187057  for errors!
2024-02-08 04:25:13.554075841  /opt/frigate/frigate/track/norfair_tracker.py:39: RuntimeWarning: divide by zero encountered in double_scalars
2024-02-08 04:25:13.554079281    distance[0] /= estimate_dim[0]
2024-02-08 04:25:13.790179672  /opt/frigate/frigate/track/norfair_tracker.py:39: RuntimeWarning: invalid value encountered in double_scalars
2024-02-08 04:25:13.790182282    distance[0] /= estimate_dim[0]
2024-02-08 04:25:13.790204673  /opt/frigate/frigate/track/norfair_tracker.py:47: RuntimeWarning: invalid value encountered in double_scalars
2024-02-08 04:25:13.790205813    width_ratio = widths[1] / widths[0] - 1.0
2024-02-08 04:25:13.792892707  
2024-02-08 04:25:13.792894737  Received nan values from distance function, please check your distance function 
2024-02-08 04:25:13.792895657  for errors!
2024-02-08 04:25:14.097973933  /opt/frigate/frigate/track/norfair_tracker.py:39: RuntimeWarning: invalid value encountered in double_scalars
2024-02-08 04:25:14.097976573    distance[0] /= estimate_dim[0]
2024-02-08 04:25:14.097986864  /opt/frigate/frigate/track/norfair_tracker.py:47: RuntimeWarning: invalid value encountered in double_scalars
2024-02-08 04:25:14.097987594    width_ratio = widths[1] / widths[0] - 1.0
2024-02-08 04:25:14.100453233  
2024-02-08 04:25:14.100454903  Received nan values from distance function, please check your distance function 
2024-02-08 04:25:14.100455503  for errors!
2024-02-08 04:25:17.210757252  /opt/frigate/frigate/track/norfair_tracker.py:39: RuntimeWarning: invalid value encountered in double_scalars
2024-02-08 04:25:17.210759552    distance[0] /= estimate_dim[0]
2024-02-08 04:25:17.210770332  /opt/frigate/frigate/track/norfair_tracker.py:47: RuntimeWarning: invalid value encountered in double_scalars
2024-02-08 04:25:17.210770942    width_ratio = widths[1] / widths[0] - 1.0
2024-02-08 04:25:17.213482597  
2024-02-08 04:25:17.213484177  Received nan values from distance function, please check your distance function 
2024-02-08 04:25:17.213484747  for errors!
2024-02-08 04:26:17.075405387  [2024-02-08 17:26:17] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for alley. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:17.076172106  [2024-02-08 17:26:17] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for back_yard. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:17.077045677  [2024-02-08 17:26:17] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for garage. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:17.077685302  [2024-02-08 17:26:17] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for printer. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:27.076033013  [2024-02-08 17:26:27] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for alley. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:27.076754680  [2024-02-08 17:26:27] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for front_door. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:27.077397616  [2024-02-08 17:26:27] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for garage. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:27.078123353  [2024-02-08 17:26:27] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for back_yard. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:27.078872641  [2024-02-08 17:26:27] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for printer. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:37.077410045  [2024-02-08 17:26:37] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for printer. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:37.077531168  [2024-02-08 17:26:37] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for alley. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:37.078568663  [2024-02-08 17:26:37] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for front_door. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:37.079310891  [2024-02-08 17:26:37] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for back_yard. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:37.080353746  [2024-02-08 17:26:37] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for garage. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:47.076273545  [2024-02-08 17:26:47] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for printer. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:47.076358747  [2024-02-08 17:26:47] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for alley. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:47.077419093  [2024-02-08 17:26:47] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for back_yard. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:47.078323715  [2024-02-08 17:26:47] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for garage. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:47.079209027  [2024-02-08 17:26:47] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for front_door. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:57.076038856  [2024-02-08 17:26:57] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for printer. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:57.076110408  [2024-02-08 17:26:57] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for alley. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:57.076852626  [2024-02-08 17:26:57] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for front_door. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:57.077504361  [2024-02-08 17:26:57] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for back_yard. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-08 04:26:57.078242149  [2024-02-08 17:26:57] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for garage. Keeping the 6 most recent segments out of 7 and discarding the rest...

Operating system

UNRAID

Install method

Docker CLI

Coral version

Other

Any other information that may be helpful

image (4)

NickM-27 commented 9 months ago

We've not seen any other reports of this. The only way this would happen is if the model were returning incorrect or unexpected values. Can you try regenerating the model, or perhaps using a different one, and see if it still occurs?
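
For anyone following along, here is a rough sketch of how a degenerate box could produce the warnings in the log. It only mirrors the shape of the logged expression `width_ratio = widths[1] / widths[0] - 1.0`; the helper names and the `(x_min, y_min, x_max, y_max)` box layout are assumptions for illustration, not Frigate's actual code.

```python
import numpy as np


def box_width(box):
    """Width of a box given as (x_min, y_min, x_max, y_max) (assumed layout)."""
    x_min, _, x_max, _ = box
    return np.float64(x_max - x_min)


def width_ratio(estimate_box, detection_box):
    """Hypothetical stand-in for the logged line:
    width_ratio = widths[1] / widths[0] - 1.0
    """
    widths = (box_width(estimate_box), box_width(detection_box))
    # Silence NumPy's RuntimeWarning here so the resulting value can be inspected.
    with np.errstate(divide="ignore", invalid="ignore"):
        return widths[1] / widths[0] - 1.0


# A zero-width estimate against a normal detection: x / 0 gives inf
# ("divide by zero encountered").
ratio_inf = width_ratio((10, 10, 10, 50), (10, 10, 30, 50))

# Both boxes have zero width: 0 / 0 gives nan ("invalid value encountered"),
# which is the kind of value a tracker's distance function would reject.
ratio_nan = width_ratio((10, 10, 10, 50), (20, 20, 20, 60))

print(ratio_inf, ratio_nan)
```

If the model really is emitting zero-width or zero-height boxes, that would be consistent with both kinds of warning appearing before the "Received nan values" message.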

hamishfagg commented 9 months ago

Hi, I've tried regenerating the yolov4-tiny-288, yolov4-tiny-416, yolov7-tiny-288, and yolov7-tiny-416 models several times without success. I've also tried using USE_FP16=false, but that gives me KeyErrors:

2024-02-08 20:29:18.559414197  Process camera_processor:back_yard:
2024-02-08 20:29:18.559939998  Traceback (most recent call last):
2024-02-08 20:29:18.559953569    File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
2024-02-08 20:29:18.559954229      self.run()
2024-02-08 20:29:18.559954829    File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
2024-02-08 20:29:18.559955249      self._target(*self._args, **self._kwargs)
2024-02-08 20:29:18.559958519    File "/opt/frigate/frigate/video.py", line 436, in track_camera
2024-02-08 20:29:18.559958959      process_frames(
2024-02-08 20:29:18.559959369    File "/opt/frigate/frigate/video.py", line 689, in process_frames
2024-02-08 20:29:18.559959729      detect(
2024-02-08 20:29:18.559961909    File "/opt/frigate/frigate/video.py", line 474, in detect
2024-02-08 20:29:18.559962339      region_detections = object_detector.detect(tensor_input)
2024-02-08 20:29:18.559962799    File "/opt/frigate/frigate/object_detection.py", line 225, in detect
2024-02-08 20:29:18.559963219      (self.labels[int(d[0])], float(d[1]), (d[2], d[3], d[4], d[5]))
2024-02-08 20:29:18.559971419  KeyError: -11

NickM-27 commented 9 months ago

can you show the output of nvidia-smi

hamishfagg commented 9 months ago

Sure thing, here it is:

[image: nvidia-smi output]

hamishfagg commented 9 months ago

I seem to have the same issue as #8329: I get KeyErrors if I use USE_FP16=false, which I apparently need for my card. One person posted an apparent solution at the end of that issue, but I don't know enough about TensorRT to recreate what they did without more info.
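
To illustrate what the traceback suggests: the failing line looks up `self.labels[int(d[0])]`, so a class id of -11 that is not a key in the label map raises `KeyError: -11`. Below is a minimal sketch of a defensive version of that lookup. The function name, the label map contents, and the sample rows are made up for illustration; only the tuple shape is taken from the traceback.

```python
def parse_detections(raw_detections, labels):
    """Split raw rows into usable detections and rejected rows.

    Each row is assumed to look like [class_id, score, c0, c1, c2, c3],
    matching the tuple built in the traceback line:
    (self.labels[int(d[0])], float(d[1]), (d[2], d[3], d[4], d[5]))
    """
    valid, rejected = [], []
    for d in raw_detections:
        class_id = int(d[0])
        if class_id not in labels:
            # An id such as -11 has no entry in the label map; a plain
            # labels[class_id] lookup would raise KeyError here.
            rejected.append(d)
            continue
        valid.append((labels[class_id], float(d[1]), (d[2], d[3], d[4], d[5])))
    return valid, rejected


# Hypothetical label map and detector output, for illustration only.
labels = {0: "person", 2: "car"}
raw = [
    [0, 0.9, 0.1, 0.2, 0.3, 0.4],
    [-11, 0.5, 0.0, 0.0, 0.0, 0.0],
]
valid, rejected = parse_detections(raw, labels)
print(valid)
print(rejected)
```

Skipping such rows would only hide the symptom; a negative class id still points at the engine producing garbage output.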

BunpGhost commented 9 months ago

I too am having the same issue. I have a Quadro K1200 and am also using USE_FP16=false.

BunpGhost commented 9 months ago

So, I believe I followed all the update requirements correctly. I updated from 0.12 to 0.13, read all the breaking changes, and prepared for them. I run Frigate on a Proxmox host, in an LXC with Docker dedicated to it. The docker compose file was updated correctly, including the environment variables USE_FP16 and YOLO_MODELS. I generated a couple of models with no issues. The first run migrated the DB and created the models. The cameras appeared to be working, but then the live feed stopped and the logs looked like the ones in @hamishfagg's post above. I tried different combinations, and what gets me the furthest is using the yolov4-tiny-416 model instead of the yolov7-tiny-416 I was using in 0.12. However, with this one it doesn't appear to detect anything, although the debug view and bounding boxes appear. When using this model, I get the "Received nan values from distance function" message, so my guess is that something in the model generation is failing. I also tried using the 0.12 tensorRT.sh to create the models, but with the 23.03 image, with no luck :( I'm kind of lost here... Any help, @NickM-27? Thanks

NickM-27 commented 9 months ago

I don't know why this would be happening. There were no reports during the beta / RC and it is not clear why something would be doing this unless the model was returning incorrect coordinates. Seeing your config would be a good first step. Maybe @NateMeyer has an idea
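
One way to test the "incorrect coordinates" theory would be to dump the raw detector output and scan it for rows that could not be real boxes. This is a sketch only: the `(N, 6)` layout with columns `[class_id, score, min_a, min_b, max_a, max_b]` is an assumption, and `find_bad_rows` is a hypothetical helper, not part of Frigate.

```python
import numpy as np


def find_bad_rows(detections):
    """Return indices of rows that are non-finite or have zero/negative extent.

    Assumed layout per row: [class_id, score, min_a, min_b, max_a, max_b].
    """
    det = np.asarray(detections, dtype=np.float64)
    coords = det[:, 2:6]
    # Any NaN or inf anywhere in the row makes it unusable.
    non_finite = ~np.isfinite(det).all(axis=1)
    # A box whose max edge is not strictly greater than its min edge has no area.
    no_extent = (coords[:, 2] <= coords[:, 0]) | (coords[:, 3] <= coords[:, 1])
    return np.flatnonzero(non_finite | no_extent)


# Made-up sample output: row 0 is fine, row 1 has zero extent on one axis,
# row 2 has a NaN score.
sample = [
    [0, 0.9, 0.1, 0.1, 0.5, 0.5],
    [0, 0.8, 0.2, 0.2, 0.2, 0.6],
    [1, float("nan"), 0.0, 0.0, 1.0, 1.0],
]
bad = find_bad_rows(sample)
print(bad)
```

If most rows come back flagged with a freshly generated engine, that would support the idea that the model generation step, rather than the tracker, is at fault.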

BunpGhost commented 9 months ago

Sure, here's the config:

logger:
  # Optional: default log level (default: shown below)
  default: info

mqtt:
  enabled: true
  host: homeassistant
  user: mqtt
  password: REDACTED

birdseye:
  enabled: True
  mode: continuous
  restream: True
  quality: 15

detectors:
  tensorrt:
    type: tensorrt
    device: 0 #This is the default, select the first GPU

model:
  path: /config/model_cache/tensorrt/yolov7-tiny-416.trt
  input_tensor: nchw
  input_pixel_format: rgb
  width: 416
  height: 416

ffmpeg:
  hwaccel_args: preset-nvidia-h264
  output_args:
        record: preset-record-generic-audio-aac
  input_args: preset-rtsp-restream

go2rtc:
  streams:
    adelaidecam: 
      - rtsp://REDACTED@192.168.0.122:554/stream1 # <- stream which supports video & aac audio
      - "ffmpeg:adelaidecam#audio=aac" # <- copy of the stream which transcodes audio to the missing codec (usually will be opus)
    adelaidecam_sub: 
      - rtsp://REDACTED@192.168.0.122:554/stream2
    webrtc:
      candidates:
        - 192.168.0.153:8555
        - stun:8555
cameras:
  adelaidecam:
    ffmpeg:
      inputs:
        - path: rtsp://127.0.0.1:8554/adelaidecam
          roles:
            - record
        - path: rtsp://127.0.0.1:8554/adelaidecam_sub
          roles:
            - detect
    objects:
      track:
        - person
    snapshots:
      enabled: True
      clean_copy: True
      timestamp: false
      bounding_box: True
      crop: False

record:
  enabled: True
  retain:
    days: 3
    mode: motion
  events:
    retain:
      default: 30
      mode: motion

I have 5 cameras but, at the moment, I'm testing version 0.13 on an LXC cloned from the 0.12 one, using only 1 camera and minimal settings to focus on the current issue.

This is the docker-compose:

version: "3.9"

services:
  frigate:
    container_name: frigate
    privileged: true
    restart: unless-stopped
    image: ghcr.io/blakeblackshear/frigate:stable-tensorrt
    runtime: nvidia
    deploy:    # <------------- Add this section
      resources:
        reservations:
          devices:
            - driver: nvidia
              device_ids: ['0'] # this is only needed when using multiple GPUs
              capabilities: [gpu]
    shm_size: "256mb"
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /frigate/config/:/config/
      - /frigateData:/media/frigate
      - type: tmpfs # 1GB of memory
        target: /tmp/cache
        tmpfs:
          size: 1000000000
    ports:
      - "5000:5000" # Port used by the Web UI
      - "8554:8554" # RTSP feeds
      - "8555:8555/tcp" # WebRTC over tcp
      - "8555:8555/udp" # WebRTC over udp
      - "1984:1984"
    environment:
      FRIGATE_RTSP_PASSWORD: "useyourownpassword!"
      USE_FP16: False
      YOLO_MODELS: yolov7x-320,yolov7-320,yolov7-tiny-416,yolov7-tiny-288,yolov4-tiny-416,yolov4-tiny-288

nvidia-smi on LXC:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro K1200                   Off | 00000000:01:00.0 Off |                  N/A |
| 61%   75C    P0               2W /  35W |    137MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

nvidia-smi on host:

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro K1200                   On  | 00000000:01:00.0 Off |                  N/A |
| 62%   75C    P0               2W /  35W |    137MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    990023      C   frigate.detector.tensorrt                    94MiB |
|    0   N/A  N/A    990043      C   ffmpeg                                       38MiB |
+---------------------------------------------------------------------------------------+

The log with the error, when using yolo7-tiny-416.trt:

2024-02-13 08:51:19.863620690  [INFO] Preparing Frigate...
2024-02-13 08:51:19.893291142  [INFO] Starting Frigate...
2024-02-13 08:51:23.825680984  [2024-02-13 08:51:23] frigate.app                    INFO    : Starting Frigate (0.13.1-34fb1c2)
2024-02-13 08:51:26.298183165  [2024-02-13 08:51:26] peewee_migrate.logs            INFO    : Starting migrations
2024-02-13 08:51:26.328795915  [2024-02-13 08:51:26] peewee_migrate.logs            INFO    : There is nothing to migrate
2024-02-13 08:51:26.338584236  [2024-02-13 08:51:26] frigate.app                    INFO    : Recording process started: 442
2024-02-13 08:51:26.341439016  [2024-02-13 08:51:26] frigate.app                    INFO    : go2rtc process pid: 109
2024-02-13 08:51:26.370399689  [2024-02-13 08:51:26] frigate.app                    INFO    : Output process started: 453
2024-02-13 08:51:26.400168785  [2024-02-13 08:51:26] frigate.app                    INFO    : Camera processor started for adelaidecam: 460
2024-02-13 08:51:26.400171835  [2024-02-13 08:51:26] frigate.app                    INFO    : Capture process started for adelaidecam: 462
2024-02-13 08:51:26.436859532  [2024-02-13 08:51:26] detector.tensorrt              INFO    : Starting detection process: 452
2024-02-13 08:51:26.459985714  [2024-02-13 08:51:26] frigate.detectors.plugins.tensorrt INFO    : Loaded engine size: 34 MiB
2024-02-13 08:51:26.575882563  [2024-02-13 08:51:26] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +6, GPU +8, now: CPU 150, GPU 72 (MiB)
2024-02-13 08:51:26.580074488  [2024-02-13 08:51:26] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 151, GPU 82 (MiB)
2024-02-13 08:51:26.593988153  [2024-02-13 08:51:26] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +34, now: CPU 0, GPU 34 (MiB)
2024-02-13 08:51:26.595868736  [2024-02-13 08:51:26] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 117, GPU 74 (MiB)
2024-02-13 08:51:26.596107894  [2024-02-13 08:51:26] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 117, GPU 82 (MiB)
2024-02-13 08:51:26.596242602  [2024-02-13 08:51:26] frigate.detectors.plugins.tensorrt INFO    : [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +13, now: CPU 0, GPU 47 (MiB)
2024-02-13 08:51:31.575526010  Process camera_processor:adelaidecam:
2024-02-13 08:51:31.603301451  Traceback (most recent call last):
2024-02-13 08:51:31.603304441    File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
2024-02-13 08:51:31.603305491      self.run()
2024-02-13 08:51:31.603306612    File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
2024-02-13 08:51:31.603308485      self._target(*self._args, **self._kwargs)
2024-02-13 08:51:31.603309540    File "/opt/frigate/frigate/video.py", line 436, in track_camera
2024-02-13 08:51:31.603330489      process_frames(
2024-02-13 08:51:31.603331714    File "/opt/frigate/frigate/video.py", line 689, in process_frames
2024-02-13 08:51:31.603333010      detect(
2024-02-13 08:51:31.603334100    File "/opt/frigate/frigate/video.py", line 474, in detect
2024-02-13 08:51:31.603335161      region_detections = object_detector.detect(tensor_input)
2024-02-13 08:51:31.603336200    File "/opt/frigate/frigate/object_detection.py", line 225, in detect
2024-02-13 08:51:31.603354979      (self.labels[int(d[0])], float(d[1]), (d[2], d[3], d[4], d[5]))
2024-02-13 08:51:31.603376856  KeyError: -15
2024-02-13 08:52:41.351035847  [2024-02-13 08:52:41] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for adelaidecam. Keeping the 6 most recent segments out of 7 and discarding the rest...
2024-02-13 08:52:51.351495453  [2024-02-13 08:52:51] frigate.record.maintainer      WARNING : Unable to keep up with recording segments in cache for adelaidecam. Keeping the 6 most recent segments out of 7 and discarding the rest...
...

With yolov4-tiny-416 everything seems to work, but it doesn't seem to detect anything. And if I add the rest of the cameras, I start getting the "Received nan values from distance function" error.

Does this info help? Also asking @NateMeyer for help :)
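For anyone reading the traceback above: `KeyError: -15` comes from the label lookup `self.labels[int(d[0])]`, which means the detector returned a class index of -15, i.e. garbage output rather than a real detection. Below is a minimal sketch of how such rows could be screened out before the lookup. It is not Frigate's code. `filter_detections`, the sample `labels` dict, and the sample rows are hypothetical, and it assumes each row is laid out as `[class_id, score, box...]` as the traceback suggests.

```python
import math


def filter_detections(rows, labels):
    """Keep only rows with finite values, a known class id, and a sane score.

    Hypothetical helper for illustration; `labels` is assumed to be a dict
    keyed by integer class index.
    """
    kept = []
    for row in rows:
        # NaN or inf anywhere in the row means the output cannot be trusted.
        if not all(math.isfinite(float(v)) for v in row[:6]):
            continue
        class_id = int(row[0])
        # A class id such as -15 is not in the label map, so skip it
        # instead of raising KeyError.
        if class_id not in labels:
            continue
        score = float(row[1])
        if not 0.0 <= score <= 1.0:
            continue
        box = tuple(float(v) for v in row[2:6])
        kept.append((labels[class_id], score, box))
    return kept


# Made-up sample data: one valid row and three corrupted ones.
labels = {0: "person", 2: "car"}
rows = [
    [0.0, 0.9, 0.1, 0.1, 0.5, 0.5],            # valid
    [-15.0, 0.8, 0.1, 0.1, 0.5, 0.5],          # bad class id
    [2.0, float("nan"), 0.1, 0.1, 0.5, 0.5],   # NaN score
    [2.0, 0.7, 0.2, float("nan"), 0.6, 0.6],   # NaN box coordinate
]
good = filter_detections(rows, labels)
print(good)
```

A filter like this would only hide the symptom. If most rows are rejected, the model output itself is broken, which matches yolov4-tiny "working" but never detecting anything.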

NickM-27 commented 9 months ago

What is your driver version? This is a typical error that means the card has an out-of-date driver.

BunpGhost commented 9 months ago

What is your driver version? This is a typical error that means the card has an out-of-date driver.

Thanks for the reply. It is in the nvidia-smi output: Driver Version: 535.146.02. Should I update?

NateMeyer commented 9 months ago

I think that driver version is ok.

It's weird we're just seeing these issues with the Kx2 "Maxwell" Quadro cards. I wonder if there are issues with the compute 5.0 cards in this version of TensorRT? We might have to post something on the NVidia forums.

NickM-27 commented 9 months ago

That seems to be the consensus: compute 5.0 cards specifically are having issues.

NateMeyer commented 9 months ago

TensorRT 8.5.3 claims to support compute 5.0, but I don't have one of those cards to test with. Do we know of anyone with a Maxwell GPU that is running 0.13 successfully?
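If you want to check whether your card falls into the affected group, here is a rough sketch that reads the compute capability from `nvidia-smi`. It assumes your `nvidia-smi` supports the `compute_cap` query field (older drivers may not), and the helper names and sample text are made up for illustration.

```python
import subprocess


def parse_compute_caps(csv_text):
    """Parse 'name, major.minor' lines into (name, major, minor) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so GPU names containing commas still parse.
        name, cap = (part.strip() for part in line.rsplit(",", 1))
        major, minor = (int(x) for x in cap.split("."))
        gpus.append((name, major, minor))
    return gpus


def query_gpus():
    """Ask nvidia-smi for GPU names and compute capabilities.

    Assumes the `compute_cap` field is available in this nvidia-smi build.
    """
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_compute_caps(result.stdout)


# Illustrative sample text, not captured from a real machine.
sample = "Quadro K1200, 5.0\nQuadro P600, 6.1\n"
parsed = parse_compute_caps(sample)
# Flag only compute 5.0, the capability discussed in this thread.
flagged = [name for name, major, minor in parsed if (major, minor) == (5, 0)]
print(flagged)
```

On a machine with the driver installed, call `query_gpus()` instead of parsing the sample string.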

BunpGhost commented 9 months ago

I was going through other issues in here and I think the one mentioned above is very much related https://github.com/blakeblackshear/frigate/issues/8329#issuecomment-1807249026

That issue went stale, but I think we should revive it. What do you guys think? Maybe @kdill00 or @qubex22 can help?

BunpGhost commented 8 months ago

Any luck on this? :(

qubex22 commented 8 months ago

I was going through other issues in here and I think the one mentioned above is very much related

https://github.com/blakeblackshear/frigate/issues/8329#issuecomment-1807249026

That issue went stale, but I think we should revive it.

What do you guys think?

maybe @kdill00 or @qubex22 can help?

I ended up buying a second-hand P600. The only sure thing is that there's a problem with Maxwell cards.

BunpGhost commented 8 months ago

Well, I can't afford another graphics card, so I ended up installing CodeProject.AI for this, and it works. Here are some details:

I have a Quadro K1200 that was working fine with Frigate 0.12 and TensorRT, running in an LXC on Proxmox. I installed CodeProject.AI in another LXC (Ubuntu) with the latest NVIDIA driver, BUT I installed cuda-toolkit 11.7. Now I'm using Frigate 0.13, but the detector is CodeProject.AI with the YOLOv5 tiny model.

My inference time went up from 10 ms to 33 ms. I haven't optimized anything yet, so I guess it is a good tradeoff for now. Hopefully someone will figure out what is wrong with the TensorRT images for these GPUs and fix it.

hamishfagg commented 8 months ago

I also gave in and bought a new GPU, a T1000. Everything is working fine with it, so I won't be able to test any fixes here, but I'll leave this open.

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

akatheduelist commented 6 months ago

I am having the exact same issue with a Maxwell GPU (K2200). Just as the issue describes, yolov4-tiny seems to be the only model that doesn't crash, but it only shows motion and no object detection. Other models fail with KeyErrors, divide-by-zero errors, "Received nan values from distance function" errors, and so on. I was wondering what the consensus is among the devs on this issue before I troubleshoot further. Should compute 5.0 Maxwell GPU owners look to upgrade hardware at this point, or is it worth looking into? I have two K2200s in my machine and would be willing to send one to a dev if it is worth looking into at all.

NickM-27 commented 6 months ago

this should be fixed in the next version, based on the linked PR above

akatheduelist commented 6 months ago

this should be fixed in the next version, based on the linked PR above

Oh awesome! I didn't see that. I will look out for that build and test it out.

NateMeyer commented 6 months ago

this should be fixed in the next version, based on the linked PR above

Oh awesome! I didn't see that. I will look out for that build and test it out.

Also, I wrote up a workaround for 0.13 if you want to give that a shot: https://gist.github.com/NateMeyer/a689b4462e57b3de0ebcc40e6538fc03