5x higher CPU usage moving from stable to 0.6.0-rc1 docker

wjcloudy commented 4 years ago

Not sure where the issue has crept in, but I thought I'd try the 0.6.0-rc1 docker image today instead of the stable release, and the CPU usage is a lot higher with no config changes. Is this due to extra features, or is there an issue here? I'm using a Coral TPU.

Typical CPU in stable between 7-9%

USER  PID %CPU %MEM     VSZ    RSS TTY   STAT START TIME COMMAND
root    1  5.2  1.0 3334364 175396 ?     Ssl  23:14 0:04 python3.7 -u detect_objects.py
root   16  1.2  0.0  422560   7688 ?     Sl   23:14 0:01 /usr/local/bin/plasma_store -m 400000000 -s /tmp/plasma
root   22  2.9  0.6 1066744  99012 ?     Sl   23:14 0:02 python3.7 -u detect_objects.py
root   33  2.7  0.2  530696  45156 ?     Ssl  23:14 0:02 ffmpeg -hide_banner -loglevel panic -avoid_negative_ts make
root   37  3.3  0.2  531672  45460 ?     Ssl  23:15 0:02 ffmpeg -hide_banner -loglevel panic -avoid_negative_ts make
root   41  3.5  0.2  531432  45276 ?     Ssl  23:15 0:02 ffmpeg -hide_banner -loglevel panic -avoid_negative_ts make
root   66  2.8  0.2  531540  44672 ?     Ssl  23:15 0:01 ffmpeg -hide_banner -loglevel panic -avoid_negative_ts make
root   68  0.5  0.5 2205564  95784 ?     S    23:15 0:00 python3.7 -u detect_objects.py
root   69  0.3  0.5 2205832  89160 ?     S    23:15 0:00 python3.7 -u detect_objects.py
root   70  1.4  0.6 2596488 107392 ?     S    23:15 0:00 python3.7 -u detect_objects.py
root   71  0.2  0.5 2596488  95720 ?     S    23:15 0:00 python3.7 -u detect_objects.py
root   93  0.0  0.0    4628    852 pts/0 Ss+  23:15 0:00 sh
root  107  0.0  0.0    4628    824 pts/1 Ss   23:16 0:00 sh
root  112  0.0  0.0   34400   2876 pts/1 R+   23:16 0:00 ps aux

CPU in 0.6.0-rc1 is around 40-50% with no config changes

USER  PID %CPU %MEM     VSZ    RSS TTY   STAT START TIME COMMAND
root    1 53.3  4.0 3806812 669892 ?     Ssl  23:17 0:18 python3.7 -u detect_objects.py
root   19  7.6  0.0  423464  11720 ?     Sl   23:17 0:02 /usr/local/bin/plasma_store -m 400000000 -s /tmp/plasma
root   25  1.1  0.6 1201828  99696 ?     Sl   23:17 0:00 python3.7 -u detect_objects.py
root   36 29.4  0.2  528988  42304 ?     Rsl  23:17 0:07 ffmpeg -hide_banner -loglevel panic -avoid_negative_ts make
root   40  3.1  0.2  531608  46676 ?     Ssl  23:17 0:00 ffmpeg -hide_banner -loglevel panic -avoid_negative_ts make
root   44  3.4  0.2  531624  45024 ?     Ssl  23:17 0:00 ffmpeg -hide_banner -loglevel panic -avoid_negative_ts make
root   71  2.9  0.2  531576  45412 ?     Rsl  23:17 0:00 ffmpeg -hide_banner -loglevel panic -avoid_negative_ts make
root   73 43.1  2.7 2475320 445840 ?     Rl   23:17 0:06 python3.7 -u detect_objects.py
root   74  0.3  0.6 2474824 111092 ?     Sl   23:17 0:00 python3.7 -u detect_objects.py
root   75  1.1  0.7 2865480 115084 ?     Sl   23:17 0:00 python3.7 -u detect_objects.py
root   76  0.3  0.5 2474824  94396 ?     Sl   23:17 0:00 python3.7 -u detect_objects.py
root  104  0.0  0.0    4628    784 pts/0 Ss   23:17 0:00 sh
root  110  0.0  0.0   34400   2788 pts/0 R+   23:17 0:00 ps aux
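
To compare two snapshots like these more systematically, the `%CPU` column can be summed per executable. This is a minimal sketch that assumes the standard `ps aux` column order shown above; the helper name `cpu_by_command` is made up for illustration, and the sample rows are abbreviated from the 0.6.0-rc1 output.

```python
from collections import defaultdict


def cpu_by_command(ps_output: str) -> dict:
    """Sum the %CPU column of `ps aux` output, grouped by executable name."""
    totals = defaultdict(float)
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        # ps aux prints 10 fixed columns; everything after them is the command.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        executable = fields[10].split()[0]
        totals[executable] += float(fields[2])
    return {name: round(total, 1) for name, total in totals.items()}


# Abbreviated rows from the 0.6.0-rc1 snapshot (ffmpeg arguments shortened).
SAMPLE = """\
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 53.3 4.0 3806812 669892 ? Ssl 23:17 0:18 python3.7 -u detect_objects.py
root 36 29.4 0.2 528988 42304 ? Rsl 23:17 0:07 ffmpeg -hide_banner -loglevel panic
root 40 3.1 0.2 531608 46676 ? Ssl 23:17 0:00 ffmpeg -hide_banner -loglevel panic
root 73 43.1 2.7 2475320 445840 ? Rl 23:17 0:06 python3.7 -u detect_objects.py
"""

if __name__ == "__main__":
    print(cpu_by_command(SAMPLE))
```

Running the same function over the stable and rc1 snapshots makes it easier to see whether the increase comes from the ffmpeg processes, the Python processes, or both.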

blakeblackshear commented 4 years ago

Did you configure zones or enable save clips?

JonGilmore commented 4 years ago

fwiw, i did not enable zones, but did enable saving of clips and i'm seeing normal load.

wjcloudy commented 4 years ago

No zones added, and I had no config entries for save clips. I set clips to false for all cameras just in case, and it doesn't seem to have made any difference...

blakeblackshear commented 4 years ago

One of your ffmpeg processes has much higher CPU utilization in the new version. Can you try removing that one camera from your config?

wjcloudy commented 4 years ago

Sure, any way to work out which? Would that also explain the increase in the one python thread?

blakeblackshear commented 4 years ago

It might. Look at the /debug/stats endpoint and match up the ffmpeg pid.
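
A sketch of that lookup, assuming the `/debug/stats` response is a JSON object keyed by camera name with an `ffmpeg_pid` field per camera (the shape of the stats output posted further down in this thread). The function name `camera_for_pid` and the trimmed sample payload are illustrative only; in practice the dict would come from fetching the endpoint on whatever host and port Frigate is running.

```python
from typing import Optional


def camera_for_pid(stats: dict, ffmpeg_pid: int) -> Optional[str]:
    """Return the name of the camera whose ffmpeg process has the given PID."""
    for name, info in stats.items():
        # Skip any top-level entries that are not per-camera objects.
        if isinstance(info, dict) and info.get("ffmpeg_pid") == ffmpeg_pid:
            return name
    return None


# Trimmed sample in the shape of the stats shown in this thread.
SAMPLE_STATS = {
    "garden": {"camera_fps": 105.2, "ffmpeg_pid": 60, "pid": 65},
    "house": {"camera_fps": 15, "ffmpeg_pid": 48, "pid": 64},
}

if __name__ == "__main__":
    print(camera_for_pid(SAMPLE_STATS, 60))
```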

wjcloudy commented 4 years ago

I think I took the stats too quickly. I re-upgraded and have re-run a few times: the ffmpeg processes are at about 15% each, with the PID 1 detect_objects.py using 90% CPU (I guess it shows the CPU for all its processes). The one of the 4 ffmpegs using less CPU (PID 48) is one of 3 identical cameras with the same settings, so I'm not sure why that one would be different.

Also, more obviously, the FPS for the cameras with the high CPU usage is now something like 107 fps. The house camera is still at a normal fps...

Bad:

"garden": {
  "camera_fps": 105.2,
  "detection_fps": 0,
  "ffmpeg_pid": 60,
  "frame_info": {
    "detect": 1596965886.536944,
    "process": 1596965886.530548,
    "read": 1596965886.536944
  },
  "pid": 65,
  "process_fps": 103.6,
  "read_start": 1596965886.541309,
  "skipped_fps": 103.6
},

Good:

"house": {
  "camera_fps": 15,
  "detection_fps": 7.8,
  "ffmpeg_pid": 48,
  "frame_info": {
    "detect": 1596965886.415483,
    "process": 1596965886.415483,
    "read": 1596965886.540069
  },
  "pid": 64,
  "process_fps": 7.6,
  "read_start": 1596965886.42873,
  "skipped_fps": 7.5
}

USER  PID %CPU %MEM     VSZ    RSS TTY   STAT START TIME COMMAND
root    1 89.7  1.3 3873116 224668 ?     Ssl  10:34 1:34 python3.7 -u detect_objects.py
root   19  8.7  0.0  422560  14724 ?     Sl   10:34 0:08 /usr/local/bin/plasma_store -m 400000000 -s /tmp/plasma
root   25  2.2  0.6 1201824 104240 ?     Sl   10:34 0:02 python3.7 -u detect_objects.py
root   36 15.6  0.2  530316  47296 ?     Rsl  10:34 0:14 ffmpeg -hide_banner -loglevel panic -avoid_negative_ts make
root   40 16.9  0.2  530596  46212 ?     Rsl  10:34 0:15 ffmpeg -hide_banner -loglevel panic -avoid_negative_ts make
root   48  3.4  0.2  531988  45112 ?     Ssl  10:34 0:03 ffmpeg -hide_banner -loglevel panic -avoid_negative_ts make
root   60 16.3  0.2  531412  44248 ?     Rsl  10:34 0:14 ffmpeg -hide_banner -loglevel panic -avoid_negative_ts make
root   62 16.6  0.5 1479464  91452 ?     Sl   10:34 0:14 python3.7 -u detect_objects.py
root   63 14.1  0.5 1479732  92732 ?     Rl   10:34 0:12 python3.7 -u detect_objects.py
root   64  1.3  0.6 1870388 103176 ?     Sl   10:34 0:01 python3.7 -u detect_objects.py
root   65 13.2  0.5 1479732  92664 ?     Sl   10:34 0:11 python3.7 -u detect_objects.py
root  108  0.0  0.0    4628    776 pts/0 Ss   10:34 0:00 sh
root  131  0.0  0.0   34400   2908 pts/0 R+   10:36 0:00 ps aux
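
The symptom in the stats above (a `camera_fps` far above the camera's configured 15 fps, with `skipped_fps` nearly equal to it) is easy to check for mechanically. A minimal sketch, assuming the same per-camera stats shape; the function name `runaway_cameras` and the 2x threshold are arbitrary choices for illustration.

```python
def runaway_cameras(stats: dict, expected_fps: float, factor: float = 2.0) -> list:
    """List cameras whose reported camera_fps exceeds expected_fps * factor."""
    flagged = []
    for name, info in stats.items():
        if not isinstance(info, dict):
            continue
        fps = info.get("camera_fps")
        if fps is not None and fps > expected_fps * factor:
            flagged.append(name)
    return sorted(flagged)


# Values taken from the "Bad" and "Good" stats in this comment.
STATS = {
    "garden": {"camera_fps": 105.2, "detection_fps": 0, "skipped_fps": 103.6},
    "house": {"camera_fps": 15, "detection_fps": 7.8, "skipped_fps": 7.5},
}

if __name__ == "__main__":
    print(runaway_cameras(STATS, expected_fps=15))
```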

wjcloudy commented 4 years ago

OK, I did find a tiny difference in the camera config. The camera that was unaffected by the upgrade had its keyframe interval set to every 30 frames (VBR), while one of the identical problem cameras had keyframes every 90 frames (also VBR). Changing this to 30 to match the other took the CPU right down and the fps counter back to normal. However, I have another model of camera that is affected yet is set to a keyframe every 15 frames (CBR). All my cameras are at 15 fps.

EDIT: After some reboots the 'fixed' camera has gone back to being a problem. Only one camera works every time. Also, the cameras with the dud fps are frozen when you view them on the debug stream: the time counter progresses but the image is frozen. If I change the input of all frigate cameras to the good camera, the CPU is back to normal and they all work, so it's something it doesn't like in the other streams (on 2 models of camera!), though it's absolutely fine on the stable version. The good camera has a slightly different firmware version, which may explain why its stream behaves differently.

wjcloudy commented 4 years ago

Here is the working stream: goodstream

Here are bad streams from two different cameras - note the FPS is missing badstream badstream2

wjcloudy commented 4 years ago

OK, I traced the problem back to the commit "remove vsync drop because it breaks segment":

https://github.com/blakeblackshear/frigate/commit/fbe721c860f5e99bcc60791b4c3f1d06a60445bf

Re-adding -vsync drop has fixed the CPU and the fps, though I guess at the expense of whatever the commit was intended to fix...

blakeblackshear commented 4 years ago

Can you post your full config?

wjcloudy commented 4 years ago

Below is the working config. I've set the global ffmpeg args so I can include -vsync drop.

configpost.txt


blakeblackshear commented 4 years ago

You can post code blocks surrounded by ``` to format them properly. docs

For some reason ffmpeg thinks the frame rate of your cameras is way higher than 15. Try removing -vsync drop and setting the frame rate for each camera in the input args with -r 15.

Example

      input_args:
        - -avoid_negative_ts
        - make_zero
        - -fflags
        - nobuffer
        - -flags
        - low_delay
        - -strict
        - experimental
        - -fflags
        - +genpts+discardcorrupt
        - -r
        - '15'
        - -rtsp_transport
        - tcp
        - -stimeout
        - '5000000'
        - -use_wallclock_as_timestamps
        - '1'
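
Where `-r` sits matters: ffmpeg applies an option to the next input or output file named after it, so `-r 15` in the input args has to appear before `-i`. The sketch below shows how a flat list of input args turns into a command line. `build_command` is a hypothetical helper, not Frigate's actual code, and the RTSP URL is a placeholder.

```python
import shlex


def build_command(input_args: list, input_url: str, output_args: list) -> list:
    """Assemble an ffmpeg argv with the input options placed before -i."""
    return ["ffmpeg", *input_args, "-i", input_url, *output_args, "pipe:"]


cmd = build_command(
    input_args=["-r", "15", "-rtsp_transport", "tcp"],
    input_url="rtsp://camera.example/stream",  # placeholder URL
    output_args=["-f", "rawvideo", "-pix_fmt", "rgb24"],
)

if __name__ == "__main__":
    # Print the command as a single shell-quoted string.
    print(shlex.join(cmd))
```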

wjcloudy commented 4 years ago

Can confirm this also works, many thanks. Given that 3 of my 4 cameras had this issue, could it be worth adding a note to the config sample highlighting that the input fps needs to be either read automatically or set manually in these newer versions? Thanks again!

blakeblackshear commented 4 years ago

I made a note. I plan to add it as a config option, but clarify that it should only be set when the FPS isn't determined correctly by frigate.

jonnyrider commented 4 years ago

I can confirm that this fix works with one camera, but when multiple cameras are used, the frame rate doesn't seem to be fixed for them all:

Below the kitchen, living room and Claudia cameras are maintaining the correct FPS, but the front and back cameras aren't - the front camera is a Foscam C1, same as the kitchen.

wjcloudy commented 4 years ago

I had 3 of 4 cameras with the issue, and they are all now OK. Can you post your config? Are you using -r or -vsync drop?

jonnyrider commented 4 years ago

web_port: 5000

mqtt:
  host: 192.168.1.3
  topic_prefix: frigate
  user: home
  password: xxxxx

objects:
  track:
    - person
    - cat
  filters:
    person:
      threshold: 0.8
      min_area: 50000
    cat:
      threshold: 0.8
      min_area: 5000
      max_area: 30000

ffmpeg:
  hwaccel_args:
    - -hwaccel
    - vaapi
    - -hwaccel_device
    - /dev/dri/renderD128
    - -hwaccel_output_format
    - yuv420p

cameras:
  front:
    ffmpeg:
      input: rtsp://xxxxx@192.168.1.41:554/videoMain
    save_clips:
      enabled: True
      pre_capture: 5      
    input_args:
        - -avoid_negative_ts
        - make_zero
        - -fflags
        - nobuffer
        - -flags
        - low_delay
        - -strict
        - experimental
        - -fflags
        - +genpts+discardcorrupt
        - -r
        - '10'
        - -rtsp_transport
        - tcp
        - -stimeout
        - '5000000'
        - -use_wallclock_as_timestamps
        - '1'

All other cameras have the same options, but with different FPS settings.

blakeblackshear commented 4 years ago

Can you post the ffprobe output for that camera? I would also be curious if the files in the cache folder for save_clips have a high FPS. Also, try setting the ffmpeg log level to info for that camera.
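
When reading ffprobe's JSON, note that `r_frame_rate` and `avg_frame_rate` are rational strings rather than plain numbers, and a value of `0/0` is treated here as "unknown". A small sketch for decoding them using only the standard library; `parse_rate` is an illustrative name, and the two sample values are the ones that appear in the ffprobe output posted below.

```python
from fractions import Fraction
from typing import Optional


def parse_rate(value: str) -> Optional[float]:
    """Convert an ffprobe rational like '119/12' to a float, or None if unknown."""
    numerator, _, denominator = value.partition("/")
    if not denominator:
        denominator = "1"  # a bare number such as '15'
    if int(denominator) == 0:
        return None  # '0/0' carries no usable frame rate
    return float(Fraction(int(numerator), int(denominator)))


if __name__ == "__main__":
    print(parse_rate("119/12"))  # r_frame_rate from the output below
    print(parse_rate("0/0"))     # avg_frame_rate from the output below
```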

jonnyrider commented 4 years ago

Here is the ffprobe output for a camera that has high FPS (should be 10, but is coming back at over 80):

On connect called
ffprobe -v panic -show_error -show_streams -of json "rtsp://xxxxx@192.168.1.65:554/videoMain"
Starting detection process: 25
{'streams': [{'index': 0, 'codec_name': 'h264', 'codec_long_name': 'H.264 / AVC / MPEG-4 AVC / MPEG-4 part 10', 'profile': 'Main', 'codec_type': 'video', 'codec_time_base': '0/2', 'codec_tag_string': '[0][0][0][0]', 'codec_tag': '0x0000', 'width': 1280, 'height': 720, 'coded_width': 1280, 'coded_height': 720, 'has_b_frames': 0, 'sample_aspect_ratio': '0:1', 'display_aspect_ratio': '0:1', 'pix_fmt': 'yuvj420p', 'level': 31, 'color_range': 'pc', 'color_space': 'bt709', 'color_transfer': 'bt709', 'color_primaries': 'bt709', 'chroma_location': 'left', 'field_order': 'progressive', 'refs': 1, 'is_avc': 'false', 'nal_length_size': '0', 'r_frame_rate': '119/12', 'avg_frame_rate': '0/0', 'time_base': '1/90000', 'start_pts': 725608927, 'start_time': '8062.321411', 'bits_per_raw_sample': '8', 'disposition': {'default': 0, 'dub': 0, 'original': 0, 'comment': 0, 'lyrics': 0, 'karaoke': 0, 'forced': 0, 'hearing_impaired': 0, 'visual_impaired': 0, 'clean_effects': 0, 'attached_pic': 0, 'timed_thumbnails': 0}}, {'index': 1, 'codec_name': 'pcm_mulaw', 'codec_long_name': 'PCM mu-law / G.711 mu-law', 'codec_type': 'audio', 'codec_time_base': '1/8000', 'codec_tag_string': '[0][0][0][0]', 'codec_tag': '0x0000', 'sample_fmt': 's16', 'sample_rate': '8000', 'channels': 1, 'bits_per_sample': 8, 'r_frame_rate': '0/0', 'avg_frame_rate': '0/0', 'time_base': '1/8000', 'start_pts': 4294967296, 'start_time': '536870.912000', 'bit_rate': '64000', 'disposition': {'default': 0, 'dub': 0, 'original': 0, 'comment': 0, 'lyrics': 0, 'karaoke': 0, 'forced': 0, 'hearing_impaired': 0, 'visual_impaired': 0, 'clean_effects': 0, 'attached_pic': 0, 'timed_thumbnails': 0}}]}
Creating ffmpeg process...
ffmpeg -hide_banner -loglevel panic -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format yuv420p -avoid_negative_ts make_zero -fflags nobuffer -flags low_delay -strict experimental -fflags +genpts+discardcorrupt -rtsp_transport tcp -stimeout 5000000 -use_wallclock_as_timestamps 1 -i rtsp://xxxxx@192.168.1.65:554/videoMain -f segment -segment_time 10 -segment_format mp4 -reset_timestamps 1 -strftime 1 -c copy -an -map 0 /cache/back-%Y%m%d%H%M%S.mp4 -f rawvideo -pix_fmt rgb24 pipe:
Camera_process started for back: 38
Starting process for back: 38
 * Serving Flask app "detect_objects" (lazy loading)
 * Environment: production
   WARNING: This is a development server. Do not use it in a production deployment.
   Use a production WSGI server instead.
 * Debug mode: off
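
One way to confirm whether a setting reached ffmpeg is to inspect the command that was logged. The sketch below splits a command line and reports whether `-r` appears among the input options (before `-i`). Run against the command logged above, it finds none. `input_rate` is an illustrative helper name, and the command string is abbreviated from that log line.

```python
import shlex
from typing import Optional


def input_rate(command: str) -> Optional[str]:
    """Return the value of -r given before -i, or None if it is absent."""
    argv = shlex.split(command)
    if "-i" not in argv:
        return None
    input_options = argv[: argv.index("-i")]
    # Stop one short of the end so that position + 1 is always a valid index.
    for position, token in enumerate(input_options[:-1]):
        if token == "-r":
            return input_options[position + 1]
    return None


# Abbreviated from the "Creating ffmpeg process..." log line above.
LOGGED = (
    "ffmpeg -hide_banner -loglevel panic -avoid_negative_ts make_zero "
    "-fflags +genpts+discardcorrupt -rtsp_transport tcp -stimeout 5000000 "
    "-use_wallclock_as_timestamps 1 -i rtsp://xxxxx@192.168.1.65:554/videoMain "
    "-f rawvideo -pix_fmt rgb24 pipe:"
)

if __name__ == "__main__":
    print(input_rate(LOGGED))
```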

And this is the config:


cameras:
  back:
    ffmpeg:
      input: rtsp://xxxx@192.168.1.65:554/videoMain
    save_clips:
      enabled: True
      pre_capture: 5      
    global_args:
        - -hide_banner
        - -loglevel
        - info
    input_args:
        - -avoid_negative_ts
        - make_zero
        - -fflags
        - nobuffer
        - -flags
        - low_delay
        - -strict
        - experimental
        - -fflags
        - +genpts+discardcorrupt
        - -r
        - '10'
        - -rtsp_transport
        - tcp
        - -stimeout
        - '5000000'
        - -use_wallclock_as_timestamps
        - '1'

Can you please tell me where the cache is for the saved files?

blakeblackshear commented 4 years ago

It's in /cache in the container. You can mount a volume there so you can grab them.

jonnyrider commented 4 years ago

Very interesting/odd. The clips look fine, no issues with the FPS. After about 5 minutes the CPU usage has gone back down to 35% and the FPS on all cameras seems to have stabilised at the right level!

EDIT: Think that was just luck, I've stopped and restarted the container and one camera is over 100 FPS while the rest are all working fine. Same config as above!

mario-tux commented 4 years ago

For me, with 5 identical cameras (Anpviz), the fps detection was unreliable (15 or 30 fps instead of 5) on just one camera. The only difference was that the buggy one always had more motion (a windy scene). Switching to CBR (instead of VBR) seems to fix the problem. The -r option of ffmpeg didn't help.

jonnyrider commented 4 years ago

I've restarted the container a few times with different settings on the cameras, but it seems very haphazard with the FPS. Sometimes all cameras show really high FPS, sometimes only one or two. Could be the cheap Foscam cameras I have, but they worked perfectly before dropping the vsync argument.

Happy to help further debugging if I can.

blakeblackshear commented 4 years ago

@jonnyrider can you try adding -r 10 to the beginning of the output args instead of the input args? Should look like this:

  output_args:
    - -r
    - '10'
    - -f
    - rawvideo
    - -pix_fmt
    - rgb24

jonnyrider commented 4 years ago

Working fine so far, I'll keep an eye on it and report back after it's been going for a few hours to ensure it stays good!

Thanks

blakeblackshear commented 4 years ago

I was able to reproduce this by setting up an MJPEG stream from one of my cameras. I added an optional fps option to the config that sets the output fps for ffmpeg. It tells ffmpeg to either drop or duplicate frames to try to maintain that frame rate, so it's not as ideal as using -vsync drop. If you want to save clips, you can't use -vsync drop, so this gives you options.
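
To illustrate what dropping or duplicating frames to hold a fixed rate means, here is a toy model: for each output tick at the target rate, emit the most recent input frame. This is a simplified sketch for intuition, not ffmpeg's actual algorithm, and `resample` is a made-up name. Input that is faster than the target loses frames, and input that is slower repeats them.

```python
def resample(timestamps: list, target_fps: float, duration: float) -> list:
    """For each output tick, pick the index of the latest input frame at or before it."""
    picked = []
    ticks = int(round(duration * target_fps))
    current = 0
    for tick in range(ticks):
        t = tick / target_fps
        # Advance to the newest input frame whose timestamp is not after this tick.
        while current + 1 < len(timestamps) and timestamps[current + 1] <= t:
            current += 1
        picked.append(current)
    return picked


# 20 fps input for one second, resampled to 10 fps: every other frame is dropped.
fast = [i / 20 for i in range(20)]
# 5 fps input for one second, resampled to 10 fps: each frame is emitted twice.
slow = [i / 5 for i in range(5)]

if __name__ == "__main__":
    print(resample(fast, target_fps=10, duration=1.0))
    print(resample(slow, target_fps=10, duration=1.0))
```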

jonnyrider commented 4 years ago

Been going for a few hours now and all working perfectly, including saving clips. 👍

dejanzelic commented 4 years ago

Hate to bring this up again right after you closed it, but I'm having this issue too with my Amcrest doorbell and floodlight cameras. Occasionally the FPS jumps to over 100 FPS. I've tried the config items here and that didn't fix it. Here is my config:

web_port: 5000

mqtt:
  host: mqtt
  topic_prefix: frigate
  client_id: frigate
  user: service
  password: <REDACTED>

objects:
  track:
    - person
    - car
    - truck
  filters:
    person:
      threshold: 0.80
      min_area: 300

zones:
  front_steps:
    garage:
      coordinates:
        - 0,0
        - 250,250
        - 0,250
cameras:
  side-yard:
    ffmpeg:
      input: rtsp://<REDACTED>:<REDACTED>@sidecamera.int:554/cam/realmonitor?channel=1&subtype=1
    take_frame: 3
    save_clips:
      enabled: False
      pre_capture: 30

  front-door:
    ffmpeg:
      input: rtsp://<REDACTED>:<REDACTED>@doorcamera.int:554/cam/realmonitor?channel=1&subtype=1
    take_frame: 3
    save_clips:
      enabled: False
      pre_capture: 30
  garage:
    ffmpeg:
      input: rtsp://<REDACTED>:<REDACTED>@garagecamera.int:554/cam/realmonitor?channel=1&subtype=1
    take_frame: 3
    save_clips:
      enabled: False
      pre_capture: 10

  back-yard:
    ffmpeg:
      input: rtsp://<REDACTED>:<REDACTED>@backyardcamera.int/live
    take_frame: 6
    save_clips:
      enabled: False
      pre_capture: 30
    snapshots:
      show_timestamp: False
      draw_zones: True

I tried to add:

    input_args:
        - -avoid_negative_ts
        - make_zero
        - -fflags
        - nobuffer
        - -flags
        - low_delay
        - -strict
        - experimental
        - -fflags
        - +genpts+discardcorrupt
        - -r
        - '10'
        - -rtsp_transport
        - tcp
        - -stimeout
        - '5000000'
        - -use_wallclock_as_timestamps
        - '1'

This was added to each camera with no luck.

Even though I'm not saving clips yet, I also tried:

  output_args:
    - -r
    - '10'
    - -f
    - rawvideo
    - -pix_fmt
    - rgb24

Neither of these options worked for me.

I, too, have had this be an intermittent problem as you can see from this graph:

Rolling back to 0.5.2 "fixes" the issue.

Any suggestions?

blakeblackshear commented 4 years ago

Remove the changes you made to your config and see the example config for the new fps option.

dejanzelic commented 4 years ago

That worked! Thanks!

blakeblackshear / frigate

5x higher CPU usage moving from stable to 0.6.0-rc1 docker #176