blakeblackshear / frigate

NVR with realtime local object detection for IP cameras
https://frigate.video
MIT License

[Optimization] Run object detection against multiple regions from the same frame in parallel #6924

Closed. ccutrer closed this issue 1 year ago.

ccutrer commented 1 year ago

I've been looking into the motion-to-object-detection pipeline extensively (from the perspective of process structure, less so the actual calculations) for many-camera installations. If you have multiple object detectors (i.e., multiple Corals, multiple GPUs, etc.), you want to be able to supply regions to the object detectors concurrently. This works just fine across multiple cameras, since there's a single queue that all cameras dump into and all detector processes pull from. But I noticed that when motion happens on a camera (especially an outdoor camera), it's very common for multiple regions to be detected and sent to the object detector. Within a single frame, however, this is all done serially and synchronously: the camera process waits for the detector's response for one region before sending the next. It would be nice if all regions could be dumped on the queue at once, with frame processing resuming only once all regions have had object detection run.

As it is now, each additional motion region in a single frame linearly increases the frame processing time, which can easily cause a single camera to fall quite a bit behind and start to drop frames, since several subsequent frames will likely have the same amount of motion. I'm routinely seeing times > 1s just to run 3 object detections for a single frame, even though I have 2 Corals, my overall detection_fps is ~30, and my inference speed on my Corals is ~50ms each. a23317faefdea25462be725fd8ad3eba61753019 adds my extra logging, and I'm getting output like this:

```
2023-06-26 17:44:43.931919662  [2023-06-26 17:44:43] frigate.video                  INFO    : It took 1.3198727529961616s (0.06454933500000237s CPU time) to run 3 object detections on camera pantry
2023-06-26 17:44:44.068203296  [2023-06-26 17:44:44] frigate.video                  INFO    : It took 1.4172799719963223s (0.06937015400000135s CPU time) to run 3 object detections on camera john_deere
2023-06-26 17:44:44.098746785  [2023-06-26 17:44:44] frigate.video                  INFO    : It took 1.4359929300262593s (0.06593030700000213s CPU time) to run 3 object detections on camera great
```

Sometimes it's as low as 0.4s, sometimes as high as 2s, but mostly it clusters around 0.9-1s. Going to 4 or 5 object detections only seems to increase the time marginally.
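For illustration, here's a minimal sketch of what parallel dispatch could look like, assuming Frigate's single shared detector queue plus a hypothetical per-camera result queue (none of these names are Frigate's actual API): all regions for a frame are enqueued up front, and the camera process resumes only after every result has come back.

```python
import multiprocessing as mp

# Hypothetical queues, created once at startup in this sketch.
detection_queue = mp.Queue()  # shared by all cameras and detector processes
result_queue = mp.Queue()     # one per camera, so results aren't interleaved

def detect_regions_parallel(detection_queue, result_queue, frame_id, regions):
    """Submit every region for a frame, then block until all results are
    back. Assumes detector worker processes consume detection_queue and
    put (frame_id, region_index, detections) on result_queue."""
    # Enqueue all regions at once instead of one-at-a-time round trips.
    for i, region in enumerate(regions):
        detection_queue.put((frame_id, i, region))

    # Resume frame processing only once every region has a result.
    results = {}
    while len(results) < len(regions):
        _fid, i, detections = result_queue.get()
        results[i] = detections
    return [results[i] for i in range(len(regions))]
```

With N detector processes, up to N regions from the same frame could then be inferenced concurrently instead of one at a time.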

ccutrer commented 1 year ago

Note that when I removed ~2/3 of my cameras from the config, I end up with lines like `It took 0.09277486003702506s (0.13120160999999975s CPU time) to run 6 object detections on camera garage`. MUCH faster. The inference speed is also faster (<10ms vs. 40-90ms). So there's an argument to be made that if the system isn't overloaded, parallelizing single-camera object detections doesn't matter much: either few people will have multiple Corals, or, if you really need multiple Corals, you'll have multiple cameras keeping them plenty busy, and parallelizing within a single camera won't actually get you more throughput on the object detectors.

ccutrer commented 1 year ago

Now that I have #6940 applied locally and I'm not CPU constrained, I'm seeing detect frame rates in the high teens on some cameras when there are lots of people here. I could imagine even higher rates for someone monitoring a public space (inside or outside). So this seems like it might be more worthwhile.

(I'm also up to 😲 4 Corals at the moment! My PCIe adapter for the dual Edge TPU M.2 Coral showed up, and I haven't bothered removing my USB Corals yet.)

ccutrer commented 1 year ago

I'm definitely seeing a problem today. It's very windy, and my outdoor cameras are picking up a lot of motion: as much as 45 detections per second per camera (at 6 fps on the detect stream, that's ~7.5 regions per frame). I've upgraded my CPU, so I'm definitely not CPU bound anymore (50-60% overall idle), and inference times seem reasonable on my Corals (6-25ms). Yet I'm still dropping frames on the busy cameras. It's either the sequential object detection documented in this ticket, or too much single-threaded CPU usage calculating regions from motion and tracking objects (documented in #7000).

I'm also concerned that the decide-regions-from-motion step is being too conservative. Is there any documentation on how that process works, or any way to tweak it (e.g., merge two regions if more than 25% of either one overlaps the other)? For example, in this screenshot

[Screenshot 2023-07-10 at 10:27:03 AM]

there are 12 distinct regions being sent to object detection, many of them proper subsets of a larger region, and others (the bottom left corner in particular) overlapping significantly enough that they could be sent together. I know object detection works on fixed-size images, which partially explains why Frigate may prefer to send multiple semi-overlapping regions rather than combining them, but perhaps when there are more than 3 or 4 overlapping regions it would be worth taking one larger chunk and resizing it?
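As a rough sketch of that merge heuristic (the 25% threshold and box format are just for illustration, not Frigate's actual values): combine two regions whenever their intersection covers more than a quarter of the smaller one, and repeat until no pair qualifies; the merged box could then be resized to the detector's fixed input size.

```python
def overlap_fraction(a, b):
    """Fraction of the smaller box covered by the intersection.
    Boxes are (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    smaller = min((a[2] - a[0]) * (a[3] - a[1]),
                  (b[2] - b[0]) * (b[3] - b[1]))
    return inter / smaller if smaller else 0.0

def merge_regions(regions, threshold=0.25):
    """Greedily merge overlapping regions until no pair exceeds the
    overlap threshold. Each merge replaces two boxes with their union."""
    regions = list(regions)
    merged = True
    while merged:
        merged = False
        for i in range(len(regions)):
            for j in range(i + 1, len(regions)):
                if overlap_fraction(regions[i], regions[j]) > threshold:
                    a, b = regions[i], regions[j]
                    regions[j] = (min(a[0], b[0]), min(a[1], b[1]),
                                  max(a[2], b[2]), max(a[3], b[3]))
                    del regions[i]
                    merged = True
                    break
            if merged:
                break
    return regions
```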

I've also thought about doing more aggressive motion masks, but a good chunk of the detected motion is on the edges of bushes and trees that are swaying significantly, and on calm days what are currently "edges" would clearly be in areas where people walk and that I want to capture.

ccutrer commented 1 year ago

Lol, I just caught this camera doing ~65 detections per second:

[Screenshot 2023-07-10 at 3:44:17 PM]

homeassistant7 commented 1 year ago

Have you considered motion masks over your trees?

If you're careful with the placement, you'll still pick up the motion of people walking alongside the trees without always triggering motion.

Blowing shadows, on the other hand, are hard to handle, as they have a tendency to move with the sun! The best defence against that I found was well-maintained and shaped hedging.

blakeblackshear commented 1 year ago

You could use some motion masks to reduce the amount of motion. Specifically, the lower part of the trees by the pool and the top middle of the bush off the front porch.

I have spent a lot of time working with the code that combines regions based on motion, and I don't think there is a way to get it to work well for all scenarios. The region can only be so large before smaller objects will be missed.

I think the best approach is getting more intelligent about how large objects are anticipated to be based on where they are in the frame. This would allow regions to be sized based on the anticipated size of the object rather than motion area alone.
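As a purely illustrative sketch of that idea (not Frigate's implementation; the endpoint sizes are made up and would need per-camera calibration): the expected object size could be interpolated from the vertical position of the motion, since objects lower in the frame are usually closer to the camera and therefore larger.

```python
def anticipated_region_size(y, frame_height,
                            size_at_top=80, size_at_bottom=480):
    """Linearly interpolate the expected object size (in pixels) from
    the vertical position of the motion's center. A region could then
    be sized from this estimate rather than from the motion area alone."""
    t = y / frame_height
    return int(size_at_top + t * (size_at_bottom - size_at_top))
```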

blakeblackshear commented 1 year ago

It's also ok for frames to be occasionally skipped at peak times.

homeassistant7 commented 1 year ago

> I think the best approach is getting more intelligent about how large objects are anticipated to be based on where they are in the frame. This would allow regions to be sized based on the anticipated size of the object rather than motion area alone.

Is it possible to do size filtering based on zone? I'd love this for my front yard where things should be big if they're in the yard zone but smaller is okay across the street.

NickM-27 commented 1 year ago

> I think the best approach is getting more intelligent about how large objects are anticipated to be based on where they are in the frame. This would allow regions to be sized based on the anticipated size of the object rather than motion area alone.

> Is it possible to do size filtering based on zone? I'd love this for my front yard where things should be big if they're in the yard zone but smaller is okay across the street.

Yes, it's possible; zones support filters.
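For example, a zone-level filter config looks roughly like this (camera/zone names, coordinates, and values are illustrative; see the Frigate docs on zones for the exact schema):

```yaml
cameras:
  front_yard:
    zones:
      yard:
        coordinates: 0,400,1280,400,1280,720,0,720
        filters:
          person:
            min_area: 20000  # a person must cover this many pixels to count in the yard
      street:
        coordinates: 0,250,1280,250,1280,400,0,400
        filters:
          person:
            min_area: 2000   # smaller objects are fine across the street
```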

homeassistant7 commented 1 year ago

Amazing thank you! No more bird people on my lawn 😂

ccutrer commented 1 year ago

I've considered motion masks (and actually do have some on the backyard camera; I forgot to enable them before taking the screenshot), but they often end up blocking where people could actually be, even if they're not often in those locations: the kids running through the yard in back, or someone doing yard work. On the porch camera, that tree is almost completely transparent in winter, and I definitely do want to know ASAP as people approach the porch.

> Yes, it's possible; zones support filters.

Keep in mind this is done at the object level: motion will still be detected in those zones, object detection will still run, and then if the result comes back as an object, it is simply ignored. So it accomplishes the goal of eliminating false positives in the final events, but doesn't do much to lower the processing load needed to arrive at that conclusion, which may or may not be an issue depending on how many cameras and how much hardware your Frigate machine has. My goal here is to push the project to be as resource efficient as possible, so that any given setup can either have as many cameras as possible, or spend as little on computing power as necessary for the cameras it has. Myself, I have a lot of cameras (for a residence), but I've also spent a decent amount on compute power too. See next point :).

> It's also ok for frames to be occasionally skipped at peak times.

👍 Yeah, I'm not overly concerned, as long as recordings are still happening and events in general are reliable "enough" (which they are, now that I've upgraded my CPU). See prior point :). I'd worry most about my "street" camera, where I want to know about every car that passes; sometimes they're fast, so they may only appear in a couple of frames. But that camera is less susceptible to "too many motion regions", both because there are fewer bushes/trees in the foreground and because a significant amount of the background motion is masked (sky, other houses I don't care about). I also run its detect stream at a higher frame rate (12 fps).

> I have spent a lot of time working with the code that combines regions based on motion, and I don't think there is a way to get it to work well for all scenarios. The region can only be so large before smaller objects will be missed.

👍 If you're confident there's not much to be gained there, we can shelve this tangent and leave the issue as I had it in the OP: simply that occasionally there will be a lot of motion regions in a single frame, and it would be nice to run object detection on them in parallel for people with multiple detection backends.

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.