Open ccutrer opened 1 year ago
to be clear, having corals does not mean object detection uses no CPU: the regions still have to be resized to the model's input size, and depending on which model you are running, the pixel format may need to be converted as well.
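A minimal sketch of that per-region work, assuming hypothetical names (`frame`, `region`, `MODEL_SIZE`) and a BGR frame - this is purely illustrative, not Frigate's actual code:

```python
import cv2
import numpy as np

MODEL_SIZE = (320, 320)  # assumed detector input size

def prep_region(frame, region):
    """Crop a motion region and prep it for the detector (illustrative only)."""
    x1, y1, x2, y2 = region
    crop = frame[y1:y2, x1:x2]                      # crop the region out of the full frame
    resized = cv2.resize(crop, MODEL_SIZE)          # resize to the model's input size
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)  # pixel-format conversion, if the model needs it
    return np.expand_dims(rgb, axis=0)              # add a batch dimension for the detector
```

All of that happens on the CPU per region, regardless of what accelerator runs the inference itself.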
Right. Indeed. And I'm not saying there have to be optimizations here; it could be as good as it's going to get. I've just been impressed with how much @blakeblackshear improved the motion detector, and was hoping he could at least take a look at the other parts of process_frames after motion detection but before the IPC to the object detector (also including create_tensor_input) for similar optimizations.
Full flame graph attached. Seems like a significant chunk is in match_and_update in norfair_tracker.py, particularly the frigate_distance function.
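For a sense of why that can show up so prominently: the tracker's distance callback runs for every detection/tracked-object pair on every frame, so its per-call cost multiplies quickly once a camera has motion. A toy sketch of that kind of pairwise cost (purely illustrative, assuming an IoU-style metric; not the actual frigate_distance):

```python
import numpy as np

def iou_distance(det_box: np.ndarray, trk_box: np.ndarray) -> float:
    """Toy IoU-based distance between two [x1, y1, x2, y2] boxes."""
    x1 = max(det_box[0], trk_box[0])
    y1 = max(det_box[1], trk_box[1])
    x2 = min(det_box[2], trk_box[2])
    y2 = min(det_box[3], trk_box[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_d = (det_box[2] - det_box[0]) * (det_box[3] - det_box[1])
    area_t = (trk_box[2] - trk_box[0]) * (trk_box[3] - trk_box[1])
    union = area_d + area_t - inter
    iou = inter / union if union else 0.0
    return 1.0 - iou  # evaluated once per detection/track pair, every frame
```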
I just wanted to clarify that the assumption coral == no CPU usage for object detection isn't necessarily accurate; lots of motion can lead to lots of regions, and that can be part of the equation.
On dev as of this morning, CPU usage is much better for idle cameras. What surprises me is how much CPU usage goes up on cameras with even a small amount of motion. It's not object detection - I have 4x corals, and they're all staying under 20ms inference speed (usually under 10ms). So my guess is it's either the object tracking code, or the motion region merging and prep-to-send-to-object-detection code. Given the recent mentions of avoiding casts in numpy, I'm curious if similar optimizations are to be had in this section of the pipeline. See this `py-spy top` capture of a semi-active camera, run over 30-60s:

Besides the obligatory wait at the top, there are several methods using a significant chunk of CPU above improved_motion.py that look like internal numpy cast stuff.
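To illustrate the kind of internal numpy cast cost I mean (a generic example, not pulled from Frigate's code): arithmetic that mixes dtypes forces numpy to allocate new arrays and convert every pixel before doing any real work, while staying in uint8 (or using the OpenCV equivalent) avoids the conversion and the extra copy.

```python
import cv2
import numpy as np

frame = np.random.randint(0, 255, (1080, 1920), dtype=np.uint8)
avg = np.random.randint(0, 255, (1080, 1920), dtype=np.uint8)

# Implicit cast: subtracting a float64 array promotes the whole frame to
# float64, allocating ~16 MB and converting every pixel before the subtraction.
delta_slow = np.abs(frame - avg.astype(np.float64))

# No cast: stay in uint8 and let OpenCV do a saturating absolute difference.
delta_fast = cv2.absdiff(frame, avg)
```

If the hot frames in the py-spy output really are implicit casts like this, keeping the whole pipeline in one dtype should show up directly as lower CPU.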