WongKinYiu / yolov7

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
GNU General Public License v3.0

Suggestion to achieve 100% gpu usage in a multicamera system #1758

Open aimaicai opened 1 year ago

aimaicai commented 1 year ago

Hi, I'm quite new to Python programming and I'm trying to maximize detection throughput with YOLOv7 in a multi-camera environment. For now I simulate N cameras with N videos; I have an NVIDIA RTX 3060 available and use the standard YOLOv7 model as a reference. I was wondering if you had any advice for me.

Multithreading is easy to use, but it doesn't perform well if I serialize access to the detector with a lock (60 FPS). If, on the other hand, I let all threads call it without any locks, throughput improves (100 FPS), but CPU consumption also rises sharply (almost 70-80%), while I would expect much lower consumption since most of the work should run on the GPU.

Multiprocessing is much harder: I tried keeping the detector in a separate process and feeding it images through a queue, but throughput is low (70 FPS). To get better performance I had to run detection in several processes, but that meant loading the same model multiple times, increasing both system RAM and GPU memory consumption. Again the CPU usage seems excessive to me and grows with the number of processes. With this setup I reached 100% GPU usage and 140 FPS, but I practically saturated all the machine's resources, while my goal is to saturate only the GPU (and obviously some CPU cores).

I was also considering batch inference, but couldn't find any references for it in this project. Thank you
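The batching idea at the end of the question can be sketched independently of yolov7: collect one frame per camera and stack them into a single NCHW batch, so a single forward pass serves all cameras. This is a minimal illustration with NumPy only; `collect_batch` is a hypothetical helper, and a real pipeline would letterbox-resize with OpenCV rather than crop, then feed the batch to the model as one tensor.

```python
import numpy as np

def collect_batch(frames, size=(640, 640)):
    """Stack per-camera frames (H, W, 3 uint8) into one (N, 3, H, W) float batch.

    Placeholder preprocessing: crop to `size` and normalize to [0, 1].
    A real pipeline would letterbox to preserve aspect ratio.
    """
    batch = []
    for f in frames:
        f = f[: size[0], : size[1]]           # placeholder for a proper resize
        f = f.astype(np.float32) / 255.0      # normalize to [0, 1]
        batch.append(f.transpose(2, 0, 1))    # HWC -> CHW
    return np.stack(batch)                    # one batch for one forward pass

# Simulate 4 cameras each delivering a 640x640 BGR frame.
cams = [np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8) for _ in range(4)]
batch = collect_batch(cams)
print(batch.shape)  # (4, 3, 640, 640)
```

With batching, one process owns the model (so weights are loaded once) and the per-camera reader threads or processes only produce frames, which addresses the duplicated-model RAM/VRAM cost described above.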

vtyw commented 1 year ago

@aimaicai What is your end goal, to get the highest total throughput in your application? If so, reaching 100% GPU utilisation is the wrong thing to focus on and doesn't have much to do with total performance.

If you're running yolov7 on, say, a 1080p video, a significant amount of the CPU usage is due to OpenCV reading and decoding the frames. Depending on the bitrate, codec, etc. of the video, the GPU inference time might not be a bottleneck at all. In that case, your overall processing framerate can be improved by things such as: preprocessing, hardware-accelerated fetching of frames, choosing smaller input resolutions and the right codec, and multiprocessing.
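One way to act on this advice is to overlap CPU-side decoding with GPU-side inference using a bounded producer/consumer queue, so neither stage waits idly for the other. This is a hedged sketch with stand-in workloads: `decoder` is a placeholder for `cv2.VideoCapture.read()` plus preprocessing, and the doubling in `inference` is a placeholder for the model's forward pass.

```python
import queue
import threading
import time

def decoder(frame_q, n_frames):
    """CPU side: stands in for cv2.VideoCapture.read() + preprocessing."""
    for i in range(n_frames):
        time.sleep(0.001)          # pretend per-frame decode cost
        frame_q.put(i)
    frame_q.put(None)              # sentinel: end of stream

def inference(frame_q, results):
    """GPU side: stands in for model(batch) on the device."""
    while True:
        frame = frame_q.get()
        if frame is None:
            break
        results.append(frame * 2)  # placeholder "detection"

frame_q = queue.Queue(maxsize=8)   # bounded queue applies backpressure
results = []
t = threading.Thread(target=decoder, args=(frame_q, 20))
t.start()
inference(frame_q, results)
t.join()
print(len(results))  # 20
```

Timing each stage separately in a setup like this (total time spent in `decoder` vs. in `inference`) is also a quick way to confirm whether the decode path, rather than the GPU, is the real bottleneck before optimizing either one.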