WongKinYiu / yolov7

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
GNU General Public License v3.0

Suggestion to achieve 100% gpu usage in a multicamera system #1758

Open aimaicai opened 1 year ago

aimaicai commented 1 year ago

Hi, I'm quite new to Python programming and I'm trying to maximize detection throughput with YOLOv7 in a multi-camera environment. For now I simulate N cameras with N videos; I have an NVIDIA RTX 3060 available and use the standard YOLOv7 model as a reference. I was wondering if you had any advice for me.

Multithreading is easy to use, but it doesn't perform well if I serialize access to the detector with a lock (60 FPS). If, on the other hand, I let all threads call it without any locks, throughput improves (100 FPS), but CPU consumption also rises sharply (almost 70-80%), while I would expect much lower consumption since most of the work should run on the GPU.

Multiprocessing is much harder: I tried keeping the detector in a separate process and feeding it images through a queue, but throughput is low (70 FPS). To get better performance I had to run detection in several processes, but that meant loading the same model multiple times, increasing both system RAM and GPU memory consumption. Again the CPU usage seems excessive to me and grows with the number of processes. With this setup I reached 100% GPU usage and 140 FPS, but I practically saturated all the machine's resources, while my goal is to saturate only the GPU (and obviously some CPU cores).

I was also considering batch inference, but couldn't find any references for it in this project. Thank you
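The batching idea at the end of the question can be sketched independently of yolov7: collect one frame per camera and stack them into a single NCHW batch, so a single forward pass serves all cameras. This is a minimal illustration with NumPy only; `collect_batch` is a hypothetical helper, and a real pipeline would letterbox-resize with OpenCV rather than crop, then feed the batch to the model as one tensor.

```python
import numpy as np

def collect_batch(frames, size=(640, 640)):
    """Stack per-camera frames (H, W, 3 uint8) into one (N, 3, H, W) float batch.

    Placeholder preprocessing: crop to `size` and normalize to [0, 1].
    A real pipeline would letterbox to preserve aspect ratio.
    """
    batch = []
    for f in frames:
        f = f[: size[0], : size[1]]           # placeholder for a proper resize
        f = f.astype(np.float32) / 255.0      # normalize to [0, 1]
        batch.append(f.transpose(2, 0, 1))    # HWC -> CHW
    return np.stack(batch)                    # one batch for one forward pass

# Simulate 4 cameras each delivering a 640x640 BGR frame.
cams = [np.random.randint(0, 255, (640, 640, 3), dtype=np.uint8) for _ in range(4)]
batch = collect_batch(cams)
print(batch.shape)  # (4, 3, 640, 640)
```

With batching, one process owns the model (so weights are loaded once) and the per-camera reader threads or processes only produce frames, which addresses the duplicated-model RAM/VRAM cost described above.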

vtyw commented 1 year ago

@aimaicai What is your end goal, to get the highest total throughput in your application? If so, reaching 100% GPU utilisation is the wrong thing to focus on and doesn't have much to do with total performance.

If you're running yolov7 on, say, a 1080p video, a significant amount of the CPU usage is due to OpenCV reading and decoding the frames. Depending on the bitrate, codec, etc. of the video, the GPU inference time might not be a bottleneck at all. In that case, your overall processing framerate can be improved by things such as: preprocessing, hardware-accelerated fetching of frames, choosing smaller input resolutions and the right codec, and multiprocessing.
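One way to act on this advice is to overlap CPU-side decoding with GPU-side inference using a bounded producer/consumer queue, so neither stage waits idly for the other. This is a hedged sketch with stand-in workloads: `decoder` is a placeholder for `cv2.VideoCapture.read()` plus preprocessing, and the doubling in `inference` is a placeholder for the model's forward pass.

```python
import queue
import threading
import time

def decoder(frame_q, n_frames):
    """CPU side: stands in for cv2.VideoCapture.read() + preprocessing."""
    for i in range(n_frames):
        time.sleep(0.001)          # pretend per-frame decode cost
        frame_q.put(i)
    frame_q.put(None)              # sentinel: end of stream

def inference(frame_q, results):
    """GPU side: stands in for model(batch) on the device."""
    while True:
        frame = frame_q.get()
        if frame is None:
            break
        results.append(frame * 2)  # placeholder "detection"

frame_q = queue.Queue(maxsize=8)   # bounded queue applies backpressure
results = []
t = threading.Thread(target=decoder, args=(frame_q, 20))
t.start()
inference(frame_q, results)
t.join()
print(len(results))  # 20
```

Timing each stage separately in a setup like this (total time spent in `decoder` vs. in `inference`) is also a quick way to confirm whether the decode path, rather than the GPU, is the real bottleneck before optimizing either one.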