AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

running multiple yolov4-tiny models on the same gpu #7777

Open poornimajd opened 3 years ago

poornimajd commented 3 years ago

Hello, I am trying to run two yolov4-tiny models in two different terminals, both loaded at the same time. Each model runs at 50 FPS when run on its own, but when I run both in separate terminals at the same time, they run at only 25 FPS each. The total memory used by both models is only 1 GB of the 6 GB available, and GPU utilization is around 70-80% when both models run. Since the GPU memory is not overloaded, I am not sure why the FPS drops to half. Any suggestion is greatly appreciated. Thank you.

I use this command to measure the FPS in both terminals:

./darknet detector demo ./obj.data ./cfg/yolov4-tiny-obj.cfg ./yolov4-tiny-obj_10000.weights ./test.mp4 -dont_show -ext_output
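To make the one-instance and two-instance measurements easier to repeat, the same command can be started several times from one script instead of from separate terminals. This is only a minimal sketch and not part of darknet. It assumes the demo prints progress lines containing `AVG_FPS:<number>` (check the output of your own build), and `last_avg_fps` and `run_instances` are hypothetical helper names introduced here.

```python
import os
import re
import subprocess
import tempfile

# Assumption: the demo prints lines such as "FPS:25.0   AVG_FPS:24.8".
AVG_FPS_RE = re.compile(r"AVG_FPS:\s*([0-9]+(?:\.[0-9]+)?)")

# The command from the post above, split into arguments.
CMD = [
    "./darknet", "detector", "demo", "./obj.data",
    "./cfg/yolov4-tiny-obj.cfg", "./yolov4-tiny-obj_10000.weights",
    "./test.mp4", "-dont_show", "-ext_output",
]


def last_avg_fps(output):
    """Return the last AVG_FPS value found in the text, or None if there is none."""
    matches = AVG_FPS_RE.findall(output)
    return float(matches[-1]) if matches else None


def run_instances(cmd, n):
    """Start n copies of cmd at the same time; return each copy's last AVG_FPS."""
    # Each process writes to its own temporary file, so none of them can
    # stall on a full pipe while another one is still running.
    logs = [tempfile.TemporaryFile(mode="w+") for _ in range(n)]
    procs = [
        subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        for log in logs
    ]
    for proc in procs:
        proc.wait()
    results = []
    for log in logs:
        log.seek(0)
        results.append(last_avg_fps(log.read()))
        log.close()
    return results


if __name__ == "__main__":
    if not os.path.exists(CMD[0]):
        print(f"{CMD[0]} not found; adjust CMD to your setup")
    else:
        for n in (1, 2):
            results = run_instances(CMD, n)
            known = [fps for fps in results if fps is not None]
            print(f"{n} instance(s): AVG_FPS per instance {results}, "
                  f"total {sum(known):.1f}")
```

Comparing the total across instances with the single-instance figure shows whether the GPU is already delivering the same aggregate throughput in both cases.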

grandprixgp commented 3 years ago

You get 50 FPS when running one model exclusively and 25 FPS on each when running two simultaneously. This checks out; it is a non-issue.
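The arithmetic behind "this checks out" can be sketched with a very simple model. It assumes the GPU is the only bottleneck and that frames from separate processes are served one after another rather than overlapped; both are assumptions for illustration, not measurements from this thread. `per_process_fps` is a hypothetical helper.

```python
def per_process_fps(solo_fps, n_processes):
    """FPS each process gets if the GPU serves the processes' frames in turn."""
    frame_ms = 1000.0 / solo_fps        # GPU time one frame needs when run alone
    round_ms = frame_ms * n_processes   # time to serve one frame for every process
    return 1000.0 / round_ms


if __name__ == "__main__":
    solo = 50.0  # single-instance FPS reported in the question
    for n in (1, 2, 3):
        each = per_process_fps(solo, n)
        print(f"{n} process(es): {each:.1f} FPS each, {each * n:.1f} FPS in total")
```

Under this model the total stays at 50 FPS and is simply divided between the processes, which matches the reported 25 FPS per model when two run together.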

poornimajd commented 3 years ago

@grandprixgp Thanks for the reply. But why should the FPS drop when there is enough GPU memory and GPU capacity available? Is there a solid reason for this? Some other issues say that if you have enough GPU memory and GPU utilization is below 100%, you can easily run multiple models in parallel without a decrease in FPS.

grandprixgp commented 3 years ago

Because it is not a single mathematical operation that scales linearly; there are bottlenecks present in the hardware. For example, it is quite easy for me to write a renderer that spams enough triangles to overload a GPU even though 99% of the GPU's computational capability is unused at that time.

You might have 20% memory available with two models running, for example, but that is a useless metric if the application does not need any additional memory. In the same way, utilization values can be deceiving, because you might be facing something like a power bottleneck (in GPU-Z this is shown as Pwr in the PerfCap row).

If you are using GPU-Z for monitoring, then pay attention to the PerfCap (performance cap) field:

vRel = Reliability. Indicates that performance is limited by voltage reliability.
VOp = Operating. Indicates that performance is limited by the maximum operating voltage (hardware limit).
Pwr = Power. Indicates that performance is limited by the total power limit.
Thrm = Thermal. Indicates that performance is limited by the temperature limit.
Util = Utilization. Indicates that performance is limited by GPU utilization.
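On Linux, where GPU-Z is not available, `nvidia-smi` reports similar information through its throttle-reason query fields. The following is a rough sketch, not an exact equivalent of PerfCap: the field names are the ones listed by `nvidia-smi --help-query-gpu` on many driver versions, but they differ between drivers and GPUs, so treat them as assumptions to verify locally. The helper names `parse_row`, `active_throttle_reasons` and `query_gpu` are hypothetical.

```python
import shutil
import subprocess

# Assumption: these query fields are supported by the installed driver.
FIELDS = [
    "utilization.gpu",
    "power.draw",
    "power.limit",
    "temperature.gpu",
    "clocks.sm",
    "clocks_throttle_reasons.sw_power_cap",         # roughly GPU-Z "Pwr"
    "clocks_throttle_reasons.hw_slowdown",
    "clocks_throttle_reasons.sw_thermal_slowdown",  # roughly GPU-Z "Thrm"
]


def parse_row(line, fields=FIELDS):
    """Map one CSV line printed by nvidia-smi onto the field names."""
    values = [value.strip() for value in line.split(",")]
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}")
    return dict(zip(fields, values))


def active_throttle_reasons(row):
    """Return the throttle-reason fields that are reported as 'Active'."""
    return [
        name for name, value in row.items()
        if name.startswith("clocks_throttle_reasons.") and value == "Active"
    ]


def query_gpu(index=0):
    """Run nvidia-smi once for the given GPU index and parse its first output line."""
    result = subprocess.run(
        ["nvidia-smi", f"--id={index}",
         "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_row(result.stdout.strip().splitlines()[0])


if __name__ == "__main__":
    if shutil.which("nvidia-smi") is None:
        print("nvidia-smi not found on PATH")
    else:
        try:
            row = query_gpu()
            print(row)
            print("active throttle reasons:", active_throttle_reasons(row) or "none")
        except subprocess.CalledProcessError as err:
            # Typically an unsupported query field on this driver.
            print("nvidia-smi query failed:", err.stderr.strip())
```

Running this while both models are active shows whether a power or thermal cap is engaged. Note also that the `utilization.gpu` value is the share of the sample period during which at least one kernel was running, so 70-80% does not mean 20-30% of the compute units are free.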

Another example: in Cyberpunk 2077 I might be able to achieve 100 FPS at 4K in a certain scene with DLSS disabled and my GPU utilization at 100%, which is great. Now I enable DLSS, which uses the tensor cores, and those are much more power efficient than the raster cores. My FPS increases to 120, but my overall utilization drops to 80%. If I trusted the utilization value, I would believe there is still power left on the table, but in reality I have taken some load away from the raster pipeline by using the tensor cores, which skews the value.

poornimajd commented 3 years ago

@grandprixgp Thanks a lot for the detailed reply. I tried to measure the performance using nvidia-smi.

Just a small clarification: the above explanation also means that memory and GPU utilization are not the only factors controlling the FPS of the model, that there can be other limitations, and that proper profiling tools may help to reveal them, right? Correct me if I am wrong.

Also, can you suggest any profiling tool (one that comes to mind or that you have used), or any other method, to understand which bottleneck is limiting the FPS for multiple models? A little insight on this would be of great help.

For now I will also try to find out what exactly the limiting factor is using GPU-Z. Thanks for this suggestion.