Open endo123 opened 6 years ago
I think a GTX 1080Ti should be enough to process Full HD (1920x1080) at 50 FPS using yolov3.cfg (416x416).
It can already process 2 x 1920x1080 streams at 25 FPS each: https://github.com/AlexeyAB/darknet/issues/1232#issuecomment-405565193
> On another thread, I noticed a comparison of these 2 GPUs but for multiple live streams, so some parallelism was also coming into play in that case.
A single YOLO model can occupy ~95% of the GPU if you use this repository built with OPENCV=1 and CUDNN=1 and a modern CPU. A bottleneck can appear only on the CPU side (video decompression, resizing, saving), and only if you use another repo or a slow CPU.
A Titan V is required if you want to achieve ~90 FPS on 1920x1080 video with a 416x416 network size.
Thanks, @AlexeyAB !
How do the performance requirements vary w/ increase in network size? We may need a bigger network size for better detection of smaller objects in our dataset.
P.S: @kmsravindra
> How do the performance requirements vary w/ increase in network size?

Performance requirements are linearly proportional to the product network_width x network_height.
@AlexeyAB, just to confirm: we plan to use an 832x480 network size, whose product is 2.3 times bigger than that of 416x416. So can we approx assume 50/2.3 = 21.7Fps for this network size for 1080Ti?
Also, from your other thread I am assuming yolov2-light - yolov3 would be 1.3 times faster @ 1% mAP trade-off. So using this lighter yolov3 should pump it up to 21.7 * 1.3 = approx 28 FPS?
@kmsravindra In general yes.
> So can we approx assume 50/2.3 = 21.7Fps for this network size for 1080Ti?

Yes. But this rests on the assumption that a GTX 1080Ti gets about 50 FPS on yolov3 416x416, which I haven't tested on a GTX 1080Ti.
On the other hand, I got only 32 FPS on Tesla V100 (~Titan V) without Tensor Cores, and 90 FPS on Tesla V100 (~Titan V) with Tensor Cores, so maybe there is a bottleneck somewhere on the GPU, and GPU usage can be less than 90% without Tensor Cores: https://github.com/AlexeyAB/darknet/issues/407
> Also, from your other thread I am assuming yolov2-light - yolov3 would be 1.3 times faster @ 1% mAP trade-off. So using this lighter yolov3 should pump it up to 21.7 * 1.3 = approx 28 FPS?

To do this, you should use the -quantized flag at the end of the command, and you should set the input_callibration= param in your cfg-file: https://github.com/AlexeyAB/yolo2_light/blob/29905072f194ee86fdeed6ff2d12fed818712411/bin/yolov3.cfg#L25
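For reference, the back-of-the-envelope estimate discussed above can be written as a small script. This is only a sketch: the function name `estimate_fps` is made up for illustration, the 50 FPS baseline for a GTX 1080Ti at 416x416 is the untested assumption mentioned above, and the 1.3x factor is the speedup quoted in this thread for the lighter quantized variant, not something measured here.

```python
def estimate_fps(base_fps, base_size, new_size, speedup=1.0):
    """Scale a baseline FPS by the ratio of network input areas.

    Assumes cost is linearly proportional to network_width x network_height,
    as stated in this thread. Real throughput may deviate (CPU-side or GPU-side
    bottlenecks), so treat the result as a rough planning number only.
    """
    base_w, base_h = base_size
    new_w, new_h = new_size
    area_ratio = (new_w * new_h) / (base_w * base_h)
    return base_fps / area_ratio * speedup


# Assumption from the thread: ~50 FPS on GTX 1080Ti with yolov3 at 416x416 (untested).
BASE_FPS = 50.0

plain = estimate_fps(BASE_FPS, (416, 416), (832, 480))
# Assumption from the thread: the lighter quantized variant is about 1.3x faster.
light = estimate_fps(BASE_FPS, (416, 416), (832, 480), speedup=1.3)

print(f"832x480, plain:     {plain:.1f} FPS")
print(f"832x480, 1.3x light: {light:.1f} FPS")
```

With the exact area ratio (about 2.31 rather than the rounded 2.3) this lands close to the 21.7 and 28 FPS figures quoted above.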
Thanks for the info @AlexeyAB
We intend to do some real-time inference on a single HD resolution video stream at around 50 FPS. I am trying to spec out GPUs for the system deployed onsite that will be used to do the inference and re-rendering of video with detected objects.
Would GTX 1080Ti be able to keep up or would I need to pick Titan V? Degradation of the output frame rate to 30 fps or so might be acceptable as this is a prototype to be used for demonstrations.
On another thread, I noticed a comparison of these 2 GPUs but for multiple live streams, so some parallelism was also coming into play in that case.
Any recommendations/pointers are appreciated.
Best, Vineet
P.S: Copying @kmsravindra as he is collaborating w/ me on this project.