kmsravindra opened this issue 4 years ago
Try to compile Darknet as a SO/DLL library and run it: https://github.com/AlexeyAB/darknet#how-to-compile-on-linux-using-make
LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH ./uselib data/coco.names cfg/yolov4.cfg yolov4.weights test.mp4
It is implemented for the lowest latency.
Also compile OpenCV with GStreamer; it will reduce latency further.
The first 2 points give the largest delay.
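For reference, a minimal sketch of how the shared library can be used from C++ (assuming darknet was built with LIBSO=1, which produces libdarknet.so and the yolo_v2_class.hpp wrapper that uselib is based on; compile with -DOPENCV so the cv::Mat overloads are available). Note that the real uselib example runs capture, inference, and drawing in separate asynchronous threads, which is what actually lowers the latency; this single-threaded loop only illustrates the API:

```cpp
#include <opencv2/opencv.hpp>
#include "yolo_v2_class.hpp"   // Detector / bbox_t from the darknet repo

int main() {
    Detector detector("cfg/yolov4.cfg", "yolov4.weights");
    cv::VideoCapture cap("test.mp4");
    cv::Mat frame;
    while (cap.read(frame)) {
        // detect() returns boxes above the given confidence threshold
        std::vector<bbox_t> boxes = detector.detect(frame, 0.25f);
        for (auto const &b : boxes)
            cv::rectangle(frame, cv::Rect(b.x, b.y, b.w, b.h), cv::Scalar(0, 255, 0), 2);
        cv::imshow("detections", frame);
        if (cv::waitKey(1) == 27) break;   // Esc to quit
    }
    return 0;
}
```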
Thanks for your response @AlexeyAB. I am already using OpenCV compiled with GStreamer, so I will have to try the SO/DLL option. Regarding implementing video capture, resizing, NMS, etc. on the GPU, I think the DeepStream SDK provides plugins that do exactly this, but I want to check whether there is an alternative using OpenCV or something else to achieve the same. Are there any references / pointers you are aware of on how these could be implemented with OpenCV directly on CUDA?
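For what it's worth, OpenCV's CUDA modules from opencv_contrib (e.g. cudawarping) can move the resize step to the GPU. A rough sketch, assuming OpenCV was built with CUDA support; the 608x608 network-input size below is just an example and depends on the cfg:

```cpp
#include <opencv2/opencv.hpp>
#include <opencv2/cudawarping.hpp>   // cv::cuda::resize

int main() {
    cv::VideoCapture cap("test.mp4");      // or a GStreamer pipeline string
    cv::Mat frame;
    cv::cuda::GpuMat d_frame, d_resized;
    while (cap.read(frame)) {
        d_frame.upload(frame);                                    // host -> device copy
        cv::cuda::resize(d_frame, d_resized, cv::Size(608, 608)); // resize on GPU
        // The resized frame would still need to be converted to darknet's
        // own image format (or downloaded) before being fed to the network.
    }
    return 0;
}
```

This only covers resizing; capture and NMS would still need separate handling.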
Hi @AlexeyAB ,
With reference to your note, could you please explain how running the .so file will result in lower latency compared to invoking the ./darknet demo command? Assuming inference time is the same (around 20-25 ms per image), will this shared object run video capture and image pre/post-processing more efficiently? The overall latency I currently observe is around 80-120 ms (from capture to display).
LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH ./uselib data/coco.names cfg/yolov4.cfg yolov4.weights test.mp4
It is implemented for the lowest latency.
What min/max/avg latency do you get
for ./darknet detector demo ... ? - all 3 threads are synced for each frame, so latency = max_lat*3
for ./uselib data/coco.names ... ? - all 3 threads work asynchronously, so latency = lat1+lat2+lat3 <= max_lat*3
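As a purely hypothetical illustration of the difference: if capture took 10 ms, inference 25 ms, and drawing/display 5 ms per frame, the synced pipeline would pay the slowest stage three times (latency ≈ 3 × 25 = 75 ms), whereas the async pipeline would pay roughly the sum of the stages (latency ≈ 10 + 25 + 5 = 40 ms).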
Try to set it to false there, recompile, and measure the latency: https://github.com/AlexeyAB/darknet/blob/master/src/yolo_console_dll.cpp#L297
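One possible way to measure per-stage latency around the library calls (this uses std::chrono and is not darknet's built-in timing; the cfg/weights paths are just the ones from the command above):

```cpp
#include <chrono>
#include <cstdio>
#include <opencv2/opencv.hpp>
#include "yolo_v2_class.hpp"

using clk = std::chrono::steady_clock;

// helper: elapsed milliseconds between two time points
static double ms(clk::time_point a, clk::time_point b) {
    return std::chrono::duration<double, std::milli>(b - a).count();
}

int main() {
    Detector detector("cfg/yolov4.cfg", "yolov4.weights");
    cv::VideoCapture cap("test.mp4");
    cv::Mat frame;
    for (;;) {
        auto t0 = clk::now();
        if (!cap.read(frame)) break;          // capture stage
        auto t1 = clk::now();
        auto boxes = detector.detect(frame);  // inference stage
        auto t2 = clk::now();
        std::printf("capture %.1f ms, detect %.1f ms, %zu boxes\n",
                    ms(t0, t1), ms(t1, t2), boxes.size());
    }
    return 0;
}
```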
Hi @AlexeyAB
I have a couple of observations that I wanted to share with you and get your opinion on:
The GPU (RTX 2080 Ti), config, and everything else are exactly the same on both machines, yet the end-to-end latency on the i7-8700 CPU machine is about 60 ms higher on average than on the i7-9700 CPU machine when a 60 fps input is fed.
FYI, the latency measurement is end to end (latency from capture to display). It is measured by pointing the camera at a YouTube clock video and taking a slow-motion snapshot with the YouTube clock video and the darknet display window next to each other while darknet is inferring on the clock video. The difference between the two timestamps is the measured latency.
It would be helpful to get your comments on how a difference in CPU (Intel i7-8700 vs. i7-9700) can have such a significant impact on latency.
Also, do you think more CPU compute is engaged in processing a larger number of frames per second, which pushes up the latency when the input fps is higher?
Are there any image buffers or pre/post-processing steps that depend heavily on CPU compute and could have caused this variation in latency? Could it be coming from OpenCV's image handling during capture / preprocessing / postprocessing?