AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Near-live stream view independent of input frame rate #6258

Open UweW opened 4 years ago

UweW commented 4 years ago

Hi, I wonder whether there is a live-view functionality that is independent of the input frame rate.

When I use a webcam's RTSP stream as input for darknet, the stream is buffered and every frame is processed by darknet (first in, first out). So far so good. But if you want to use the output as a live view, you can run into problems. As long as the input frame rate is lower than the prediction frame rate, everything is fine. But on a slower machine like the Jetson Nano, yolov3-tiny manages about 14 fps while the webcam delivers 25 fps, which means the view falls behind and the gap keeps growing over time. For accurate predictions in my case I had to switch to yolov3, and the rate dropped to about 2 fps.

What would be cool is an option where only the most recent frame is processed by darknet and the older buffered frames are discarded. That would let you run a live output stream at darknet's own frame rate. You cannot always reduce the webcam's fps to a value below darknet's.
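A minimal sketch of the idea, assuming a separate capture thread: the newest frame always overwrites a single-slot buffer, so the detector never works through a backlog. Here frame_t, grab_frame() and run_detection() are hypothetical placeholders, not darknet API:

    #include <pthread.h>

    typedef struct { unsigned char *data; long seq; } frame_t;

    frame_t grab_frame(void);          /* placeholder: read one RTSP frame */
    void run_detection(frame_t f);     /* placeholder: predict and display */

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static frame_t latest;             /* single-slot buffer: newest frame only */

    /* Capture thread: runs at camera rate (e.g. 25 fps) and simply
     * overwrites the slot, dropping any frame darknet never saw.
     * (Freeing the replaced frame's data is omitted for brevity.) */
    void *capture_loop(void *arg) {
        for (;;) {
            frame_t f = grab_frame();
            pthread_mutex_lock(&lock);
            latest = f;
            pthread_mutex_unlock(&lock);
        }
        return NULL;
    }

    /* Detector thread: runs at whatever rate darknet manages
     * (e.g. 2-14 fps) and always gets the most recent frame. */
    void *detect_loop(void *arg) {
        long consumed = -1;
        for (;;) {
            pthread_mutex_lock(&lock);
            frame_t f = latest;
            pthread_mutex_unlock(&lock);
            if (f.seq != consumed) {
                consumed = f.seq;
                run_detection(f);
            }
        }
        return NULL;
    }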

This would be very helpful on low-powered hardware.

best regards Uwe

procule commented 4 years ago

I encountered a similar problem while implementing darknet in a homemade ffmpeg filter. Once the buffer fills up and the darknet library can't keep up, performance degrades rapidly, so you want your buffer to stay unfilled. The way I did it without losing accuracy was to keep the bounding boxes of the last few predicted frames in memory, average them, and then only pass one frame every X frames through darknet. On a 30 fps (33.3 ms/frame) stream, showing the same bboxes for two frames in a row is barely noticeable, and you get twice the throughput if darknet's predictions are your bottleneck. It made a big difference in my tests. Example: a native 60 fps 4K sample went from 12-14 fps to about real time, and you could not tell that every other frame was reusing the bboxes from the previous one (~16 ms).
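A rough sketch of that throttling idea in plain C; box_t, run_darknet() and draw_boxes() are hypothetical placeholders, and the averaging step is left out for brevity:

    #define THROTTLE 2                    /* run darknet on every 2nd frame */

    typedef struct { float x, y, w, h; } box_t;
    typedef struct { box_t b[64]; int n; } boxes_t;

    boxes_t run_darknet(const unsigned char *frame);           /* placeholder */
    void draw_boxes(const unsigned char *frame, boxes_t *bx);  /* placeholder */

    static boxes_t cached;                /* detections from the last run */

    void process_frame(const unsigned char *frame, long frame_nb) {
        if (frame_nb % THROTTLE == 0)
            cached = run_darknet(frame);  /* refresh the cached boxes */
        draw_boxes(frame, &cached);       /* skipped frames reuse them */
    }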

LazyG commented 1 year ago

@procule, do you happen to have code you could point to where you've done what you described? I'm preparing for a project where I'd like to stream video using the rtsp protocol from an ip camera, and would love to reference it if visible somewhere. Thank you.

procule commented 1 year ago

> @procule, do you happen to have code you could point to where you've done what you described? I'm preparing for a project where I'd like to stream video using the rtsp protocol from an ip camera, and would love to reference it if visible somewhere. Thank you.

Hi @LazyG, since that comment, I tuned up my code to use CUDA functions to pre-process the frames on the GPU into Darknet's "format" (if I remember correctly, Darknet expects a planar RGB array of normalized pixel values, i.e. 0.0 to 1.0 instead of 0 to 255, at the network input size, e.g. 608x608x3) and to pass that already-in-GPU frame to the prediction network. Then, from the predictions returned, instead of using Darknet's API function, I again use CUDA functions to draw the boxes.
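For illustration, a small CUDA kernel sketch of drawing a rectangle outline directly into a planar-RGB float image on the GPU; this is an assumed minimal version, not the code from the actual filter:

    __global__ void draw_box(float *img, int w, int h,
                             int left, int top, int right, int bottom)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= w || y >= h) return;

        /* 2-pixel-wide outline of the box */
        int on_vertical   = y >= top  && y <= bottom &&
                            (abs(x - left) < 2 || abs(x - right)  < 2);
        int on_horizontal = x >= left && x <= right  &&
                            (abs(y - top)  < 2 || abs(y - bottom) < 2);
        if (!(on_vertical || on_horizontal)) return;

        int n = w * h;                 /* size of one plane of the image */
        img[0 * n + y * w + x] = 1.0f; /* R */
        img[1 * n + y * w + x] = 0.0f; /* G */
        img[2 * n + y * w + x] = 0.0f; /* B */
    }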

That made the whole frame processing even faster. However, that was almost three years ago, with YOLOv4. I guess it's faster by default now, since I see .cu files in the master branch that were not there back then.

What is very important is to convert your RTSP input frames to a Darknet-compatible format on the GPU. In my case I was receiving YUV420 frames, so I had to convert them, resize them, and then pass them to Darknet's network.
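A sketch of that conversion as a CUDA kernel, assuming I420 layout (separate Y, U and V planes) and BT.601 full-range coefficients; check your camera's actual colorspace, and note that resizing to the network input size is a separate step not shown here:

    __global__ void i420_to_darknet_rgb(const unsigned char *yp,
                                        const unsigned char *up,
                                        const unsigned char *vp,
                                        float *dst, int w, int h)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= w || y >= h) return;

        /* U and V are subsampled 2x2 in I420 */
        float Y = (float)yp[y * w + x];
        float U = (float)up[(y / 2) * (w / 2) + x / 2] - 128.0f;
        float V = (float)vp[(y / 2) * (w / 2) + x / 2] - 128.0f;

        float r = Y + 1.402f * V;
        float g = Y - 0.344f * U - 0.714f * V;
        float b = Y + 1.772f * U;

        /* Darknet wants planar RGB, normalized to [0, 1] */
        int n = w * h;
        dst[0 * n + y * w + x] = fminf(fmaxf(r, 0.0f), 255.0f) / 255.0f;
        dst[1 * n + y * w + x] = fminf(fmaxf(g, 0.0f), 255.0f) / 255.0f;
        dst[2 * n + y * w + x] = fminf(fmaxf(b, 0.0f), 255.0f) / 255.0f;
    }

It would be launched with a 2D grid covering the frame, e.g. dim3 block(16, 16); dim3 grid((w + 15) / 16, (h + 15) / 16);, on the same stream as the prediction so no extra synchronization is needed.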

As for the "frame detection skipping" I was talking about, it's fairly easy: you define a skip/throttle value and only run detection on those frames, while keeping the last detections in memory (here in the ctx; s is a state structure/object):

    /* Run a full darknet prediction only on every s->throttle-th frame. */
    if (!((s->frames_nb - 1) % s->throttle)) {
        cudaStreamSynchronize(s->cu_darknet_img);  /* wait for GPU pre-processing */
        predict_boxes(ctx);                        /* refresh the cached detections */
    }

    /* Draw the (possibly reused) cached boxes on every frame. */
    cudaLaunchHostFunc(s->cu_darknet_img, draw_rect, ctx);
LazyG commented 1 year ago

@procule, thank you for this. I appreciate it!