AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Understanding video resolution #6412

Open xaerincl opened 4 years ago

xaerincl commented 4 years ago

So, I successfully trained a tiny YOLOv4 and it is working just fine. After training I tested it on a fairly large video and the results were amazing, but I have a question about the resolutions.

My model was trained for a 608x608 input, but as far as I can see, 'darknet_video.py' does the following: it sets the capture width and height to 1280x720 (my videos are bigger, by the way), and for every frame it resizes the frame to the width and height of the network (I assume 608x608 in my case). Nevertheless, my output video still has the same dimensions as my input video, which is 2688x1520. So there is something I'm not getting. I want to keep using the model with the OpenCV DNN module, and I want to make sure I'm handling the resizes and the input to blobFromImage correctly.

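    import time

    import cv2
    import darknet  # Darknet's Python wrapper (darknet.py)

    # netMain (the loaded network) is created earlier in darknet_video.py
    cap = cv2.VideoCapture("test.mp4")
    # Width/height requests (properties 3 and 4) are generally honoured by cameras;
    # on a video file these calls usually have no effect and frames keep their size.
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
    # The writer is configured for frames at the network resolution
    out = cv2.VideoWriter(
        "output.avi", cv2.VideoWriter_fourcc(*"MJPG"), 10.0,
        (darknet.network_width(netMain), darknet.network_height(netMain)))
    print("Starting the YOLO loop...")

    # Create an image we reuse for each detect
    darknet_image = darknet.make_image(darknet.network_width(netMain),
                                       darknet.network_height(netMain), 3)
    while True:
        prev_time = time.time()
        ret, frame_read = cap.read()
        if not ret:  # end of the video or a read error
            break
        # OpenCV decodes frames as BGR; Darknet expects RGB
        frame_rgb = cv2.cvtColor(frame_read, cv2.COLOR_BGR2RGB)
        # Resize every frame to the network input size (608x608 here)
        frame_resized = cv2.resize(frame_rgb,
                                   (darknet.network_width(netMain),
                                    darknet.network_height(netMain)),
                                   interpolation=cv2.INTER_LINEAR)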
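A sketch of the coordinate bookkeeping the question is about, assuming the detector returns (x_center, y_center, width, height) boxes in network-input pixels (608x608 here) and that frames are stretched, not letterboxed. Whether the frame is resized with cv2.resize as above or by cv2.dnn.blobFromImage(frame, scalefactor=1/255.0, size=(608, 608), swapRB=True, crop=False), the network only ever sees a 608x608 image, so boxes have to be scaled back per axis before drawing them on the original 2688x1520 frame. The helper name scale_box_to_frame is hypothetical.

```python
def scale_box_to_frame(box, net_size, frame_size):
    """Map an (x_center, y_center, w, h) box from network-input pixels to frame pixels.

    Assumes the frame was stretched to net_size (plain resize, no letterboxing),
    so each axis is scaled independently.
    """
    net_w, net_h = net_size
    frame_w, frame_h = frame_size
    sx = frame_w / net_w  # horizontal scale factor
    sy = frame_h / net_h  # vertical scale factor
    x, y, w, h = box
    return (x * sx, y * sy, w * sx, h * sy)


# A box centred in the 608x608 network input maps to the centre of the 2688x1520 frame.
print(scale_box_to_frame((304, 304, 60, 40), (608, 608), (2688, 1520)))
```

If frames were letterboxed instead of stretched, the padding offset would also have to be subtracted before scaling.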
folkien commented 3 years ago

Good point!

I have some doubts about whether rescaling the image beforehand is needed or recommended, so I ran some tests.

With prior rescaling to 512x512: I tested YOLOv4 512x512 on a 1920x1080 video. It ran at about 55 FPS, but small objects weren't detected. Because the image was downsampled to 512x512, only bigger/closer objects were detected. For me this is unacceptable.

Without rescaling the input frames (the Darknet image is the same size as the input image): I tested YOLOv4 512x512 on a 1920x1080 video. It ran at about 30 FPS, and with this approach both small and bigger objects were detected. How is that possible if the YOLOv4 network input is smaller? Does YOLO check the image several times? @AlexeyAB

Which approach should be used, and which is better? @AlexeyAB

stephanecharette commented 3 years ago

Also see https://www.ccoderun.ca/darkhelp/api/Tiling.html and https://www.ccoderun.ca/darkmark/ImageSize.html. Your images and video frames will ALWAYS be resized to match your network dimensions. If you don't do it prior to calling Darknet, then Darknet does it itself when you call it.

Doesn't matter if your video frames are 9999x9999. If your network is 512x512, then Darknet only knows how to work with 512x512 images.
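To see what this means for small objects, here is a back-of-the-envelope sketch (the helper name size_after_resize is hypothetical) of how many network-input pixels an object occupies once a 1920x1080 frame is stretched to a 512x512 network, as in the tests above.

```python
def size_after_resize(obj_w, obj_h, frame_size, net_size):
    """Return the (width, height) in network pixels of an obj_w x obj_h object
    after the whole frame is stretched to net_size."""
    frame_w, frame_h = frame_size
    net_w, net_h = net_size
    return (obj_w * net_w / frame_w, obj_h * net_h / frame_h)


# A 40x40-pixel object in a 1920x1080 frame fed to a 512x512 network
w, h = size_after_resize(40, 40, (1920, 1080), (512, 512))
print(f"{w:.1f} x {h:.1f}")  # → 10.7 x 19.0
```

The object ends up only about 11 pixels wide, and the unequal horizontal and vertical factors also distort its aspect ratio.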

...unless you are using DarkHelp and you turn on image tiling.
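The idea behind tiling is to cut the frame into network-sized pieces, run the detector on each piece, and offset the resulting boxes back into frame coordinates. The sketch below only computes evenly spaced, possibly overlapping tile rectangles; it is a generic illustration under that assumption, not DarkHelp's actual implementation, and the helper name tile_grid is hypothetical.

```python
import math


def tile_grid(frame_w, frame_h, tile_w, tile_h):
    """Return (x, y, w, h) tiles of at most tile_w x tile_h that cover the frame.

    Tiles are spread evenly, so neighbours overlap when the frame size is not an
    exact multiple of the tile size.
    """
    tw, th = min(tile_w, frame_w), min(tile_h, frame_h)
    cols = math.ceil(frame_w / tw)
    rows = math.ceil(frame_h / th)

    def starts(count, frame_len, tile_len):
        # First tile at 0, last tile flush with the far edge, the rest evenly spaced
        if count == 1:
            return [0]
        step = (frame_len - tile_len) / (count - 1)
        return [round(i * step) for i in range(count)]

    return [(x, y, tw, th)
            for y in starts(rows, frame_h, th)
            for x in starts(cols, frame_w, tw)]


tiles = tile_grid(1920, 1080, 512, 512)
print(len(tiles), tiles[0], tiles[-1])  # → 12 (0, 0, 512, 512) (1408, 568, 512, 512)
```

Each tile reaches the detector at full network resolution, so small objects keep their pixel size, at the cost of one inference per tile (12 here instead of 1).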

folkien commented 3 years ago

If you are right, then something is wrong with my code, because resizing beforehand results in fewer detections (only bigger objects). Hmm...

@stephanecharette Anyway, which approach is better (more FPS): resizing beforehand (in the video-reading thread) or leaving it to Darknet?

stephanecharette commented 3 years ago

Why would it matter? Do you think you can resize an image better/faster than OpenCV?

stephanecharette commented 3 years ago

Replying to "If you are right [...]":

Thanks for the vote of confidence...? Note that I'm the author of both DarkHelp and DarkMark. If you want to know more about resizing images and videos for use with Darknet, please see my Darknet YouTube tutorials.

folkien commented 3 years ago

Thanks, I will check this. BTW: I've written something similar to yours (https://github.com/folkien/yolo-annotate) and also an image augmentation tool (https://github.com/folkien/pyImageAugmentation).

folkien commented 3 years ago

No, my question was: is it better to rescale the image beforehand with OpenCV (as darknet_video.py does) or to give the full image directly to Darknet, which will also resize it?

stephanecharette commented 3 years ago

My answer remains the same.

Option #1: read cv::Mat, resize the Mat, call Darknet, which does the prediction.
Option #2: read cv::Mat, call Darknet, which resizes the Mat and does the prediction.

The only possible difference is how Darknet internally manages the image, because it also needs to convert it to its own image format, which is a long vector of floats.
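To illustrate that conversion, here is a NumPy sketch assuming Darknet's image structure stores pixels as 32-bit floats in [0, 1], channel-planar (all R values, then all G, then all B), which is how Darknet's C code indexes data[c*h*w + y*w + x]. The helper name to_darknet_layout is hypothetical; Darknet itself does this step in C.

```python
import numpy as np


def to_darknet_layout(rgb):
    """Convert an H x W x 3 uint8 RGB image into a flat float32 vector in
    channel-planar (C, H, W) order with values scaled to [0, 1]."""
    h, w, c = rgb.shape
    chw = np.transpose(rgb, (2, 0, 1)).astype(np.float32) / 255.0
    return np.ascontiguousarray(chw).reshape(c * h * w)


# Pixel (x=1, y=0) of a 3x2 image has its blue channel (c=2) set to 255.
img = np.zeros((2, 3, 3), dtype=np.uint8)
img[0, 1, 2] = 255
data = to_darknet_layout(img)
h, w = img.shape[:2]
print(data.size, data[2 * h * w + 0 * w + 1])  # → 18 1.0
```

Converting a network-sized image produces far fewer floats than converting a full frame (512x512x3 = 786,432 versus 1920x1080x3 = 6,220,800), so whether the resize happens before or after this step could affect speed.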