Open xaerincl opened 4 years ago
Good point!
I have some doubts about whether rescaling the image is needed/recommended, so I've run some tests.
With prior rescaling to 512x512: I tested YOLOv4 512x512 on a 1920x1080 video. FPS was about 55, but small objects weren't detected. Because the image was downsampled to the smaller 512x512 size, only bigger/closer objects were detected. For me this is unacceptable. :cry:
Without rescaling the input frames (the Darknet image equals the input image): I tested YOLOv4 512x512 on a 1920x1080 video. FPS was about 30. With this approach both small and bigger objects were detected. How is that done if the YOLOv4 input is smaller? Is YOLO checking the image a few times? @AlexeyAB
Which approach should be used, and which is better? @AlexeyAB
Also see https://www.ccoderun.ca/darkhelp/api/Tiling.html and https://www.ccoderun.ca/darkmark/ImageSize.html. Your images and video frames will ALWAYS be resized to match your network dimensions. If you don't do it prior to calling Darknet, then Darknet does it itself when you call it.
Doesn't matter if your video frames are 9999x9999. If your network is 512x512, then Darknet only knows how to work with 512x512 images.
...unless you are using DarkHelp and you turn on image tiling.
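The "small objects disappear" effect from the first comment follows directly from this resize. A rough sketch in pure NumPy, using the sizes from the discussion (1920x1080 frame, 512x512 network); `nearest_resize` is a simplified stand-in for the real interpolation Darknet or OpenCV would use:

```python
import numpy as np

def nearest_resize(img, out_h, out_w):
    """Nearest-neighbour resize; a simplified stand-in for the resize
    that Darknet (or OpenCV) performs to match the network dimensions."""
    in_h, in_w = img.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return img[rows[:, None], cols]

# A 1920x1080 frame containing a single small 4x4 "object".
frame = np.zeros((1080, 1920), dtype=np.uint8)
frame[500:504, 900:904] = 255

net_input = nearest_resize(frame, 512, 512)
print(net_input.shape)            # (512, 512)
print((net_input == 255).sum())   # only a couple of the 16 object pixels survive
```

At 512x512, each network pixel covers roughly 3.75 source pixels horizontally, so a 4x4-pixel object shrinks to one or two pixels — too small for the network to detect, which is what tiling in DarkHelp is meant to work around.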
> Also see https://www.ccoderun.ca/darkhelp/api/Tiling.html and https://www.ccoderun.ca/darkmark/ImageSize.html. Your images and video frames will ALWAYS be resized to match your network dimensions. If you don't do it prior to calling Darknet, then Darknet does it itself when you call it.
> Doesn't matter if your video frames are 9999x9999. If your network is 512x512, then Darknet only knows how to work with 512x512 images.
> ...unless you are using DarkHelp and you turn on image tiling.
If you are right, then something is wrong with my code, because prior resizing results in fewer detections (only bigger objects). Hmm... :thinking:
@stephanecharette Anyway, which approach is better (more FPS): prior resizing (in the video-reading thread), or leaving it to Darknet?
> @stephanecharette Anyway, which approach is better (more FPS), prior resizing (in video reading thread) or leave it to darknet?
Why would it matter? Do you think you can resize an image better/faster than OpenCV?
> If you are right [...]
Thanks for the vote of confidence...? Note I'm the author of both DarkHelp and DarkMark. If you want to know more about resizing images and videos for use with Darknet, please see my Darknet youtube tutorials.
> If you are right [...]
> Thanks for the vote of confidence...? Note I'm the author of both DarkHelp and DarkMark. If you want to know more about resizing images and videos for use with Darknet, please see my Darknet youtube tutorials.
Thanks, I will check this. BTW: I've written something similar to yours (https://github.com/folkien/yolo-annotate) and also an image augmentation tool (https://github.com/folkien/pyImageAugmentation).
> @stephanecharette Anyway, which approach is better (more FPS), prior resizing (in video reading thread) or leave it to darknet?
> Why would it matter? Do you think you can resize an image better/faster than OpenCV?
No, my question was: is it better to rescale the image beforehand with OpenCV (as in darknet_video.py), or to give the full image directly to Darknet, which will also resize it?
> No, my question was: is it better to rescale the image beforehand with OpenCV (as in darknet_video.py), or to give the full image directly to Darknet, which will also resize it?
My answer remains the same.
Option #1: read cv::Mat, resize the Mat, call Darknet, get the prediction.
Option #2: read cv::Mat, call Darknet (which resizes the Mat internally), get the prediction.
The only possible difference is in how Darknet internally manages the image, because it also needs to convert it to its own image format, which is a long vector of floats.
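A rough sketch of that internal conversion (the layout is an assumption based on the "long vector of floats" description: RGB values scaled to [0,1], flattened channel-major, while OpenCV hands you BGR uint8):

```python
import numpy as np

def to_darknet_floats(bgr):
    # Assumed layout: BGR uint8 -> RGB, scaled to [0,1],
    # then flattened channel-major (C x H x W) into one float vector.
    rgb = bgr[..., ::-1].astype(np.float32) / 255.0
    return np.ascontiguousarray(rgb.transpose(2, 0, 1)).ravel()

frame = np.full((512, 512, 3), 128, dtype=np.uint8)  # dummy 512x512 frame
vec = to_darknet_floats(frame)
print(vec.shape)   # (786432,) == 3 * 512 * 512
```

This conversion happens either way, so whether you or Darknet performs the resize, the network ends up seeing the same 512x512 float vector.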
So, I successfully trained a tiny YOLOv4 and it is working just fine. After training I tested it on a pretty big video and the results were amazing, but I have a question about the resolutions.
My model was trained for a 608x608 input, but as far as I can see, what darknet_video.py does is: it sets the width and height to 1280x720 (my videos are bigger, btw), and for every frame it resizes the frame to the width and height of the net (I assume 608x608 in my case). Nevertheless, my output video still has the same dimensions as my input video, which is 2688x1520. So there is something there that I'm not getting. I want to keep using the model with the OpenCV DNN module, and I want to make sure that I'm doing everything right with the resizes and the input from blobFromImage.