AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Image resizing and padding for CNN #3177

Open PythonImageDeveloper opened 5 years ago

PythonImageDeveloper commented 5 years ago

Hi @AlexeyAB, I have a question whose answer is critical for me. I'm working on two important projects: one is classification and the other is OCR, and suppose we want to solve both problems with deep learning methods. In your experience, can resizing the input images without keeping the aspect ratio (in both training and inference) achieve good results, or is it better to add zero padding and keep the aspect ratio? If the answer is resizing, then in the OCR problem, should we still not keep the aspect ratio, even though the characters become skewed? Suppose the original images have aspect ratio 5 and my model's input size has aspect ratio 2: to keep the aspect ratio, I have to add zero padding over a big part of the image. Doesn't this method force the CNN to learn that the black part of the image is not relevant? BTW, in your experience, which method do you prefer to achieve the best result?

AlexeyAB commented 5 years ago

@PythonImageDeveloper Hi,

In your case it is better to keep the aspect ratio, so you should train using the original repo https://github.com/pjreddie/darknet

I will implement a flag for training with letter_box later.


In my experience, I usually use models for detection on real-time video cameras, where all frames have the same size and aspect ratio. Also, the training dataset is usually collected from the same video camera, so it isn't required to keep the aspect ratio.
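To illustrate the letterbox approach discussed above, here is a minimal numpy sketch of aspect-ratio-preserving resize plus padding. The function name and the pad value are illustrative assumptions, not darknet's actual `letterbox_image` implementation (which, among other differences, operates on normalized float images).

```python
import numpy as np

def letterbox(img, net_w, net_h, pad_value=127):
    """Resize img to fit (net_h, net_w) while keeping its aspect ratio,
    filling the leftover border with a constant value."""
    h, w = img.shape[:2]
    scale = min(net_w / w, net_h / h)
    new_w = int(round(w * scale))
    new_h = int(round(h * scale))

    # Nearest-neighbor resize with plain numpy (no OpenCV dependency).
    rows = (np.arange(new_h) * h / new_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    resized = img[rows[:, None], cols]

    # Center the resized image on a constant-value canvas.
    canvas = np.full((net_h, net_w) + img.shape[2:], pad_value,
                     dtype=img.dtype)
    top = (net_h - new_h) // 2
    left = (net_w - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas
```

For example, a 500x100 image letterboxed into a 416x416 network input is scaled to 416x83 and centered vertically; the remaining rows are padding that the network must learn to ignore, which is exactly the trade-off raised in the question above.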

PythonImageDeveloper commented 5 years ago

"In my experience, I usually use models for detection on real-time video cameras, where all frames have the same size and aspect ratio. Also, the training dataset is usually collected from the same video camera, so it isn't required to keep the aspect ratio."

This is right when the training and inference images have the same size and aspect ratio, right? If all training images have aspect ratio x and all testing images have aspect ratio y, is it necessary to keep the aspect ratio in the training and testing phases?

AlexeyAB commented 5 years ago

"If all training images have aspect ratio x and all testing images have aspect ratio y, is it necessary to keep the aspect ratio in the training and testing phases?"

Then you can just compensate for this by using a different network resolution for training and detection. E.g., if the training images are 640x480 (ar_x ≈ 1.33) and the detection images are 1920x1080 (ar_y ≈ 1.78), then you can train using width=640 height=480 (ar ≈ 1.33), and then change to width=448 height=256 (ar = 1.75) for detection.
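The 448x256 figure above follows from matching the camera aspect ratio while keeping both sides a multiple of the network stride (32 for YOLOv3-style models). A small sketch of that calculation (the helper names are my own, not part of darknet):

```python
def round_to_stride(x, stride=32):
    """Round x to the nearest positive multiple of stride."""
    return max(stride, int(round(x / stride)) * stride)

def pick_resolution(height, aspect_ratio, stride=32):
    """Given a desired network height and a target aspect ratio,
    choose width and height as multiples of the network stride."""
    h = round_to_stride(height, stride)
    w = round_to_stride(h * aspect_ratio, stride)
    return w, h
```

With height=256 and the 1920x1080 camera ratio (≈1.78), this yields width=448, reproducing the detection resolution suggested above; with height=480 and a 4:3 ratio it yields the 640x480 training resolution.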

The main rules:

Look4-you commented 5 years ago

@AlexeyAB Hi! If I train at 640x480, will the network still produce the three scales of feature maps (FM) at 13x13, 26x26, 52x52, or other sizes of FM?

AlexeyAB commented 5 years ago

@SHADOWMOOON Hi, it will be 640/32 x 480/32. Just set 640 x 480, run training and look at it )
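The rule here is that each YOLOv3-style output grid is the input size divided by that head's stride (32, 16, 8). A one-line sketch:

```python
def feature_map_sizes(net_w, net_h, strides=(32, 16, 8)):
    """Each detection head's grid is the input size divided by its stride
    (integer division; inputs are multiples of 32 in darknet configs)."""
    return [(net_w // s, net_h // s) for s in strides]
```

For a 416x416 input this gives the familiar 13x13, 26x26, 52x52 grids; for 640x480 it gives 20x15, 40x30, 80x60 instead, which is why the feature maps are no longer square.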

Look4-you commented 5 years ago

@AlexeyAB Thanks!!!