AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

increase network-resolution #281

Open anandkoirala opened 6 years ago

anandkoirala commented 6 years ago

Hi @AlexeyAB
To detect small objects I changed the 'height' and 'width' parameters in the .cfg file (for detection only), and it worked. However, I then wanted to save the model to an .h5 file using the .cfg and .weights files, but the model input size does not match the changed height and width from the cfg file. Can you explain how changing the height and width to greater values (multiples of 32) works internally? I have an .h5 file saved from the normal cfg and weights. How would I be able to use it for large images (increased network resolution)? Many thanks, Anand

AlexeyAB commented 6 years ago

Hi,

Darknet-Yolo reads the input network size from the .cfg file and resizes every image to this size: https://github.com/AlexeyAB/darknet/blob/c1904068afc431ca54771e5dc20f2c588e876956/cfg/yolo.cfg#L8-L9 Then Darknet-Yolo runs network_forward() and gets an output activation grid with resolution input_width/32 x input_height/32. I.e. for a 416x416 input the output will be 13x13.
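The grid arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `output_grid` is hypothetical, and the stride of 32 is taken from the comment above rather than read from any cfg file.

```python
def output_grid(width, height, stride=32):
    """Return the (columns, rows) of the output activation grid.

    Assumes the network downsamples by `stride` in total, so both
    input dimensions should be multiples of that stride.
    """
    if width % stride != 0 or height % stride != 0:
        raise ValueError("width and height should be multiples of %d" % stride)
    return width // stride, height // stride


# 416x416 is the default resolution mentioned in this thread.
grid_default = output_grid(416, 416)
# A larger network resolution yields a proportionally finer grid.
grid_large = output_grid(832, 608)
```

With these inputs, `grid_default` is `(13, 13)` and `grid_large` is `(26, 19)`.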

But some software (some implementations of Yolo in Keras, OpenCV, ...) reads the input resolution from the cfg file and uses it for network_forward(), but resizes the input image to a hardcoded 416x416 resolution. In these cases you should manually change the source code so that images are resized to the same resolution as in your .cfg file instead of 416x416.
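As a rough sketch of that fix, the resize target can be read from the `[net]` section of the cfg instead of being hardcoded. The helper `read_net_size` and the sample cfg text below are hypothetical, not part of Darknet or YAD2K. A hand-written parser is used because Darknet cfg files repeat section names such as `[convolutional]`, which a strict INI parser would reject.

```python
def read_net_size(cfg_text):
    """Return (width, height) from the [net] section of a Darknet-style cfg."""
    section = None
    values = {}
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "net" and "=" in line:
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    return int(values["width"]), int(values["height"])


# Hypothetical cfg fragment for illustration only.
sample_cfg = """
[net]
batch=1
width=832
height=608

[convolutional]
filters=16
"""

target_size = read_net_size(sample_cfg)
```

The resulting `target_size` would then replace the hardcoded `(416, 416)` wherever the calling code resizes the image before inference.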

anandkoirala commented 6 years ago

@AlexeyAB I used YAD2K to convert the darknet .cfg and .weights files into an .h5 model. I am using Keras to load this .h5 model, run predictions and draw detection boxes, which works well. The cfg and the yolo model have input size 416x416. The thing is that with darknet yolo, after training, I can change the cfg file to a new input size for big images and still use the same weights trained for 416x416, and it works perfectly fine. I wanted to do the same with the .h5 model, but if I change the cfg file to the new input size and use it with the 416x416 weights file, I am not able to build the .h5 using YAD2K because the size would be different in the cfg and weights files. So I would like to understand how darknet is able to do multi-resolution detection when the input size is changed in the cfg file although the model was trained for 416x416?

AlexeyAB commented 6 years ago

@anandkoirala Any convolutional neural network and its weights can be used for any input resolution. Precision can be lower if the input resolution for testing differs very much from the input resolution used for training, but there will not be any errors.
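A minimal pure-Python sketch of why this holds for convolutional layers: the kernel weights have a fixed shape that does not depend on the input size, so the same weights slide over a small or a large input and only the output grid changes. The function `conv2d` and the toy data are illustrative assumptions, not Darknet code.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2D cross-correlation on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    ih, iw = len(image), len(image[0])
    out = []
    for y in range(0, ih - kh + 1, stride):
        row = []
        for x in range(0, iw - kw + 1, stride):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        out.append(row)
    return out


def shape(grid):
    """Return (rows, columns) of a nested-list grid."""
    return len(grid), len(grid[0])


# One 2x2 kernel: 4 weights, regardless of the input resolution.
kernel = [[1.0, 0.0], [0.0, -1.0]]

small = [[float(x + y) for x in range(4)] for y in range(4)]  # 4 rows x 4 cols
large = [[float(x + y) for x in range(8)] for y in range(6)]  # 6 rows x 8 cols

small_out = conv2d(small, kernel, stride=2)
large_out = conv2d(large, kernel, stride=2)
```

Here `shape(small_out)` is `(2, 2)` and `shape(large_out)` is `(3, 4)`: the same four weights were applied to both inputs, and only the output resolution grew.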

anandkoirala commented 6 years ago

@AlexeyAB What I am not understanding is this: I trained yolo with 612x512 images. When I use the trained model with 2448x1048 images I get very few detections, but if I change the cfg file to a multiple of 32 I can detect many objects with the same model. But I think the model would still resize to 416x416. What is making the difference if it is the same model (Tiny-yolo-voc)?
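The resizing effect behind this question can be worked through with simple arithmetic, given that Darknet resizes every image to the network size from the cfg file. In the sketch below the 50-pixel object width is a made-up example value and `scaled_size` is a hypothetical helper. The 2448-pixel image width and the 416 network width come from this thread, and 2432 is simply a multiple of 32 close to the image width.

```python
def scaled_size(object_px, image_px, network_px):
    """Size of an object (in pixels) after the image is resized to the network size."""
    return object_px * network_px / image_px


object_width = 50  # hypothetical object width in the original 2448-wide image

# Network width 416: the object shrinks to well under one 32-pixel grid cell.
at_416 = scaled_size(object_width, 2448, 416)

# Network width 2432 (76 * 32): the object keeps almost its original size.
at_2432 = scaled_size(object_width, 2448, 2432)
```

Under these assumptions the object is roughly 8.5 pixels wide at network width 416 and roughly 49.7 pixels wide at 2432, which is one plausible reason small objects are found only at the larger network resolution.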

AlexeyAB commented 6 years ago

@anandkoirala

When I use the trained model with 2448x1048 images I get very few detections


but I think the model would still resize to 416x416. What is making the difference if it is the same model (Tiny-yolo-voc)?