AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Change input size in python #6873

Closed VolkovAK closed 3 years ago

VolkovAK commented 3 years ago

Hello, Alexey, firstly I want to thank you for this awesome repository!

YOLOv4 works perfectly via both the OpenCV and Darknet Python bindings. But I may have found a bug: the Python version of Darknet can take images of different sizes, but in fact it resizes them to the width and height specified in the cfg file. I discovered this when comparing its output with OpenCV's: the OpenCV DNN version of YOLOv4 does not resize the image to the cfg sizes.

Take a look at this example:

First, create the networks. The size in the cfg is the default, 608x608, for both Darknet and OpenCV:

import cv2
import darknet

network, class_names, class_colors = darknet.load_network(
    'yolo_model/yolov4.cfg',
    'yolo_model/coco.data',
    'yolo_model/yolov4.weights'
)
model = cv2.dnn.readNetFromDarknet('../darknet/cfg/yolov4.cfg', '../darknet/yolov4.weights')

Then create the input images:

size = (608, 608)
image = cv2.imread('image.jpg')

# Darknet 
darknet_image = darknet.make_image(size[0], size[1], 3)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image_resized = cv2.resize(image_rgb, size,
                           interpolation=cv2.INTER_LINEAR)
darknet.copy_image_from_bytes(darknet_image, image_resized.tobytes())

# OpenCV
blob = cv2.dnn.blobFromImage(image, 
                             scalefactor=1/255.0,
                             size=size, 
                             swapRB=True)
model.setInput(blob)
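For reference, `blobFromImage` with these arguments resizes the BGR image to `size`, swaps the R and B channels, scales by 1/255, and returns an NCHW float blob. A minimal NumPy sketch of that layout (nearest-neighbour resize standing in for OpenCV's interpolation, just to illustrate the shape and channel order):

```python
import numpy as np

def blob_from_image(image, scalefactor, size, swap_rb):
    """Rough NumPy stand-in for cv2.dnn.blobFromImage
    (nearest-neighbour resize instead of OpenCV interpolation)."""
    w, h = size
    rows = np.arange(h) * image.shape[0] // h
    cols = np.arange(w) * image.shape[1] // w
    resized = image[rows][:, cols]              # (h, w, 3)
    if swap_rb:
        resized = resized[:, :, ::-1]           # BGR -> RGB
    blob = resized.astype(np.float32) * scalefactor
    return blob.transpose(2, 0, 1)[np.newaxis]  # (1, 3, h, w)

bgr = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)
blob = blob_from_image(bgr, 1 / 255.0, (608, 320), swap_rb=True)
print(blob.shape)  # (1, 3, 320, 608)
```

Note that the aspect ratio of the input is ignored here too: whatever `size` is requested, the image is stretched to fit.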

And now run the models (the postprocessing helpers `postprocess_darknet_yolo` and `process_yolo_result` are my own, and I am confident they are correct):

detections = darknet.detect_image(network, class_names, darknet_image, thresh=0.3, hier_thresh=0.3)
print('darknet')
print(postprocess_darknet_yolo(detections, image, size, class_names))

result = model.forward(model.getUnconnectedOutLayersNames())

print('OpenCV')
print(process_yolo_result(image, result, thres=0.3, iou_thres=0.3))

Let's see the results:

darknet
([[0.7295320410477487, 0.6251855649446186, 0.38313662378411545, 0.5131233114945261]], [[1041.4977796454177, 398.113821933144, 741.7525036460476, 554.1731764140882]], [0.7345], [0])
OpenCV
([[0.7290805785123967, 0.625, 0.3827479338842975, 0.512962962962963]], [[1041.0, 398.0, 741.0, 554.0]], [0.7359038591384888], [0])

They are the same. Good.
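"The same" here means equal within resize rounding: the two result tuples differ in the fourth decimal place. A quick check with an explicit tolerance, using the two normalized boxes printed above:

```python
import numpy as np

# Normalized (cx, cy, w, h) boxes from the 608x608 run above.
darknet_box = [0.7295320410477487, 0.6251855649446186, 0.38313662378411545, 0.5131233114945261]
opencv_box  = [0.7290805785123967, 0.625, 0.3827479338842975, 0.512962962962963]

# The coordinates agree to about three decimal places.
print(np.allclose(darknet_box, opencv_box, atol=1e-3))  # True
```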

Now we change the `size` variable:

size = (608, 320)

And rerun just the prediction, with the same networks:

darknet
([[0.7302230533800628, 0.6245246887207031, 0.3842633397955644, 0.5117551803588867]], [[1041.7449184216953, 398.13886642456055, 743.9338258442126, 552.6955947875977]], [0.5842], [0])
OpenCV
([[0.7964876033057852, 0.6324074074074074, 0.24483471074380164, 0.47962962962962963]], [[1305.0, 424.0, 474.0, 518.0]], [0.9578801989555359], [0])

Now they are not equal. So let's change the size in the cfg for the Darknet model to:

width=608
height=320

Then reload the network, rerun the whole script with image size = (608, 320), and look at the output:

darknet
([[0.7967531806544255, 0.6329329013824463, 0.24526425411826686, 0.47986955642700196]], [[1305.0983597604852, 424.43797302246094, 474.83159597296464, 518.2591209411621]], [0.9579000000000001], [0])
OpenCV
([[0.7964876033057852, 0.6324074074074074, 0.24483471074380164, 0.47962962962962963]], [[1305.0, 424.0, 474.0, 518.0]], [0.9578801989555359], [0])

The answers are the same again! So I suppose that Darknet internally resizes the image to the cfg size.

So now the question itself: can I feed images of different sizes to Darknet in Python without keeping separate network instances with different cfgs?

Thanks in advance!

stephanecharette commented 3 years ago

> The answers are the same again! So I suppose that Darknet internally resizes the image to the cfg size.
>
> So now the question itself: can I feed images of different sizes to Darknet in Python without keeping separate network instances with different cfgs?

Darknet (the command-line version and the C API) will always resize the images to match the network size. And the stretching/resizing does not care about aspect ratio; it will distort the image if necessary so it matches the network size.

I'm not familiar with the python interface, but if the python interface uses the same C API (and I believe it does) then your images will always be stretched to match the network.
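One useful consequence of a plain per-axis stretch: Darknet's relative (cx, cy, w, h) outputs stay valid for the original image, because the distortion cancels out of the relative coordinates. A sketch of the conversion to pixel (left, top, width, height), which appears to be the format the postprocessing above prints; the 1936x1080 frame size is my assumption, back-computed from the printed pixel box:

```python
def relative_to_pixels(box, img_w, img_h):
    """Convert a Darknet-style relative box (cx, cy, w, h in [0, 1])
    to (left, top, width, height) in pixels on the ORIGINAL image;
    the stretch to the network size cancels out of relative coords."""
    cx, cy, w, h = box
    return ((cx - w / 2) * img_w, (cy - h / 2) * img_h, w * img_w, h * img_h)

# First Darknet detection from the 608x608 run above; 1936x1080 is an
# assumed frame size, back-computed from the printed pixel box.
box = relative_to_pixels((0.7295320410477487, 0.6251855649446186,
                          0.38313662378411545, 0.5131233114945261), 1936, 1080)
print([round(v, 1) for v in box])  # [1041.5, 398.1, 741.8, 554.2]
```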

You may also find this to be somewhat useful: https://www.ccoderun.ca/programming/2020-09-25_Darknet_FAQ/#square_network

VolkovAK commented 3 years ago

@stephanecharette Thanks for the answer!

I was too lazy to build OpenCV with CUDA support, so hopefully my GPU will be able to fit as many YOLO instances in memory as I need.
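For anyone landing here with the same question: if you do end up with one network instance per input size, a small lazy cache keeps the bookkeeping simple. A minimal sketch; `load_fn` is a hypothetical wrapper that would call `darknet.load_network` with a cfg file prepared for that size (demonstrated here with a dummy loader so the sketch is self-contained):

```python
def network_cache(load_fn):
    """Cache one network instance per (width, height), loading lazily.
    load_fn(width, height) is assumed to wrap darknet.load_network
    with a cfg generated for that input size."""
    cache = {}
    def get(width, height):
        key = (width, height)
        if key not in cache:
            cache[key] = load_fn(width, height)
        return cache[key]
    return get

# Demo with a dummy loader that records each load.
loads = []
get_net = network_cache(lambda w, h: loads.append((w, h)) or f"net_{w}x{h}")
get_net(608, 608)
get_net(608, 320)
get_net(608, 608)          # already cached, no second load
print(loads)  # [(608, 608), (608, 320)]
```

Each cached network costs GPU memory, so in practice the set of sizes should stay small.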