AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

yolo layer configuration #5168

Open pfeatherstone opened 4 years ago

pfeatherstone commented 4 years ago

I've noticed there are some additional configuration options in the yolo layer, which are used in the csresnext model:

truth_thresh = 1
scale_x_y = 1.1
iou_thresh=0.213
cls_normalizer=1.0
iou_normalizer=0.07

What do these do? Particularly scale_x_y.

pfeatherstone commented 4 years ago

Am I correct in saying that rather than having bx[i] = sigmoid(tx[i]) + c[i], you get bx[i] = (sigmoid(tx[i]) - 0.5) * scale_x_y + 0.5 + c[i]?

AlexeyAB commented 4 years ago

@pfeatherstone

Am I correct in saying that rather than having bx[i] = sigmoid(tx[i]) + c[i], you get bx[i] = (sigmoid(tx[i]) - 0.5) * scale_x_y + 0.5 + c[i]?

Yes.


There are many experimental params.
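For reference, a minimal sketch of that x/y decoding on the PyTorch side; the function and variable names are illustrative, not taken from either repo:

```python
import torch

def decode_xy(tx, ty, grid_x, grid_y, scale_x_y=1.0):
    """Decode yolo-layer x/y offsets with the scale_x_y term discussed above.

    With scale_x_y == 1.0 this reduces to the usual bx = sigmoid(tx) + cx;
    values > 1.0 stretch the sigmoid so the centre can reach the cell borders.
    """
    bx = (torch.sigmoid(tx) - 0.5) * scale_x_y + 0.5 + grid_x
    by = (torch.sigmoid(ty) - 0.5) * scale_x_y + 0.5 + grid_y
    return bx, by
```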

pfeatherstone commented 4 years ago

Do class probabilities or width and height get scaled?

pfeatherstone commented 4 years ago

Basically, I'm doing a port of csresnext50-panet-spp-original-optimal.cfg to PyTorch. I've got all the layers down, but I think there are a few fiddly things added in csresnext50-panet-spp-original-optimal.cfg which you wouldn't have in yolov3.cfg. When inferring I get correct-ish bounding boxes, but they are not exactly the same as a darknet inference, so I think there are some additional subtleties missing.

AlexeyAB commented 4 years ago

About shortcut:

[image attachment: shortcut layer configuration]

pfeatherstone commented 4 years ago

I get slightly different bboxes for both yolov3-spp and csresnext50... when comparing your darknet repo against my PyTorch implementation. With yolov3, on the other hand, I get exactly the same results. yolov3-spp doesn't have any partial shortcuts, so I'm inclined to think it's not that. I was hoping it was a new experimental scaling somewhere.

pfeatherstone commented 4 years ago

For yolov3-spp I get the following: [images: predictions, predictions2]

The first image is what I get with your repo, the second is what I get with my PyTorch implementation.

pfeatherstone commented 4 years ago

Both are inferred at 416x416.

pfeatherstone commented 4 years ago

For csresnext50-panet-spp-original-optimal.cfg I get: [images: predictions, predictions2]

Both are inferred at 416x416.

pfeatherstone commented 4 years ago

It's so similar, it makes me think it's a scaling thing, or a different mode in the upsampling layers or something like that.

pfeatherstone commented 4 years ago

Maybe it's the maxpooling. That's the only difference between yolov3 and yolov3-spp.

pfeatherstone commented 4 years ago

Maybe darknet has a slightly different implementation to PyTorch.

AlexeyAB commented 4 years ago
pfeatherstone commented 4 years ago

Yes I get exactly the same bbox for yolov3.cfg but different for yolov3-spp.cfg

pfeatherstone commented 4 years ago

By same padding, do you mean pad to same dimension, or pad using same values as border?

pfeatherstone commented 4 years ago

Ah yes, I'm not doing letterbox resizing in PyTorch.

AlexeyAB commented 4 years ago

Use the -letter_box flag in Darknet to detect with letter_box resizing.

./darknet detector test ... -letter_box
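For reference, a rough sketch of what letterbox resizing does on the Python side; the helper name and pad value are assumptions, not taken from the darknet code:

```python
import cv2
import numpy as np

def letterbox(image, size=416, pad_value=127):
    """Resize keeping the aspect ratio, then pad to a square network input.

    The image is scaled so its longer side equals `size`, and grey borders
    fill the remaining area, instead of stretching the image to size x size.
    """
    h, w = image.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    resized = cv2.resize(image, (new_w, new_h))  # cv2 takes (width, height)
    canvas = np.full((size, size, 3), pad_value, dtype=image.dtype)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas
```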


What padding=SAME and padding=VALID are: https://www.pico.net/kb/what-is-the-difference-between-same-and-valid-padding-in-tf-nn-max-pool-of-tensorflow


Padding SAME and VALID in TensorFlow: https://www.tensorflow.org/api_docs/python/tf/nn/convolution

If padding == "SAME": output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides[i])

If padding == "VALID": output_spatial_shape[i] = ceil((input_spatial_shape[i] - (spatial_filter_shape[i]-1) * dilation_rate[i]) / strides[i]).


https://github.com/pytorch/pytorch/issues/3867#issuecomment-570743193

Still no padding='same' in PyTorch.
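One way to emulate padding=SAME max-pooling in PyTorch, following the SAME output-shape formula above; this is a sketch, not the exact darknet implementation:

```python
import torch.nn.functional as F

def max_pool_same(x, kernel_size, stride=1):
    """Max-pool with TF-style SAME output shape: ceil(input / stride).

    For stride 1 and an odd kernel this reduces to padding = kernel_size // 2;
    the general case may need asymmetric padding, hence the explicit F.pad.
    """
    h, w = x.shape[-2:]
    pad_h = max((-(h // -stride) - 1) * stride + kernel_size - h, 0)
    pad_w = max((-(w // -stride) - 1) * stride + kernel_size - w, 0)
    # pad with -inf so the padded border never wins the max
    x = F.pad(x, (pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2),
              value=float('-inf'))
    return F.max_pool2d(x, kernel_size, stride)
```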

pfeatherstone commented 4 years ago

So here is the input image: [image: dog]. Here is your prediction: [image: predictions]. Here is my prediction: [image: prediction2]. You get the following class probabilities: bicycle: 80%, dog: 95%, truck: 84%, car: 27%. Mine are painted on the image.

AlexeyAB commented 4 years ago

Remove the SPP-block from yolov3-spp.cfg in both cases and run with the same yolov3-spp.weights; do you get the same results?

https://github.com/AlexeyAB/darknet/blob/afb4cc4766eb8ab3686445448fa3cf652ab78eb8/cfg/yolov3-spp.cfg#L575-L597

pfeatherstone commented 4 years ago

There is an extra convolutional layer following the SPP block. It takes 2048 channels
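For context, a minimal sketch of that SPP block as I read it from yolov3-spp.cfg: three stride-1 maxpools of sizes 5, 9 and 13 concatenated with their 512-channel input, which is why the next convolution sees 4 x 512 = 2048 channels. This is an illustrative PyTorch module, not code from either repo:

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling: concatenate the input with three stride-1
    max-pools, so a 512-channel input becomes 2048 channels."""
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        # padding = k // 2 keeps the spatial size unchanged (SAME for stride 1)
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernel_sizes]
        )

    def forward(self, x):
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)
```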

pfeatherstone commented 4 years ago

When inferring with your repo I get no boxes. I imagine it's because the weights are offset after removing the SPP block, and the following convolutional layer takes a different number of channels.

AlexeyAB commented 4 years ago

When inferring with your repo I get no boxes. I imagine it's because the weights are offset after removing the SPP block, and the following convolutional layer takes a different number of channels.

Yes, it will not work.

I think the issue is that you don't use padding=SAME: https://github.com/AlexeyAB/darknet/issues/5168#issuecomment-608539941

pfeatherstone commented 4 years ago

Yeah, padding=SAME in my stuff. Otherwise I wouldn't end up with 52x52, 26x26 and 13x13 as the input dimensions of the yolo layers.
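Just to spell out the arithmetic: with a 416x416 input and yolo heads at strides 8, 16 and 32, the grids are 416 / 8 = 52, 416 / 16 = 26 and 416 / 32 = 13, so anything other than SAME padding would not land on exactly those sizes.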

pfeatherstone commented 4 years ago

And I've checked: it pads in 'replicate' mode, not 'zeros'.

AlexeyAB commented 4 years ago
  1. Check detections without NMS (see the sketch below).

  2. It should be padding=SAME. [image: Figure 2, padding=SAME illustration]
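A small sketch of how one might compare raw (pre-NMS) detections against the NMS-filtered set on the PyTorch side; box format (x1, y1, x2, y2) and all names are illustrative assumptions:

```python
import torchvision

def raw_and_nms(boxes, scores, conf_thresh=0.25, iou_thresh=0.45):
    """Return detections above the confidence threshold both before and after
    NMS, so differences caused by NMS alone are easy to spot."""
    keep = scores > conf_thresh
    boxes, scores = boxes[keep], scores[keep]
    raw = (boxes, scores)                                  # the "no NMS" view
    idx = torchvision.ops.nms(boxes, scores, iou_thresh)   # indices kept by NMS
    return raw, (boxes[idx], scores[idx])
```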

pfeatherstone commented 4 years ago

Cheers @AlexeyAB. I'm convinced the pooling is correct. Maybe it's something else. I'll have a look tomorrow. Thanks for your time!