AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

yolo layer configuration #5168

Open pfeatherstone opened 4 years ago

pfeatherstone commented 4 years ago

I've noticed there are some additional configuration options in the yolo layer, which are used in the csresnext model:

truth_thresh = 1
scale_x_y = 1.1
iou_thresh=0.213
cls_normalizer=1.0
iou_normalizer=0.07

What do these do? Particularly scale_x_y.

pfeatherstone commented 4 years ago

Am I correct in saying that rather than having bx[i] = sigmoid(tx[i]) + c[i], you get bx[i] = (sigmoid(tx[i]) - 0.5) * scale_x_y + 0.5 + c[i]?

AlexeyAB commented 4 years ago

@pfeatherstone

Am I correct in saying that rather than having bx[i] = sigmoid(tx[i]) + c[i], you get bx[i] = (sigmoid(tx[i]) - 0.5) * scale_x_y + 0.5 + c[i]?

Yes.


There are many experimental params.
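For reference, a minimal sketch of that x/y decoding on the PyTorch side; the function and variable names are illustrative, not taken from either repo:

```python
import torch

def decode_xy(tx, ty, grid_x, grid_y, scale_x_y=1.0):
    """Decode yolo-layer x/y offsets with the scale_x_y term discussed above.

    With scale_x_y == 1.0 this reduces to the usual bx = sigmoid(tx) + cx;
    values > 1.0 stretch the sigmoid so the centre can reach the cell borders.
    """
    bx = (torch.sigmoid(tx) - 0.5) * scale_x_y + 0.5 + grid_x
    by = (torch.sigmoid(ty) - 0.5) * scale_x_y + 0.5 + grid_y
    return bx, by
```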

pfeatherstone commented 4 years ago

Do class probabilities or width and height get scaled?

pfeatherstone commented 4 years ago

Basically, I'm doing a port of csresnext50-panet-spp-original-optimal.cfg to PyTorch. I've got all the layers down, but I think there are a few fiddly things added in csresnext50-panet-spp-original-optimal.cfg which you wouldn't have in yolov3.cfg. When inferring I get correct-ish bounding boxes, but they are not exactly the same as a darknet inference, so I think there are some additional subtleties missing.

AlexeyAB commented 4 years ago

About shortcut:

[image attachment: shortcut layer configuration]

pfeatherstone commented 4 years ago

I get slightly different bboxes for both yolov3-spp and csresnext50... when comparing your darknet repo against my PyTorch implementation. With yolov3, on the other hand, I get exactly the same results. yolov3-spp doesn't have any partial shortcuts, so I'm inclined to think it's not that. I was hoping it was a new experimental scaling somewhere.

pfeatherstone commented 4 years ago

For yolov3-spp I get the following: [images: predictions, predictions2]

The first image is what I get with your repo, the second is what I get with my PyTorch implementation.

pfeatherstone commented 4 years ago

Both are inferred at 416x416.

pfeatherstone commented 4 years ago

For csresnext50-panet-spp-original-optimal.cfg I get: [images: predictions, predictions2]

Both are inferred at 416x416.

pfeatherstone commented 4 years ago

It's so similar, it makes me think it's a scaling thing, or a different mode in the upsampling layers or something like that.

pfeatherstone commented 4 years ago

Maybe it's the maxpooling. That's the only difference between yolov3 and yolov3-spp.

pfeatherstone commented 4 years ago

Maybe darknet has a slightly different implementation to PyTorch.

AlexeyAB commented 4 years ago
pfeatherstone commented 4 years ago

Yes I get exactly the same bbox for yolov3.cfg but different for yolov3-spp.cfg

pfeatherstone commented 4 years ago

By same padding, do you mean pad to same dimension, or pad using same values as border?

pfeatherstone commented 4 years ago

Ah yes, I'm not doing letterbox resizing in PyTorch.

AlexeyAB commented 4 years ago

Use the -letter_box flag in Darknet to detect with letter_box resizing.

./darknet detector test ... -letter_box
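For reference, a rough sketch of what letterbox resizing does on the Python side; the helper name and pad value are assumptions, not taken from the darknet code:

```python
import cv2
import numpy as np

def letterbox(image, size=416, pad_value=127):
    """Resize keeping the aspect ratio, then pad to a square network input.

    The image is scaled so its longer side equals `size`, and grey borders
    fill the remaining area, instead of stretching the image to size x size.
    """
    h, w = image.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    resized = cv2.resize(image, (new_w, new_h))  # cv2 takes (width, height)
    canvas = np.full((size, size, 3), pad_value, dtype=image.dtype)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas
```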


What padding=SAME and padding=VALID are: https://www.pico.net/kb/what-is-the-difference-between-same-and-valid-padding-in-tf-nn-max-pool-of-tensorflow


Padding SAME and VALID in TensorFlow: https://www.tensorflow.org/api_docs/python/tf/nn/convolution

If padding == "SAME": output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides[i])

If padding == "VALID": output_spatial_shape[i] = ceil((input_spatial_shape[i] - (spatial_filter_shape[i]-1) * dilation_rate[i]) / strides[i]).


https://github.com/pytorch/pytorch/issues/3867#issuecomment-570743193

Still no padding='same' in PyTorch.
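One way to emulate padding=SAME max-pooling in PyTorch, following the SAME output-shape formula above; this is a sketch, not the exact darknet implementation:

```python
import torch.nn.functional as F

def max_pool_same(x, kernel_size, stride=1):
    """Max-pool with TF-style SAME output shape: ceil(input / stride).

    For stride 1 and an odd kernel this reduces to padding = kernel_size // 2;
    the general case may need asymmetric padding, hence the explicit F.pad.
    """
    h, w = x.shape[-2:]
    pad_h = max((-(h // -stride) - 1) * stride + kernel_size - h, 0)
    pad_w = max((-(w // -stride) - 1) * stride + kernel_size - w, 0)
    # pad with -inf so the padded border never wins the max
    x = F.pad(x, (pad_w // 2, pad_w - pad_w // 2, pad_h // 2, pad_h - pad_h // 2),
              value=float('-inf'))
    return F.max_pool2d(x, kernel_size, stride)
```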

pfeatherstone commented 4 years ago

So here is the input image: [image: dog]. Here is your prediction: [image: predictions]. Here is my prediction: [image: prediction2]. You get the following class probabilities: bicycle: 80%, dog: 95%, truck: 84%, car: 27%. Mine are painted on the image.

AlexeyAB commented 4 years ago

Remove the SPP-block from yolov3-spp.cfg in both cases and run with the same yolov3-spp.weights; do you get the same results?

https://github.com/AlexeyAB/darknet/blob/afb4cc4766eb8ab3686445448fa3cf652ab78eb8/cfg/yolov3-spp.cfg#L575-L597

pfeatherstone commented 4 years ago

There is an extra convolutional layer following the SPP block. It takes 2048 channels
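For context, a minimal sketch of that SPP block as I read it from yolov3-spp.cfg: three stride-1 maxpools of sizes 5, 9 and 13 concatenated with their 512-channel input, which is why the next convolution sees 4 x 512 = 2048 channels. This is an illustrative PyTorch module, not code from either repo:

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Spatial pyramid pooling: concatenate the input with three stride-1
    max-pools, so a 512-channel input becomes 2048 channels."""
    def __init__(self, kernel_sizes=(5, 9, 13)):
        super().__init__()
        # padding = k // 2 keeps the spatial size unchanged (SAME for stride 1)
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in kernel_sizes]
        )

    def forward(self, x):
        return torch.cat([x] + [pool(x) for pool in self.pools], dim=1)
```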

pfeatherstone commented 4 years ago

When inferring with your repo I get no boxes. I imagine it's because the weights are offset after removing the SPP block, and the following convolutional layer takes a different number of channels.

AlexeyAB commented 4 years ago

When inferring with your repo I get no boxes. I imagine it's because the weights are offset after removing the SPP block, and the following convolutional layer takes a different number of channels.

Yes, it will not work.

I think the issue is that you don't use padding=SAME: https://github.com/AlexeyAB/darknet/issues/5168#issuecomment-608539941

pfeatherstone commented 4 years ago

Yeah, padding=SAME in my stuff. Otherwise I wouldn't end up with 52x52, 26x26 and 13x13 as the input dimensions of the yolo layers.
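Just to spell out the arithmetic: with a 416x416 input and yolo heads at strides 8, 16 and 32, the grids are 416 / 8 = 52, 416 / 16 = 26 and 416 / 32 = 13, so anything other than SAME padding would not land on exactly those sizes.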

pfeatherstone commented 4 years ago

And I've checked: it pads in 'replicate' mode, not 'zeros'.

AlexeyAB commented 4 years ago
  1. Check detections without NMS (see the sketch below).

  2. It should be padding=SAME. [image: Figure 2, padding=SAME illustration]
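A small sketch of how one might compare raw (pre-NMS) detections against the NMS-filtered set on the PyTorch side; box format (x1, y1, x2, y2) and all names are illustrative assumptions:

```python
import torchvision

def raw_and_nms(boxes, scores, conf_thresh=0.25, iou_thresh=0.45):
    """Return detections above the confidence threshold both before and after
    NMS, so differences caused by NMS alone are easy to spot."""
    keep = scores > conf_thresh
    boxes, scores = boxes[keep], scores[keep]
    raw = (boxes, scores)                                  # the "no NMS" view
    idx = torchvision.ops.nms(boxes, scores, iou_thresh)   # indices kept by NMS
    return raw, (boxes[idx], scores[idx])
```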

pfeatherstone commented 4 years ago

Cheers @AlexeyAB. I'm convinced the pooling is correct. Maybe it's something else. I'll have a look tomorrow. Thanks for your time!