pfeatherstone opened this issue 4 years ago
Am I correct in saying that rather than having
bx[i] = sigmoid(tx[i]) + c[i]
you get
bx[i] = (sigmoid(tx[i]) - 0.5) * scale_x_y + 0.5 + c[i]
@pfeatherstone
Am I correct in saying that rather than having bx[i] = sigmoid(tx[i]) + c[i] you get bx[i] = (sigmoid(tx[i]) - 0.5) * scale_x_y + 0.5 + c[i]?
Yes.
There are many experimental params.
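For reference, a minimal PyTorch sketch of that decode (the function and tensor names here are illustrative, not from either codebase):

```python
import torch

def decode_xy(tx, grid_x, scale_x_y=1.05):
    # Plain YOLOv3 decode is bx = sigmoid(tx) + cx. With scale_x_y the
    # sigmoid output is stretched around 0.5 so predicted centers can
    # reach the grid-cell borders more easily:
    #   bx = (sigmoid(tx) - 0.5) * scale_x_y + 0.5 + cx
    # With scale_x_y = 1.0 this reduces to the plain decode.
    return (torch.sigmoid(tx) - 0.5) * scale_x_y + 0.5 + grid_x
```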
Do class probabilities or width and height get scaled?
Basically I'm doing a port of csresnext50-panet-spp-original-optimal.cfg to PyTorch. I've got all the layers down, but I think there are a few fiddly things added in csresnext50-panet-spp-original-optimal.cfg which you wouldn't have in yolov3.cfg. When inferring I get correct-ish bounding boxes, but they are not exactly the same as a darknet inference. So I think there are some additional subtleties missing.
About shortcut:
I get slightly different bboxes for both yolov3-spp and csresnext50... when comparing your darknet repo against my PyTorch implementation. With yolov3, on the other hand, I get exactly the same results. yolov3-spp doesn't have any partial shortcuts, so I'm inclined to think it's not that. I was hoping it was a new experimental scaling somewhere.
For yolov3-spp I get the following.
The first image is what I get with your repo, the second is what I get with my PyTorch implementation.
Both are inferred at 416x416.
For csresnext50-panet-spp-original-optimal.cfg I get:
Both are inferred at 416x416.
It's so similar that it makes me think it's a scaling thing, or a different mode in the upsampling layers, or something like that.
Maybe it's the maxpooling; that's the only difference between yolov3 and yolov3-spp.
Maybe darknet has a slightly different implementation from PyTorch.
Do you get the same bbox for yolov3.cfg but different for yolov3-spp.cfg?
Check the SPP-block. Do you use padding=SAME in PyTorch for the maxpool layers in the SPP-block? https://github.com/pytorch/pytorch/issues/3867#issuecomment-570743193
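For what it's worth, a minimal PyTorch sketch of an SPP block with SAME-equivalent padding, assuming the usual 5/9/13 stride-1 maxpools from yolov3-spp.cfg (the class name is illustrative):

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    # Stride-1 maxpools whose outputs are concatenated with the input.
    # For an odd kernel k and stride 1, padding=k//2 keeps the spatial
    # size unchanged, which matches TF padding=SAME.
    def __init__(self, kernels=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernels
        )

    def forward(self, x):
        return torch.cat([x] + [p(x) for p in self.pools], dim=1)
```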
Use the same dog.png image resized to 416x416, not a jpeg, and not 768x576. Resize it in Paint and save it as dog.png. Maybe you are using a different resizing approach: https://github.com/AlexeyAB/darknet/issues/232#issuecomment-336955485
Also check that you get the same output before NMS in both cases, Darknet and PyTorch.
Yes, I get exactly the same bbox for yolov3.cfg but different ones for yolov3-spp.cfg.
By SAME padding, do you mean padding to the same dimensions, or padding using the same values as the border?
Ah yes, I'm not doing letterbox resizing in PyTorch.
Use the -letter_box flag in Darknet to detect with letterbox resizing:
./darknet detector test ... -letter_box
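A rough letterbox sketch in Python, assuming a centered paste and a gray fill of 127; darknet's letterbox_image may differ in the exact fill value and interpolation, so compare the outputs:

```python
from PIL import Image

def letterbox(img, size=416, fill=127):
    # Resize keeping the aspect ratio, then pad the borders to size x size.
    w, h = img.size
    scale = min(size / w, size / h)
    new_w, new_h = int(w * scale), int(h * scale)
    resized = img.resize((new_w, new_h), Image.BILINEAR)
    canvas = Image.new("RGB", (size, size), (fill, fill, fill))
    canvas.paste(resized, ((size - new_w) // 2, (size - new_h) // 2))
    return canvas
```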
What padding=SAME and padding=VALID are: https://www.pico.net/kb/what-is-the-difference-between-same-and-valid-padding-in-tf-nn-max-pool-of-tensorflow
Padding SAME and VALID in TensorFlow: https://www.tensorflow.org/api_docs/python/tf/nn/convolution
If padding == "SAME": output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides[i])
If padding == "VALID": output_spatial_shape[i] = ceil((input_spatial_shape[i] - (spatial_filter_shape[i]-1) * dilation_rate[i]) / strides[i]).
https://github.com/pytorch/pytorch/issues/3867#issuecomment-570743193
There is still no padding='same' in PyTorch.
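A sketch of the workaround discussed in that PyTorch issue, assuming TF SAME semantics (the helper name is illustrative):

```python
import math
import torch.nn.functional as F

def max_pool2d_same(x, k, s):
    # TF SAME: output size = ceil(input / stride). The total padding
    # needed can be odd, so pad asymmetrically (extra on the bottom and
    # right), then pool with no implicit padding. Pad with -inf so the
    # padded cells never win the max.
    h, w = x.shape[-2:]
    pad_h = max((math.ceil(h / s) - 1) * s + k - h, 0)
    pad_w = max((math.ceil(w / s) - 1) * s + k - w, 0)
    x = F.pad(x, (pad_w // 2, pad_w - pad_w // 2,
                  pad_h // 2, pad_h - pad_h // 2), value=float('-inf'))
    return F.max_pool2d(x, k, s)
```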
So here is the input image, here is your prediction, and here is my prediction. You get the following class probabilities: bicycle: 80%, dog: 95%, truck: 84%, car: 27%. Mine are painted on the image.
Remove SPP-block from yolov3-spp.cfg in both cases and run with the same yolov3-spp.weights, do you get the same results?
There is an extra convolutional layer following the SPP block; it takes 2048 channels.
When inferring with your repo I get no boxes. I imagine it's because the weights are offset after removing the SPP block, and the following convolutional layer takes a different number of channels.
Yes, it will not work.
I think the issue is that you don't use padding=SAME: https://github.com/AlexeyAB/darknet/issues/5168#issuecomment-608539941
Yeah, padding=SAME in my stuff; otherwise I wouldn't end up with 52x52, 26x26, and 13x13 as the input dimensions of the yolo layers.
And I've checked: it pads in 'replicate' mode, not 'zeros'.
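If darknet's maxpool padding really does behave like replicate rather than zeros, one way to mimic it in PyTorch is to pad explicitly before a padding-free pool (a sketch; the helper name is illustrative):

```python
import torch.nn.functional as F

def max_pool2d_replicate(x, k):
    # Pad in replicate mode, then pool without implicit padding. For
    # maxpool, replicated borders duplicate values that are already
    # inside the window, so this also matches simply ignoring the
    # out-of-bounds cells; zero padding, by contrast, can mask negative
    # activations near the borders.
    x = F.pad(x, (k // 2,) * 4, mode='replicate')
    return F.max_pool2d(x, k, stride=1)
```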
Check detections without NMS
It should be padding=SAME
Cheers @AlexeyAB. I'm convinced the pooling is correct. Maybe it's something else. I'll have a look tomorrow. Thanks for your time!
I've noticed there are some additional configuration options on the yolo layer, which are used in the csresnext model:
What do these do? Particularly scale_x_y.