chengyangfu / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Possible bug in detection locations #19

Open abdelrahman-gaber opened 6 years ago

abdelrahman-gaber commented 6 years ago

Hi,

I am trying to use the trained models given by the authors. However, I discovered that the detection output locations are wrong and in many cases outside the image range!

To investigate further, I used the detection code in examples and compared the results from Wei's original SSD implementation with those from the new SSD model with ResNet101 introduced here. I tested the same image used in the examples (examples/images/fish-bike.jpg).

With the old SSD code and the VGG model I get correct results:
[0.028087676, 0.23656183, 0.88743579, 0.95228869, 2, 0.82035357, u'bicycle']
[0.42205709, 0.026113272, 0.70970505, 0.51584023, 15, 0.99626094, u'person']

But with the new SSD model in this repository I get:
[3.6806207, 3.2651825, 3.9601259, 3.7642479, 1, 0.99327165, u'person']
[1.065011, 0.64031565, 1.3344085, 1.142953, 1, 0.9836536, u'person']
[255.99077, 256.33829, 256.99084, 256.97342, 2, 0.7926603, u'bicycle']
[14.477366, 14.709455, 15.462966, 15.438332, 2, 0.69000566, u'bicycle']

The first 4 numbers are the normalized detection locations [x_min, y_min, x_max, y_max], so after multiplying the x values by the image width and the y values by the image height (481 and 323 in this case), I should get bbox locations inside the tested image. This is the case with the original SSD models, where I get the right locations:
[14, 76, 427, 308]
[203, 8, 341, 167]

But with the new SSD-ResNet model introduced here, I get the locations:
[1770, 1055, 1905, 1216]
[512, 207, 642, 369]
[123132, 82797, 123613, 83002]
[6964, 4751, 7438, 4987]

which puts the bboxes outside the image! The problem is also obvious from the raw output, since normalized positions should not exceed 1.0. Note that the same problem happens when using the DSSD model.
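The conversion and the range check described above can be sketched as follows. This is only an illustration: the helper names `to_pixels` and `is_normalized` are hypothetical, and rounding to the nearest integer is an assumption (the post does not say how the pixel values were rounded, though this choice reproduces the numbers quoted above).

```python
def to_pixels(box, width, height):
    """Scale a normalized [x_min, y_min, x_max, y_max] box to pixel coordinates.

    Rounding to the nearest integer is an assumption, not taken from the post.
    """
    x_min, y_min, x_max, y_max = box
    return [
        int(round(x_min * width)),
        int(round(y_min * height)),
        int(round(x_max * width)),
        int(round(y_max * height)),
    ]


def is_normalized(box):
    """Return True if every coordinate lies in the expected [0, 1] range."""
    return all(0.0 <= v <= 1.0 for v in box)


# Image size quoted in the post for examples/images/fish-bike.jpg.
WIDTH, HEIGHT = 481, 323

# First detection from the original SSD/VGG model, and first from SSD-ResNet.
vgg_box = [0.028087676, 0.23656183, 0.88743579, 0.95228869]
resnet_box = [3.6806207, 3.2651825, 3.9601259, 3.7642479]

for name, box in [("vgg", vgg_box), ("resnet", resnet_box)]:
    print(name, to_pixels(box, WIDTH, HEIGHT), "in range:", is_normalized(box))
```

Checking `is_normalized` on the raw network output flags the faulty boxes before any scaling is done.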

Thank you.

abdelrahman-gaber commented 6 years ago

I found the problem. In deploy.prototxt, replace each current offset with the offset divided by the step (offset/step), i.e. the offset expressed as a fraction of the step. In the old SSD code this was 0.5 for all layers, but here it differs between layers. The new offsets are therefore [0.31, 0.156, 0.328, 0.41, 0.457, 0.478, 0.5] instead of [2.5, 2.5, 10.5, 26.5, 58.5, 122.5, 256.5].
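As a rough sketch of the offset/step computation: the `steps` list below is an assumption, inferred by dividing the original offsets by the ratios reported above; it is not quoted anywhere in this thread. The real values should be read from the step setting of each prior box layer in deploy.prototxt.

```python
# Pixel offsets as they currently appear in deploy.prototxt (from the post).
offsets = [2.5, 2.5, 10.5, 26.5, 58.5, 122.5, 256.5]

# ASSUMPTION: per-layer steps inferred from the reported ratios, not quoted
# in the thread. Replace with the values from your own deploy.prototxt.
steps = [8, 16, 32, 64, 128, 256, 513]

# New offset for each layer: the old offset as a fraction of that layer's step.
normalized_offsets = [offset / step for offset, step in zip(offsets, steps)]

for offset, step, ratio in zip(offsets, steps, normalized_offsets):
    print("offset %6.1f / step %3d = %.3f" % (offset, step, ratio))
```

Under these assumed steps the ratios agree with the list in the post to within rounding.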

After doing this, the detection bboxes are right but not as accurate as those from the original SSD with VGG. For example, I found negative numbers in the bbox coordinates:
[0.42572773, 0.010289893, 0.70523298, 0.50935513, 1, 0.99327165, u'person'] --> [205, 3, 339, 165]
[-0.0092230439, 0.3382884, 0.99083799, 0.97341633, 2, 0.7926603, u'bicycle'] --> [-4, 109, 477, 314]
The resulting image looks like this: (image: detect_result)
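One common way to handle slightly out-of-range coordinates such as the -0.0092 above is to clip the normalized box to [0, 1] before scaling. This is not something proposed in the thread, and it only hides the symptom; it does not explain the accuracy gap relative to the VGG model. The helper name `clip_box` is hypothetical, and rounding to the nearest integer is an assumption.

```python
def clip_box(box):
    """Clamp each normalized coordinate of [x_min, y_min, x_max, y_max] to [0, 1]."""
    return [min(max(v, 0.0), 1.0) for v in box]


# Image size and bicycle detection quoted in the post.
WIDTH, HEIGHT = 481, 323
bicycle = [-0.0092230439, 0.3382884, 0.99083799, 0.97341633]

clipped = clip_box(bicycle)

# Scale x values by the width and y values by the height, then round
# (rounding to nearest is an assumption, not stated in the post).
pixels = [
    int(round(clipped[0] * WIDTH)),
    int(round(clipped[1] * HEIGHT)),
    int(round(clipped[2] * WIDTH)),
    int(round(clipped[3] * HEIGHT)),
]
print(pixels)
```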

foralliance commented 6 years ago

@abdelrahman-gaber Hi, about offset=[2.5, 2.5, 10.5, 26.5, 58.5, 122.5, 256.5]: have you solved the problem?