apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0
20.77k stars 6.79k forks source link

roi_align.cc bug? #12403

Open eli2014 opened 6 years ago

eli2014 commented 6 years ago

There maybe a bug in file roi_align.cc.

Line 177 and Line 178

T roi_width = std::max(roi_end_w - roi_start_w, (T)1.); T roi_height = std::max(roi_end_h - roi_start_h, (T)1.);

I think it should be: T roi_width = std::max(roi_end_w - roi_start_w + 1, (T)1.); T roi_height = std::max(roi_end_h - roi_start_h + 1, (T)1.);

anirudhacharya commented 6 years ago

@mxnet-label-bot [Operator, Bug]

eli2014 commented 6 years ago

Besides,

line 63-65

const T yy = roi_start_h + ph bin_size_h + static_cast(iy + .5f) bin_size_h / static_cast(roi_bin_grid_h);
should be: const T yy = roi_start_h + ph bin_size_h + static_cast(iy ) bin_size_h / static_cast(roi_bin_grid_h);

line 67-69

const T xx = roi_start_w + pw bin_size_w + static_cast(ix + .5f) bin_size_w / static_cast(roi_bin_grid_w);

should be: const T xx = roi_start_w + pw bin_size_w + static_cast(ix) bin_size_w / static_cast(roi_bin_grid_w);

eli2014 commented 6 years ago

For example,

x = mx.nd.array([[[[ 0., 1., 2., 3., 4., 5.], [ 6., 7., 8., 9., 10., 11.], [ 12., 13., 14., 15., 16., 17.], [ 18., 19., 20., 21., 22., 23.], [ 24., 25., 26., 27., 28., 29.], [ 30., 31., 32., 33., 34., 35.], [ 36., 37., 38., 39., 40., 41.], [ 42., 43., 44., 45., 46., 47.]]]]) y = mx.nd.array([[0,0,0,3,3]]) mx.nd.ROIAlign(x, y, (2, 2), 0.7) should be:

[[[[3,5, 5.5], [15.5, 17.5]]]]

while the output is: [[[[5.25, 6.75], [14.25, 15.75]]]]

apeforest commented 6 years ago

Hi @eli2014 if you already identified the rootcause and knew the fix, would you mind making the change directly and contribute to this repository? If so, we are more than happy to review your pull request. but also don't feel obliged :)

yuxihu commented 6 years ago

For example,

x = mx.nd.array([[[[ 0., 1., 2., 3., 4., 5.], [ 6., 7., 8., 9., 10., 11.], [ 12., 13., 14., 15., 16., 17.], [ 18., 19., 20., 21., 22., 23.], [ 24., 25., 26., 27., 28., 29.], [ 30., 31., 32., 33., 34., 35.], [ 36., 37., 38., 39., 40., 41.], [ 42., 43., 44., 45., 46., 47.]]]]) y = mx.nd.array([[0,0,0,3,3]]) mx.nd.ROIAlign(x, y, (2, 2), 0.7) should be:

[[[[3,5, 5.5], [15.5, 17.5]]]]

while the output is: [[[[5.25, 6.75], [14.25, 15.75]]]]

@eli2014: could you please justify why the result should be [[[[3,5, 5.5], [15.5, 17.5]]]] ?

We tried your example with MXNet(1.3.1) and Caffe2 (0.8) and we get the same result from both frameworks, which is different from your output. Please provide more details on your test environment, like MXNet version, etc. for easier debugging.

MXNet:

>>> import mxnet as nx
>>> x = mx.nd.array([[[[ 0., 1., 2., 3., 4., 5.],
... [ 6., 7., 8., 9., 10., 11.],
... [ 12., 13., 14., 15., 16., 17.],
... [ 18., 19., 20., 21., 22., 23.],
... [ 24., 25., 26., 27., 28., 29.],
... [ 30., 31., 32., 33., 34., 35.],
... [ 36., 37., 38., 39., 40., 41.],
... [ 42., 43., 44., 45., 46., 47.]]]])
>>> y = mx.nd.array([[0,0,0,3,3]])
>>> mx.nd.contrib.ROIAlign(x, y, (2, 2), 0.7)

[[[[ 3.6749997  4.725    ]
   [ 9.974999  11.025    ]]]]
<NDArray 1x1x2x2 @cpu(0)>

Caffe2

>>> from caffe2.python import core, workspace
WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
>>> import numpy as np
>>> x = np.array([[[[ 0., 1., 2., 3., 4., 5.],
... [ 6., 7., 8., 9., 10., 11.],
... [ 12., 13., 14., 15., 16., 17.],
... [ 18., 19., 20., 21., 22., 23.],
... [ 24., 25., 26., 27., 28., 29.],
... [ 30., 31., 32., 33., 34., 35.],
... [ 36., 37., 38., 39., 40., 41.],
... [ 42., 43., 44., 45., 46., 47.]]]]).astype(np.float32)
>>> r = np.array([[0,0,0,3,3]]).astype(np.float32)
>>> op = core.CreateOperator("RoIAlign", ["d", "r"], ["o"], pooled_h=2, pooled_w=2,spatial_scale=0.7)
>>> workspace.FeedBlob("d", x)
True
>>> workspace.FeedBlob("r", r)
True
>>> workspace.RunOperatorOnce(op)
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0925 15:35:07.264453 2926322560 init.h:99] Caffe2 GlobalInit should be run before any other API calls.
True
>>> workspace.FetchBlob("o")
array([[[[ 3.6749997,  4.725    ],
         [ 9.974999 , 11.025    ]]]], dtype=float32)
yuxihu commented 6 years ago

Besides,

line 63-65

const T yy = roi_start_h + ph bin_size_h + static_cast(iy + .5f) bin_size_h / static_cast(roi_bin_grid_h); should be: const T yy = roi_start_h + ph bin_size_h + static_cast(iy ) bin_size_h / static_cast(roi_bin_grid_h);

line 67-69

const T xx = roi_start_w + pw bin_size_w + static_cast(ix + .5f) bin_size_w / static_cast(roi_bin_grid_w);

should be: const T xx = roi_start_w + pw bin_size_w + static_cast(ix) bin_size_w / static_cast(roi_bin_grid_w);

@eli2014: regarding your above comment, could you please also elaborate on why you think the code needs to be modified to what you suggested?

apeforest commented 6 years ago

@sandeep-krishnamurthy As @yuxihu pointed out, the results from Caffe2 and MXNet are identical. It does not seem to be a bug. Please change the label to [Discussion]. Thanks!

eli2014 commented 6 years ago

@yuxihu @apeforest I made a mistake in the example. The value of spatial_scale should be 1.0. So the statement 'mx.nd.ROIAlign(x, y, (2, 2), 0.7)' should be 'mx.nd.ROIAlign(x, y, (2, 2), 1.0)'. If you rerun the example, the output will be [[[[5.25, 6.75], [14.25, 15.75]]]].

I think the output should be [[[[3,5, 5.5], [15.5, 17.5]]]]

Because y = mx.nd.array([[0,0,0,3,3]]) means the start coordinate is (0, 0) and the end coordinate is (3, 3). The RoI is [ 0., 1., 2., 3.], [ 6., 7., 8., 9.], [ 12., 13., 14., 15.], [ 18., 19., 20., 21.]

Since pooled_size = (2, 2), and both of the height and width of RoI are 4, which can be divisible by pooled_size. According to the description of ROIAlign, we can compute the each value of the output, for example, 3.5 = (0 + 1 + 6 + 7) / 4, and so on.

sandeep-krishnamurthy commented 6 years ago

Thanks for the update! Resolving as the issue is addressed.

yuxihu commented 5 years ago

@eli2014 Thanks for your response. We will investigate and verify the algorithm based on what you provide. Since our algorithm for this operator is adapted from caffe2, we suggest that you post the same concern on caffe2 such that they are also aware and can investigate.

eli2014 commented 5 years ago

@yuxihu Thanks. I have already posted the issue on Caffe2.