Open eli2014 opened 6 years ago
@mxnet-label-bot [Operator, Bug]
Besides,
const T yy = roi_start_h + ph bin_size_h +
static_cast
should be:
const T yy = roi_start_h + ph bin_size_h +
static_cast
const T xx = roi_start_w + pw bin_size_w +
static_cast
should be:
const T xx = roi_start_w + pw bin_size_w +
static_cast
For example,
x = mx.nd.array([[[[ 0., 1., 2., 3., 4., 5.], [ 6., 7., 8., 9., 10., 11.], [ 12., 13., 14., 15., 16., 17.], [ 18., 19., 20., 21., 22., 23.], [ 24., 25., 26., 27., 28., 29.], [ 30., 31., 32., 33., 34., 35.], [ 36., 37., 38., 39., 40., 41.], [ 42., 43., 44., 45., 46., 47.]]]]) y = mx.nd.array([[0,0,0,3,3]]) mx.nd.ROIAlign(x, y, (2, 2), 0.7) should be:
[[[[3,5, 5.5], [15.5, 17.5]]]]
while the output is: [[[[5.25, 6.75], [14.25, 15.75]]]]
Hi @eli2014 if you already identified the rootcause and knew the fix, would you mind making the change directly and contribute to this repository? If so, we are more than happy to review your pull request. but also don't feel obliged :)
For example,
x = mx.nd.array([[[[ 0., 1., 2., 3., 4., 5.], [ 6., 7., 8., 9., 10., 11.], [ 12., 13., 14., 15., 16., 17.], [ 18., 19., 20., 21., 22., 23.], [ 24., 25., 26., 27., 28., 29.], [ 30., 31., 32., 33., 34., 35.], [ 36., 37., 38., 39., 40., 41.], [ 42., 43., 44., 45., 46., 47.]]]]) y = mx.nd.array([[0,0,0,3,3]]) mx.nd.ROIAlign(x, y, (2, 2), 0.7) should be:
[[[[3,5, 5.5], [15.5, 17.5]]]]
while the output is: [[[[5.25, 6.75], [14.25, 15.75]]]]
@eli2014: could you please justify why the result should be [[[[3,5, 5.5], [15.5, 17.5]]]] ?
We tried your example with MXNet(1.3.1) and Caffe2 (0.8) and we get the same result from both frameworks, which is different from your output. Please provide more details on your test environment, like MXNet version, etc. for easier debugging.
MXNet:
>>> import mxnet as nx
>>> x = mx.nd.array([[[[ 0., 1., 2., 3., 4., 5.],
... [ 6., 7., 8., 9., 10., 11.],
... [ 12., 13., 14., 15., 16., 17.],
... [ 18., 19., 20., 21., 22., 23.],
... [ 24., 25., 26., 27., 28., 29.],
... [ 30., 31., 32., 33., 34., 35.],
... [ 36., 37., 38., 39., 40., 41.],
... [ 42., 43., 44., 45., 46., 47.]]]])
>>> y = mx.nd.array([[0,0,0,3,3]])
>>> mx.nd.contrib.ROIAlign(x, y, (2, 2), 0.7)
[[[[ 3.6749997 4.725 ]
[ 9.974999 11.025 ]]]]
<NDArray 1x1x2x2 @cpu(0)>
Caffe2
>>> from caffe2.python import core, workspace
WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
>>> import numpy as np
>>> x = np.array([[[[ 0., 1., 2., 3., 4., 5.],
... [ 6., 7., 8., 9., 10., 11.],
... [ 12., 13., 14., 15., 16., 17.],
... [ 18., 19., 20., 21., 22., 23.],
... [ 24., 25., 26., 27., 28., 29.],
... [ 30., 31., 32., 33., 34., 35.],
... [ 36., 37., 38., 39., 40., 41.],
... [ 42., 43., 44., 45., 46., 47.]]]]).astype(np.float32)
>>> r = np.array([[0,0,0,3,3]]).astype(np.float32)
>>> op = core.CreateOperator("RoIAlign", ["d", "r"], ["o"], pooled_h=2, pooled_w=2,spatial_scale=0.7)
>>> workspace.FeedBlob("d", x)
True
>>> workspace.FeedBlob("r", r)
True
>>> workspace.RunOperatorOnce(op)
WARNING: Logging before InitGoogleLogging() is written to STDERR
W0925 15:35:07.264453 2926322560 init.h:99] Caffe2 GlobalInit should be run before any other API calls.
True
>>> workspace.FetchBlob("o")
array([[[[ 3.6749997, 4.725 ],
[ 9.974999 , 11.025 ]]]], dtype=float32)
Besides,
line 63-65
const T yy = roi_start_h + ph bin_size_h + static_cast(iy + .5f) bin_size_h / static_cast(roi_bin_grid_h); should be: const T yy = roi_start_h + ph bin_size_h + static_cast(iy ) bin_size_h / static_cast(roi_bin_grid_h);
line 67-69
const T xx = roi_start_w + pw bin_size_w + static_cast(ix + .5f) bin_size_w / static_cast(roi_bin_grid_w);
should be: const T xx = roi_start_w + pw bin_size_w + static_cast(ix) bin_size_w / static_cast(roi_bin_grid_w);
@eli2014: regarding your above comment, could you please also elaborate on why you think the code needs to be modified to what you suggested?
@sandeep-krishnamurthy As @yuxihu pointed out, the results from Caffe2 and MXNet are identical. It does not seem to be a bug. Please change the label to [Discussion]. Thanks!
@yuxihu @apeforest I made a mistake in the example. The value of spatial_scale should be 1.0. So the statement 'mx.nd.ROIAlign(x, y, (2, 2), 0.7)' should be 'mx.nd.ROIAlign(x, y, (2, 2), 1.0)'. If you rerun the example, the output will be [[[[5.25, 6.75], [14.25, 15.75]]]].
I think the output should be [[[[3,5, 5.5], [15.5, 17.5]]]]
Because y = mx.nd.array([[0,0,0,3,3]]) means the start coordinate is (0, 0) and the end coordinate is (3, 3). The RoI is [ 0., 1., 2., 3.], [ 6., 7., 8., 9.], [ 12., 13., 14., 15.], [ 18., 19., 20., 21.]
Since pooled_size = (2, 2), and both of the height and width of RoI are 4, which can be divisible by pooled_size. According to the description of ROIAlign, we can compute the each value of the output, for example, 3.5 = (0 + 1 + 6 + 7) / 4, and so on.
Thanks for the update! Resolving as the issue is addressed.
@eli2014 Thanks for your response. We will investigate and verify the algorithm based on what you provide. Since our algorithm for this operator is adapted from caffe2, we suggest that you post the same concern on caffe2 such that they are also aware and can investigate.
@yuxihu Thanks. I have already posted the issue on Caffe2.
There maybe a bug in file roi_align.cc.
Line 177 and Line 178
T roi_width = std::max(roi_end_w - roi_start_w, (T)1.); T roi_height = std::max(roi_end_h - roi_start_h, (T)1.);
I think it should be: T roi_width = std::max(roi_end_w - roi_start_w + 1, (T)1.); T roi_height = std::max(roi_end_h - roi_start_h + 1, (T)1.);