daerduoCarey / SpatialTransformerLayer


Bug (?) when normalising coordinates #8

Closed: DrSleep closed this issue 7 years ago.

DrSleep commented 8 years ago

Hi, @daerduoCarey

I have a question about the part of the code that normalises the coordinates (st_layer.cpp#L114-119):

    Dtype* data = output_grid.mutable_cpu_data();
    for(int i=0; i<output_H_ * output_W_; ++i) {
        data[3 * i] = (i / output_W_) * 1.0 / output_H_ * 2 - 1;
        data[3 * i + 1] = (i % output_W_) * 1.0 / output_W_ * 2 - 1;
        data[3 * i + 2] = 1;
    }

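If I have understood the paper correctly, the normalised coordinates should lie in [-1, 1]. In the code above, however, the upper bound of 1 is never reached: the largest row coordinate is 1 - 2/output_H_ and the largest column coordinate is 1 - 2/output_W_.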
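Written for a single axis with N samples (N = output_H_ for rows, output_W_ for columns) and k = 0, ..., N-1, the quoted formula and the alternative proposed below give the following endpoints. Both start at x_0 = -1; only the second reaches 1.

$$
x_k = \frac{2k}{N} - 1 \;\Rightarrow\; x_{N-1} = 1 - \frac{2}{N} < 1,
\qquad
x_k = \frac{2k}{N-1} - 1 \;\Rightarrow\; x_{N-1} = 1 \quad (N > 1).
$$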

For example, with outputH = 2 and outputW = 3, the result is:

    i: 0 [-1, -1, 1]
    i: 1 [-1, -0.333333, 1]
    i: 2 [-1, 0.333333, 1]
    i: 3 [0, -1, 1]
    i: 4 [0, -0.333333, 1]
    i: 5 [0, 0.333333, 1]

So, shouldn't it be something like this instead?

    Dtype* data = output_grid.mutable_cpu_data();
    for(int i=0; i<output_H_ * output_W_; ++i) {
        data[3 * i] = (i / output_W_) * 1.0 / (output_H_ - 1) * 2 - 1;
        data[3 * i + 1] = (i % output_W_) * 1.0 / (output_W_ - 1) * 2 - 1;
        data[3 * i + 2] = 1;
    }

This generates the following normalised coordinates:

    i: 0 [-1, -1, 1]
    i: 1 [-1, 0, 1]
    i: 2 [-1, 1, 1]
    i: 3 [1, -1, 1]
    i: 4 [1, 0, 1]
    i: 5 [1, 1, 1]

Thanks.

daerduoCarey commented 7 years ago

Hi, @DrSleep,

Thank you for your interest in my implementation and your careful examination of my code.

I think this is a quantization issue. We have to discretize the output image space into a grid and compute the value at each grid point by interpolation. I think the most reasonable implementation is to add grid_size/2 to all of my computed grid coordinates (that is, to sample at the centre of each cell) before applying the transformation matrix to them. But when outputW and outputH are large enough (maybe 64 is enough, unlike the 2 and 3 in your example), the problem should not be dramatic. There is indeed some difference between quantization approaches: with a fixed input image and transformation matrix, the output images may be slightly different. However, it is still safe to use any of them, since the mini-network that predicts the transformation matrix should learn to account for this and produce a slightly different matrix that offsets the quantization issue.

I hope my response helps you understand the issue. More questions are warmly welcome!

Thank you.

Bests, Kaichun Mo

DrSleep commented 7 years ago

Yes, I agree with your points: for big images this should not pose a problem.

Thanks for your comment!