Open TccccD opened 6 years ago
@haojin2 @piiswrong @leezu @anirudh2290 @szha, please take a look at this issue and, if possible, help fix L2Normalization. Thanks!
@TccccD the reason is that mx.symbol.norm uses a numerically stable algorithm to compute the 2-norm (https://github.com/apache/incubator-mxnet/pull/11573), whereas L2Normalization is prone to under- or overflow. L2Normalization should be fixed to use the same implementation as the norm op.
Below is a shorter example of the problem:
In [15]: a = mx.nd.random.uniform(-5, 5, (512,100000), ctx=mx.gpu(0), dtype='float16')
In [16]: mx.nd.L2Normalization(a)
Out[16]:
[[ 0. -0. 0. ..., 0. -0. 0.]
[-0. 0. -0. ..., 0. 0. 0.]
[ 0. -0. -0. ..., 0. 0. -0.]
...,
[-0. 0. -0. ..., 0. 0. -0.]
[ 0. -0. 0. ..., 0. -0. -0.]
[-0. -0. 0. ..., -0. -0. -0.]]
<NDArray 512x100000 @gpu(0)>
In [17]: a / mx.nd.norm(a, axis=1, keepdims=True)
Out[17]:
[[ 2.19726562e-03 -3.61824036e-03 1.11007690e-03 ..., 3.14950943e-04
-4.92572784e-04 3.10516357e-03]
[ -4.07028198e-03 4.61578369e-03 -4.51278687e-03 ..., 2.33650208e-03
5.40542603e-03 3.78608704e-03]
[ 5.27572632e-03 -1.81293488e-03 -1.17683411e-03 ..., 1.86920166e-03
4.87518311e-03 -3.04412842e-03]
...,
[ -4.39834595e-03 3.74794006e-04 -4.21905518e-03 ..., 1.11007690e-03
3.81278992e-03 -3.80134583e-03]
[ 7.90953636e-05 -5.31387329e-03 4.95910645e-03 ..., 3.52859497e-03
-2.10952759e-03 -4.76837158e-04]
[ -4.53186035e-03 -3.03459167e-03 2.37083435e-03 ..., -3.93295288e-03
-4.21524048e-03 -5.36727905e-03]]
<NDArray 512x100000 @gpu(0)>
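The all-zeros output above can be reproduced outside MXNet. A minimal NumPy sketch (under the assumption that the naive implementation accumulates the sum of squares in float16): 100000 values from U(-5, 5) have squares averaging about 8.3, so the sum is around 8e5, far beyond float16's maximum of 65504, and the norm overflows to inf.

```python
import numpy as np

# Hypothetical reproduction of the failure mode in NumPy (an assumption
# about the cause: the sum of squares is accumulated in float16).
rng = np.random.default_rng(0)
row = rng.uniform(-5, 5, 100000).astype(np.float16)

naive_norm = np.sqrt(np.sum(row * row))   # float16 sum overflows to inf
normalized = row / naive_norm             # x / inf -> signed zeros, as in Out[16]

# Accumulating in a wider type (what a numerically stable implementation
# effectively achieves) yields a finite norm and sensible outputs.
stable_norm = np.sqrt(np.sum(row.astype(np.float32) ** 2))
print(naive_norm, stable_norm)
```

Dividing by the inf norm is what turns every element into a signed zero, matching the Out[16] display.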
I defined a new square op like this:
MXNET_BINARY_MATH_OP(square_v, math::sqr(a) / math::sqr(b));
In l2_normalization_op-inl.h I need to find a suitable scale; I think the maximum value in in_data would do. But I don't know how to find the maximum value in a Tensor, e.g. Tensor<xpu, 2, DType> data. Could you help me? Thanks! @haojin2 @piiswrong @leezu @anirudh2290 @szha
@TccccD your contribution to fix the L2Norm op would be very welcome. Instead of trying to find a suitable scale a priori (e.g. by looking for the max element), we can also use the scaled sum-of-squares algorithm added here: https://github.com/apache/incubator-mxnet/pull/11573/files#diff-c8275a550b65b889051bd88c27d1e1b7R880. I'm not sure if we can easily use the Reducer interface in legacy ops, though.
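The scaled sum-of-squares recurrence referenced there is essentially the classic BLAS nrm2 algorithm. A minimal Python sketch for illustration only (`stable_l2` is a hypothetical name; the real fix would live in the C++ reducer, not a Python loop):

```python
import math

# Scaled sum-of-squares (BLAS nrm2-style) recurrence: track the largest
# |x| seen so far and a running sum of squares of x / scale, so that no
# individual square ever exceeds 1 and nothing overflows.
def stable_l2(xs):
    scale = 0.0  # largest |x| seen so far
    ssq = 1.0    # running sum of (x / scale)**2
    for x in xs:
        ax = abs(x)
        if ax == 0.0:
            continue
        if scale < ax:
            # A new maximum: rescale the accumulated sum to the new scale.
            ssq = 1.0 + ssq * (scale / ax) ** 2
            scale = ax
        else:
            ssq += (ax / scale) ** 2
    return scale * math.sqrt(ssq)

print(stable_l2([3.0, 4.0]))      # 5.0
print(stable_l2([1e200, 1e200]))  # finite, whereas squaring 1e200 directly overflows
```

Because every term added to `ssq` is at most 1, the accumulator stays bounded by the element count regardless of the input magnitudes, which is why the norm op survives inputs whose squares would overflow.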
I tried it, but it feels difficult; it may require changing a lot of the code. @leezu
Ok. In general, the plan is to refactor the L2Norm op completely to improve the exposed interface, but I believe no one is working on it yet. If/when that is done, making use of the stable Reducer interface would be very easy.
@anirudh2290 Can you please add these labels to this issue: Operator, FeatureRequest
If the data is set as follows: in_data0 = mx.nd.random.uniform(-5, 5, (512,100000), ctx=mx.gpu(0)), mx.symbol.L2Normalization will give 0.0, in both forward and backward. If it is set in (-1, 1), it is OK; and mx.symbol.norm is OK in both cases.
Test code as follows: