Closed (vafl closed this issue 4 years ago)
I can reproduce this with mxnet-cu101==1.6 on a G4 instance. This appears to be a bug with operator fusion. If you export MXNET_USE_FUSION=0,
the program works as expected.
% MXNET_USE_FUSION=0 python3 test.py 24s ~ ip-172-31-32-170
hybridizing
[[0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5]
[0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5]
[0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5]]
% MXNET_USE_FUSION=1 python3 test.py 7s ~ ip-172-31-32-170
hybridizing
[[0. 1. 2. 3. 4. 5. 6. 7. 8. ]
[9. 0. 1. 2. 3. 4. 5. 6. 7. ]
[8. 9. 0. 0.5 1. 1.5 2. 2.5 3. ]]
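The MXNET_USE_FUSION=0 workaround shown above can also be applied from inside the script rather than on the command line. The following is a minimal sketch, assuming the variable is set before mxnet is imported and before the network is hybridized and run; the mxnet import is left as a comment so the snippet runs without MXNet installed.

```python
import os

# Disable operator fusion for this process. Assumption: setting the variable
# before importing mxnet is early enough for it to take effect.
os.environ["MXNET_USE_FUSION"] = "0"

# import mxnet as mx   # import only after the variable has been set
# ... build the network, call hybridize(), and run it on the GPU as in test.py

print("MXNET_USE_FUSION =", os.environ["MXNET_USE_FUSION"])
```

This only sidesteps the fused code path; it does not fix the underlying bug.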
CC @ptrendx
I will look into this - I'm not sure that + is what is problematic, though. It looks more like a problem with the handling of slice_axis. I will investigate.
Ok, I see where the problem comes from - there is a bug in the handling of negative parameter values in the code generator. In your example, if you change axis to 1 and end to 9, you will get the right answer. I will fix that and submit a PR shortly.
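Until a fix lands, the workaround described above (passing non-negative values to slice_axis) could be automated with a small helper. This is only a sketch: normalize_slice_axis_args is a hypothetical name, not part of MXNet, and the (3, 9) shape is an assumption taken from the printed output, since the original network is not shown here.

```python
def normalize_slice_axis_args(shape, axis, begin, end):
    """Hypothetical helper: rewrite slice_axis arguments as non-negative values."""
    ndim = len(shape)
    if not -ndim <= axis < ndim:
        raise ValueError("axis %d is out of range for %d dimensions" % (axis, ndim))
    axis = axis % ndim              # e.g. -1 becomes ndim - 1
    size = shape[axis]
    if begin < 0:
        begin += size               # negative begin counts from the end of the axis
    if end is None:
        end = size                  # None means "up to the end of the axis"
    elif end < 0:
        end += size                 # negative end counts from the end of the axis
    return axis, begin, end


# Assumed input shape (3, 9): axis=-1 with end=None maps to axis=1, end=9.
print(normalize_slice_axis_args((3, 9), axis=-1, begin=0, end=None))  # (1, 0, 9)
```

The returned values can then be passed to slice_axis in place of the negative ones, which avoids the negative parameters that the fused code generator handles incorrectly.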
Description
In some cases broadcast_add and + give different results. This issue seems to be new in 1.6.
To Reproduce
The following network gives the wrong outputs when hybridizing and running on a GPU.
It returns:
The correct result is:
Any of the following changes will give the correct result:
broadcast_add instead of + (see code that is commented out)
This did not happen on mxnet 1.5.
I think it is related to the get_constant and broadcasting that happens before in the network.
Environment