ghost opened this issue 4 years ago
@zleyk22 Yes, it's a bug and also exists in numpy.
import mxnet as mx
mx.npx.set_np()
a = mx.np.array([1,0])
a.attach_grad()
with mx.autograd.record():
    b = mx.np.prod(a)
b.backward()
print(a.grad)
Output:
[ 0. nan]
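For reference, the mathematically expected gradient here is [∂b/∂a_0, ∂b/∂a_1] = [a_1, a_0] = [0., 1.], so the nan in the second position is wrong.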
Xingjian,
If it is a bug, then possibly somebody is already trying to fix it. Do you know of such a person? I would like to contact him/her.
But if nobody is fixing this bug, then I volunteer to fix it. And as I see it now, it may not be an easy task.
Zbigniew
@zleyk22 I think currently no one is looking at the issue. Would you try to solve it? Thanks for pointing out the problem!
Yes, I will try to fix it.
@zleyk22 I can think of two possible ways to solve this problem: 1) Use two cumsums
$[\log a_0,\; \log a_0 + \log a_1,\; \ldots,\; \log a_0 + \log a_1 + \cdots + \log a_{n-3}]$
$[\log a_2 + \log a_3 + \cdots + \log a_{n-1},\; \log a_3 + \cdots + \log a_{n-1},\; \ldots,\; \log a_{n-1}]$
Then, sum these two cumsums elementwise and exponentiate to recover the per-element products. Here, I think we should take the log-sum approach to avoid the overflow/underflow problem of multiplying lots of numbers. (This is also the algorithm used to solve https://leetcode.com/problems/product-of-array-except-self/.)
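A minimal NumPy sketch of idea 1 (here with plain prefix/suffix products rather than log-sums; prod_grad is an illustrative name, not MXNet API):
import numpy as np

def prod_grad(a):
    # grad[i] = prod(a[:i]) * prod(a[i+1:]), computed without any division,
    # so zeros in the input produce correct, finite gradients.
    n = a.shape[0]
    prefix = np.ones_like(a)  # prefix[i] = a[0] * ... * a[i-1]
    suffix = np.ones_like(a)  # suffix[i] = a[i+1] * ... * a[n-1]
    for i in range(1, n):
        prefix[i] = prefix[i - 1] * a[i - 1]
    for i in range(n - 2, -1, -1):
        suffix[i] = suffix[i + 1] * a[i + 1]
    return prefix * suffix

print(prod_grad(np.array([1.0, 0.0])))  # [0. 1.] -- no nan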
2) Detect 0s and give them special treatment. We may detect the positions of the zeros and update the gradient at those positions with the correct value; a sketch follows below.
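A sketch of idea 2 in NumPy (prod_grad_zero_aware and the explicit out argument are illustrative only; the real fix would live in the MXNet backward kernel):
import numpy as np

def prod_grad_zero_aware(a, out):
    # out = np.prod(a). Where a[i] != 0, the usual division trick out / a[i]
    # is fine; where a[i] == 0, recompute the product of the other elements.
    grad = np.empty_like(a)
    nonzero = a != 0
    grad[nonzero] = out / a[nonzero]     # safe: no division by zero here
    for i in np.flatnonzero(~nonzero):   # zeros get the recomputed product
        grad[i] = np.prod(np.delete(a, i))
    return grad

a = np.array([4.0, 0.0])
print(prod_grad_zero_aware(a, np.prod(a)))  # [0. 4.] instead of [0. nan]
@zleyk22 are you working on this?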
Yes and no. I have been looking into this code, but many parts of it are not clear to me, so I am not sure how to fix it in a reasonable way without introducing more bugs. I investigated how this code came to be: from what I saw, it was written by Eric Junyuan Xie at the end of 2016. I thought that by going back in time I would find comments and explanations of how it was written, but there was nothing that suggested why the author chose this implementation. So for now I am not going to fix it; I prefer to wait a while to figure out what to do next. Maybe it would be advisable to talk to Eric Junyuan Xie to learn more about his code?
@yzhliu
@zleyk22 and I spent some time investigating this and managed to track it down to this line:
OP::Map(data[i], DType(out[out_idx]))
This computes DType(out[out_idx]) / data[i], which returns nan when data[i] is zero (our input is mx.nd.array([[4, 0]])).
The logic behind this computation is that (unless zeros are involved) dividing x*y*z by, e.g., x gives you y*z, which is the desired gradient. But this trick doesn't work when zeros are involved; in that case, the code should just recompute the entire product, leaving x out.
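For illustration, the division trick behaves like this in plain NumPy (a sketch of the mechanism, not the actual kernel code):
import numpy as np

a = np.array([4.0, 0.0])
out = np.prod(a)  # forward product: 0.0
grad = out / a    # [0/4, 0/0] -> [0., nan]; the 0/0 also raises a RuntimeWarning
print(grad)       # [ 0. nan], matching the buggy gradient above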
But it seems that this infrastructure is quite rigid and only allows re-using the previously computed output and a single input element at a time. So fixing this bug would appear to require changes to this system, which is shared by more than one operator. Given our limited familiarity with the codebase, we don't think we can proceed further without a bit of guidance and help.
Can you suggest a way forward? Thanks!
Description
I investigated the MXNet C++ code and noticed that the gradient of the input array [[x, y]] is computed as [[(x*y)/x, (x*y)/y]]. Therefore, if y is zero, then we always get nan as the second element of the array. Is it a bug?
Error Message
No error message
To Reproduce
This is the Python program I run:
import mxnet as mx
mx.npx.set_np()
a = mx.np.array([1,0])
a.attach_grad()
with mx.autograd.record():
    b = mx.np.prod(a)
b.backward()
print(a.grad)
I get this output:
[ 0. nan]