Closed nirum closed 9 years ago
Also does not work with `np.tensordot`, which also returns an array of zeros.
Method 2 works because autograd replaces `np.dot` with a version that, when applied, records its operation on the tape. Method 1 does not work because the ndarray methods aren't replaced, so when `w.dot(x)` is evaluated in the forward pass, `w` is a plain ndarray and `x` is a `node_types.ArrayNode`, and so the value of the expression is a plain old float. (I think `w` probably sees `x` as a fellow ndarray because Node classes override `__getattr__`.)

Edited to add: when the forward pass yields a plain old float instead of a Node type at the top level, it looks as if the function doesn't depend on the input with respect to which the differentiation is happening, so a zero vector is returned (the `outgrads` list is empty).
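The failure mode above can be illustrated with a minimal pure-Python sketch. This is not autograd's real internals; `Node`, `Plain`, `traced_dot`, and the tape layout are made-up stand-ins chosen to mirror the explanation: a plain object's `.dot` method knows nothing about the tape, while a wrapped function inspects both arguments for traced values.

```python
class Node:
    """Wraps a value and carries the tape that records traced operations."""
    def __init__(self, value, tape):
        self.value = value
        self.tape = tape

class Plain(list):
    # Stands in for a plain ndarray: its .dot knows nothing about the
    # tape, so it silently produces a plain number and records nothing.
    def dot(self, other):
        other_val = other.value if isinstance(other, Node) else list(other)
        return sum(a * b for a, b in zip(self, other_val))

def traced_dot(a, b):
    # An autograd-style wrapped function: it inspects *both* arguments
    # for Nodes, so tracing works regardless of argument order.
    tape = next((arg.tape for arg in (a, b) if isinstance(arg, Node)), None)
    a_val = a.value if isinstance(a, Node) else list(a)
    b_val = b.value if isinstance(b, Node) else list(b)
    result = sum(p * q for p, q in zip(a_val, b_val))
    if tape is None:
        return result
    tape.append(('dot', a, b))   # record the operation for the backward pass
    return Node([result], tape)

tape = []
x = Node([1.0, 2.0], tape)
w = Plain([3.0, 4.0])

plain_result = w.dot(x)           # "Method 1": plain float, tape untouched
assert tape == []                 # nothing recorded -> zero gradient later

traced_result = traced_dot(w, x)  # "Method 2": the wrapped function records it
assert isinstance(traced_result, Node) and len(tape) == 1
```

With an empty tape, a backward pass has nothing to propagate, which is exactly the all-zeros gradient reported above.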
So my understanding is that, at least for now, for autograd to work, calls to ndarray methods need to be replaced with their numpy function-call versions, i.e. `np.func(x, y)` instead of `x.func(y)`. (It's only really necessary for the calls that ultimately get applied to Node types in the forward pass.) To mitigate this problem, autograd could edit the ndarray class the way it edits the numpy module (replacing its methods in place). Another strategy would be to remove these in-place edits entirely and instead extend ndarray and wrap the numpy functions inside autograd, ultimately requiring the user to replace their `import numpy as np` with something like `import autograd.wrapped_numpy as np`.
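The second strategy can be sketched in a few lines of plain Python. This is a hedged illustration, not autograd's implementation: `Box`, `backend`, and `wraps_backend` are invented names standing in for a traced value, the underlying library, and the wrapping step.

```python
import types

# A stand-in for the underlying library (e.g. numpy).
backend = types.SimpleNamespace(add=lambda a, b: a + b)

class Box:
    """A traced value, analogous to autograd's Node types."""
    def __init__(self, value):
        self.value = value

def wraps_backend(fn):
    # Wrap a backend function so it unwraps traced arguments first.
    def wrapped(*args):
        unwrapped = [a.value if isinstance(a, Box) else a for a in args]
        result = fn(*unwrapped)
        # A real tracer would also record (fn, args) on a tape here.
        if any(isinstance(a, Box) for a in args):
            return Box(result)
        return result
    return wrapped

# The "wrapped module" the user would import instead of the real one.
wrapped_backend = types.SimpleNamespace(add=wraps_backend(backend.add))

out = wrapped_backend.add(Box(2), 3)
assert isinstance(out, Box) and out.value == 5
```

The appeal of this design is that the original library is never modified in place; the user just changes one import line.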
`np.tensordot` doesn't work because it's not in the set of supported numpy operations (yet).
Cool, thanks for the quick reply!
Hi Nirum, thanks for the bug (and thanks for the explanation, Matt).
We overrode `__getattr__` a few days ago, but it has been causing lots of headaches like these, so I've rolled back the change. When a method or function hasn't been implemented in autograd yet (like `ndarray.dot` and `np.tensordot`; we hope to eventually cover all of numpy, but this will take some time), noisy failure is much better than incorrect gradients.
@dougalm @duvenaud What do you guys think about

1. making `x.dot(y)` and `x.mean()` and stuff work
2. the `import autograd.wrapped_numpy as np` thing at the top of their file

(I think those two design decisions are orthogonal. I forget if 2 came up in group meeting.)

Yes, those are both excellent ideas. I was planning to have a go at implementing them today.
We've now added `tensordot` (and plenty of other functions besides), but after implementing it both ways we decided that the effort of supporting the `A.dot(B)` syntax wasn't worth the compromises we had to make in order to subclass `np.ndarray`.
:+1: cool! I just tried it out. Works great!
1. `import autograd.numpy as np` is a much better pattern than editing numpy on import
2. `NotImplementedError`s show up for functions that haven't been extended

Great! Let us know if you find any more bugs or have feature requests.
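The "noisy failure" behavior can be sketched with a small wrapper that raises for anything it doesn't know how to differentiate, instead of silently returning a wrong value. `WrappedModule` and `np_like` are illustrative names, not autograd's actual mechanism.

```python
class WrappedModule:
    """Exposes only the functions that have gradients; fails loudly otherwise."""
    def __init__(self, implemented):
        self._implemented = implemented

    def __getattr__(self, name):
        # Called only for attributes not found normally; unknown names
        # raise instead of falling back to an untraced implementation.
        try:
            return self._implemented[name]
        except KeyError:
            raise NotImplementedError(
                "no gradient implemented for '%s' yet" % name)

np_like = WrappedModule(
    {'dot': lambda a, b: sum(x * y for x, y in zip(a, b))})

assert np_like.dot([1, 2], [3, 4]) == 11   # supported: works normally
try:
    np_like.tensordot                      # unsupported: raises loudly
    caught = False
except NotImplementedError:
    caught = True
assert caught
```

A loud `NotImplementedError` at the call site is far easier to debug than an array of zeros coming out of the backward pass.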
Why does Method 2 below (using `np.dot`) return the correct gradient, while Method 1 returns an array of zeros? Perhaps an exception should be thrown if the parser detects dot products of the form `w.T.dot(x)` instead of `np.dot(w, x)`, if the latter format is really necessary.
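One way to see that the zeros are wrong rather than a legitimate answer: the gradient of `f(w) = dot(w, x)` with respect to `w` is simply `x`. The following pure-Python finite-difference check (not autograd; just a numerical sanity check) confirms that.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def numeric_grad(f, w, eps=1e-6):
    # Central finite differences: df/dw_i ~= (f(w+eps*e_i) - f(w-eps*e_i)) / (2*eps)
    grads = []
    for i in range(len(w)):
        w_hi = list(w); w_hi[i] += eps
        w_lo = list(w); w_lo[i] -= eps
        grads.append((f(w_hi) - f(w_lo)) / (2 * eps))
    return grads

x = [1.0, -2.0, 3.0]
g = numeric_grad(lambda w: dot(w, x), [0.5, 0.5, 0.5])

# The numerical gradient matches x, not a vector of zeros.
assert all(abs(gi - xi) < 1e-4 for gi, xi in zip(g, x))
```

Since the true gradient is nonzero everywhere `x` is, an all-zeros result from Method 1 indicates the computation silently escaped the trace.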