HIPS / autograd

Efficiently computes derivatives of NumPy code.

Error with `w.T.dot(x)`, correct if `np.dot(w, x)` is used #8

Closed: nirum closed this issue 9 years ago

nirum commented 9 years ago

Why does Method 2 below (using np.dot) return the correct gradient while Method 1 returns an array of zeros? If the np.dot(w, x) form really is required, perhaps an exception should be thrown when a dot product of the form w.T.dot(x) is detected.

import numpy as np
from autograd import grad
w = np.arange(5).astype('float')

# Method 1
f = lambda x: w.T.dot(x)
df = grad(f)
df(np.random.randn(5))
# >>> array([0., 0., 0., 0., 0.])

# Method 2
f = lambda x: np.dot(w, x)
df = grad(f)
df(np.random.randn(5))
# >>> array([0., 1., 2., 3., 4.])

nirum commented 9 years ago

This also fails with np.tensordot, which likewise returns an array of zeros.

mattjj commented 9 years ago

Method 2 works because autograd replaces np.dot with a version that, when applied, records its operation on the tape. Method 1 does not work because the ndarray methods aren't replaced, so when w.dot(x) is evaluated in the forward pass, w is a plain ndarray and x is a node_types.ArrayNode, and the value of the expression ends up being a plain old float. (I think w probably sees x as a fellow ndarray because the Node classes override getattr.) Edited to add: when the forward pass yields a plain old float instead of a Node type at the top level, it looks as though the function doesn't depend on the input being differentiated with respect to, so a zero vector is returned (the outgrads list is empty).
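
A toy illustration of that distinction (purely conceptual; this is not autograd's actual Node or tape machinery, and the names here are made up): a wrapped function notices traced inputs and records the operation, while a raw ndarray method call bypasses the trace entirely.

import numpy as np

tape = []                        # toy "tape" of recorded operations

class Traced:
    # Toy stand-in for a traced value; not autograd's Node class.
    def __init__(self, value):
        self.value = value

def wrapped_dot(a, b):
    # Unwrap traced inputs, compute with plain NumPy, and record the op.
    av = a.value if isinstance(a, Traced) else a
    bv = b.value if isinstance(b, Traced) else b
    tape.append(('dot', a, b))
    return Traced(np.dot(av, bv))

w = np.arange(5).astype('float')
x = Traced(np.random.randn(5))

out = wrapped_dot(w, x)          # recorded: a backward pass could see this op
raw = w.dot(x.value)             # plain ndarray method: nothing is recorded
print(len(tape))                 # 1 -- only the wrapped call is on the tape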

So my understanding is that, at least for now, for autograd to work, calls to ndarray methods need to be replaced with their NumPy function-call equivalents, i.e. np.func(x, y) instead of x.func(y). (It's only really necessary for the calls that ultimately get applied to Node types in the forward pass.) To mitigate this problem, autograd could edit the ndarray class the same way it edits the numpy module (replacing its methods in place). Another strategy would be to remove these in-place edits entirely and instead extend ndarray and wrap the numpy functions inside autograd, ultimately requiring the user to replace import numpy as np with something like import autograd.wrapped_numpy as np.
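
As a small illustration of that rule (a sketch only; it assumes np.mean and elementwise multiplication are among the operations autograd already wraps), the function-call style keeps the forward pass inside autograd's replaced functions:

import numpy as np
from autograd import grad

w = np.arange(5).astype('float')

# Function-call style, i.e. np.mean(...) rather than (w * x).mean()
f = lambda x: np.mean(w * x)
df = grad(f)
df(np.random.randn(5))
# expected: array([0. , 0.2, 0.4, 0.6, 0.8]), i.e. w / 5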

np.tensordot doesn't work because it's not in the set of supported numpy operations (yet).

nirum commented 9 years ago

Cool, thanks for the quick reply!

dougalm commented 9 years ago

Hi Nirum, thanks for the bug report (and thanks for the explanation, Matt).

We overrode getattr a few days ago, but it has been causing lots of headaches like this one, so I've rolled back the change. When a method or function hasn't been implemented in autograd yet (like ndarray.dot and np.tensordot; we hope to eventually cover all of NumPy, but that will take some time), noisy failure is much better than incorrect gradients.

mattjj commented 9 years ago

@dougalm @duvenaud What do you guys think about

  1. subclassing ndarray so that x.dot(y) and x.mean() and the like can work
  2. changing the interface so that, instead of autograd editing numpy in-place on import, users do the import autograd.wrapped_numpy as np thing at the top of their file (see the sketch below)

(I think those two design decisions are orthogonal. I forget if 2 came up in group meeting.)
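
A rough sketch of what option 2 might look like from the user's side (hypothetical: autograd.wrapped_numpy is just the module name floated above, not an existing import):

# Hypothetical interface for option 2; autograd.wrapped_numpy does not exist yet.
import autograd.wrapped_numpy as np   # instead of: import numpy as np
from autograd import grad

w = np.arange(5).astype('float')
f = lambda x: np.dot(w, x)            # np here is autograd's wrapped module
df = grad(f)
df(np.random.randn(5))                # array([0., 1., 2., 3., 4.])
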
dougalm commented 9 years ago

Yes, those are both excellent ideas. I was planning to have a go at implementing them today.

dougalm commented 9 years ago

We've now added tensordot (and plenty of other functions besides), but after implementing it both ways we decided that the effort of supporting the A.dot(B) syntax wasn't worth the compromises we had to make in order to subclass np.ndarray.
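
A quick check of the new tensordot support might look like the following (a sketch using the same import style as the original report; for 1-D inputs, np.tensordot with axes=1 reduces to an ordinary dot product):

import numpy as np
from autograd import grad

w = np.arange(5).astype('float')
f = lambda x: np.tensordot(w, x, axes=1)   # same as np.dot(w, x) for 1-D arrays
df = grad(f)
df(np.random.randn(5))
# expected: array([0., 1., 2., 3., 4.])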

nirum commented 9 years ago

👍 Cool! I just tried it out, and it works great!

dougalm commented 9 years ago

Great! Let us know if you find any more bugs or have feature requests.