Theano / Theano

Theano was a Python library that allowed you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It is being continued as PyTensor: https://www.github.com/pymc-devs/pytensor

Scan: truncated gradient bug #4652

Open f0k opened 8 years ago

f0k commented 8 years ago

A Lasagne user found a bug related to the gradient of theano.scan: https://groups.google.com/d/msg/lasagne-users/Dzkp4szn0Vk/3S-pYDIsBAAJ

I've simplified his example to use pure Theano:

import numpy as np
import theano
import theano.tensor as T
floatX = theano.config.floatX

num_seqs, len_seqs, num_in = (4, 10, 2)
num_hid = 5

nonlinearity = lambda X: X  # causes the error
#nonlinearity = lambda X: 1*X  # does not cause the error
gsteps = 3  # causes the error
#gsteps = -1  # does not cause the error

# input layer
x = T.tensor3('x')

# recurrent layer
w_inhid = theano.shared(np.random.randn(num_in, num_hid).astype(floatX))
w_hidhid = theano.shared(np.random.randn(num_hid, num_hid).astype(floatX))
hid_init = theano.shared(np.zeros((num_seqs, num_hid), floatX))
y, _ = theano.scan(
        lambda i, h: nonlinearity(i + T.dot(h, w_hidhid)),
        [T.dot(x, w_inhid)],  # causes the error
        #[x],  # does not cause the error (with T.dot(i, w_inhid) inside scan)
        outputs_info=hid_init,
        truncate_gradient=gsteps)

# gradient of output wrt recurrent weights
grad = theano.grad(y.sum(), w_hidhid)  # causes the error
#grad = theano.grad(y.sum(), w_inhid)  # does not cause the error

# trigger the error
fn = theano.function([x], grad)
fn(np.random.randn(len_seqs, num_seqs, num_in).astype(floatX))

When running this (on either CPU or GPU), I get:

Traceback (most recent call last):
  File "bug2.py", line 35, in <module>
    fn(np.random.randn(len_seqs, num_seqs, num_in).astype(floatX))
  File "/raid/user/jan/install/Theano-git/theano/compile/function_module.py", line 875, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/raid/user/jan/install/Theano-git/theano/gof/link.py", line 317, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/raid/user/jan/install/Theano-git/theano/compile/function_module.py", line 862, in __call__
    self.fn() if output_subset is None else\
ValueError: Shape mismatch: x has 12 cols (and 5 rows) but y has 40 rows (and 5 cols)
Apply node that caused the error: Dot22(Reshape{2}.0, Reshape{2}.0)
Toposort index: 80
Inputs types: [TensorType(float32, matrix), TensorType(float32, matrix)]
Inputs shapes: [(5, 12), (40, 5)]
Inputs strides: [(48, 4), (20, 4)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Assert{msg='Theano Assert failed!'}(Dot22.0, Elemwise{eq,no_inplace}.0, Elemwise{eq,no_inplace}.0)]]

The error disappears with any one of the following changes (also marked in the code above):

- using nonlinearity = lambda X: 1*X instead of the identity,
- setting gsteps = -1 (i.e., no gradient truncation),
- passing [x] as the sequence and computing T.dot(i, w_inhid) inside the step function,
- taking the gradient with respect to w_inhid instead of w_hidhid.

lamblin commented 8 years ago

The example also seems to run with optimizer='None' or optimizer='fast_compile', so the problem is probably in one of the graph optimizations. I tried with DebugMode, but unfortunately it crashes before it can tell me which optimization is to blame.
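
For reference, a minimal sketch of how such optimizer settings can be selected, either per compiled function or globally (the mode and flag names are standard Theano; the exact invocation is illustrative, not necessarily what was run here):

import theano
# Compile with only a minimal set of graph optimizations applied,
# to test whether a particular optimization is at fault.
# x and grad are the variables from the repro script above.
fn = theano.function([x], grad, mode='FAST_COMPILE')
# Equivalently, from the command line:
#   THEANO_FLAGS=optimizer=fast_compile python bug2.py
#   THEANO_FLAGS=optimizer=None python bug2.py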

lamblin commented 7 years ago

This may have been fixed by #5775. @Thrandis will have a look.

Thrandis commented 7 years ago

@lamblin @f0k The problem is still there, even with the latest version of Theano. I'm going to investigate this in more detail!

As @lamblin said before, it works with optimizer=None and optimizer=fast_compile. The test values are also fine.
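
(For context: "test values" refers to Theano's compute_test_value mechanism, which propagates concrete sample values through the graph as it is built. A minimal sketch of how they can be checked for the repro script, with an illustrative sample input, not necessarily the exact check performed:)

import numpy as np
import theano
import theano.tensor as T
# Propagate test values eagerly so shape errors surface while the
# graph is being built, rather than at function call time.
theano.config.compute_test_value = 'warn'
x = T.tensor3('x')
# Attach a sample with the repro's shape (len_seqs, num_seqs, num_in).
x.tag.test_value = np.random.randn(10, 4, 2).astype(theano.config.floatX)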

Thrandis commented 7 years ago

The faulty optimization is PushOutScanOutput. I'll work on a fix. In the meantime, you can use the following flag to disable it: THEANO_FLAGS=optimizer_excluding=scanOp_pushout_output
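
The same exclusion can also be applied to a single compiled function through its mode instead of globally; a sketch building on the repro script above (Mode.excluding is standard Theano API):

import theano
# Exclude the faulty optimization for this function only,
# rather than globally via THEANO_FLAGS.
mode = theano.compile.get_default_mode().excluding('scanOp_pushout_output')
fn = theano.function([x], grad, mode=mode)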

Thrandis commented 7 years ago

@slefrancois As mentioned previously, the faulty optimization is PushOutScanOutput. I wrote a fix, but it breaks scanOp_save_mem.

It seems that PushOutScanOutput is also the cause of other issues: https://github.com/Theano/Theano/issues/5994 https://github.com/Theano/Theano/issues/5249

So I think this optimization needs a general overhaul!

nouiz commented 7 years ago

https://github.com/Theano/Theano/issues/5994 isn't the same problem.

https://github.com/Theano/Theano/issues/5249 may be the same problem, but I don't know for sure whether it uses the truncated gradient. To find out whether it is the same problem, can you check whether the code in that issue has the same inputs to the inner function at each iteration?

@Thrandis, can you test whether the code in this issue works without the truncated gradient? If so, would it make sense to disable this optimization for the grad op when the truncated gradient is used? Or can we fix the optimization for that case?

f0k commented 7 years ago

can you test whether the code in this issue works without the truncated gradient?

As indicated in the original post, the error disappears when setting truncate_gradient=-1.
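
For completeness, the corresponding scan call without truncation, identical to the repro above except for the truncate_gradient argument:

# With truncate_gradient=-1 the gradient is backpropagated through
# all steps, and the error does not occur.
y, _ = theano.scan(
        lambda i, h: nonlinearity(i + T.dot(h, w_hidhid)),
        [T.dot(x, w_inhid)],
        outputs_info=hid_init,
        truncate_gradient=-1)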