farizrahman4u / seq2seq

Sequence to Sequence Learning with Keras
GNU General Public License v2.0

Using Seq2Seq models in modular way (nesting models) results in MissingInputError #119

Closed: phdowling closed this issue 7 years ago

phdowling commented 7 years ago

Hi! Sorry for crossposting this (I also opened this issue on the Keras main repo), but I figured maybe it's actually related to seq2seq or recurrentshop internals.

I'm trying to use a Seq2Seq model as follows:


input = Input(shape=(maxlen,))
one_hot = Lambda(
    lambda x: K.one_hot(K.cast(x, dtype="int32"), nb_classes=num_inputs), output_shape=(maxlen, num_inputs)
)(input)
output_seq = Seq2Seq(
    input_shape=(maxlen, num_inputs),
    hidden_dim=hidden_dim,
    output_length=out_maxlen, output_dim=num_inputs,
    depth=2, peek=True
)(one_hot)
predicted = TimeDistributed(Activation("softmax"))(output_seq)
model = Model(input, predicted)
return model

This compiles fine, but when I try to fit the model using my (num_samples, maxlen)-shaped matrix, Theano complains that input_2 was not provided - which, as it turns out, is the input layer of the Seq2Seq model. I was hoping that layer would automatically be fed the output of my Lambda layer, but apparently this does not work. Is what I am trying to do possible? I realize I could just copy and slightly alter the Seq2Seq code, but of course I'd prefer to just use the library, for more maintainable code.
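A quick way to confirm this is to list the outer model's layers and inputs (a small diagnostic sketch, using the model built above):

for layer in model.layers:
    # The nested Seq2Seq model shows up here as a single layer.
    print(layer.name, type(layer).__name__)

# Only the outer Input appears here; input_2 is the Input created inside Seq2Seq.
print([getattr(t, "name", t) for t in model.inputs])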

More precise exception output:

theano.gof.fg.MissingInputError: ("An input of the graph, used to compute Reshape{2}(input_2, TensorConstant{[-1 53]}), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.", input_2)

dr-costas commented 7 years ago

I have the same issue.

The following code:


        batch_input_shape = (None, nb_inputs)
        input_layer = Input(batch_shape=batch_input_shape)

        encoder = RecurrentContainer(
            stateful=stateful,
            return_sequences=True
        )

        encoder.add(LSTMCell(
            hidden_size_encoder,
            batch_input_shape=batch_input_shape
        ))

        for _ in range(1, depth_encoder):
            encoder.add(Dropout(dropout))
            encoder.add(LSTMCell(
                hidden_size_encoder
            ))

        encoder = Bidirectional(encoder, merge_mode=merge_mode)
        encoded = encoder(input_layer)

        batch_input_shape = (None, None,  hidden_size_encoder)
        decoder = RecurrentContainer(
            decode=True,
            stateful=stateful,
            output_length=output_length
        )

        decoder.add(Dropout(
            dropout, batch_input_shape=batch_input_shape
        ))

        decoder.add(AttentionDecoderCell(
            output_dim=hidden_size_decoder,
            hidden_dim=hidden_size_decoder
        ))

        for _ in range(1, depth_decoder):
            decoder.add(Dropout(dropout))

            decoder.add(LSTMDecoderCell(
                output_dim=hidden_size_decoder,
                hidden_dim=hidden_size_decoder
            ))

        decoded = decoder(encoded)

        output = TimeDistributed(
            Dense(
                output_dim,
                activation=output_activation
            )
        )(decoded)

        model = Model([input_layer], output)

Results in:

theano.gof.fg.MissingInputError: An input of the graph, used to compute dot(<TensorType(float32, matrix)>, HostFromGpu.0), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.

dr-costas commented 7 years ago

For some reason, if I manually write out the code of AttentionSeq2Seq (i.e. copy/paste the code of the AttentionSeq2Seq method into a different file and use the pasted code instead of seq2seq.AttentionSeq2Seq) and add a TimeDistributedDense at the end, I get the missing input error.

If instead I add the TimeDistributed at the end of the seq2seq.AttentionSeq2Seq model itself, everything works fine!

So, what is happening?

I would like to be able to use the code inside the seq2seq.AttentionSeq2Seq method, because I want to specify the number of cells for each layer of the encoder.

phdowling commented 7 years ago

This is what I am also doing. I added the Embedding layer and the Dense inside the same model, and things are working. If I nest the models, things break. I'm not sure if this is meant to be supported by Keras, or if this is perhaps a bug in the RecurrentContainer code.

farizrahman4u commented 7 years ago

This doesn't seem to be an issue with Keras, since nesting ordinary models works. Will fix this soon.
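For reference, nesting ordinary functional models works fine; a minimal sketch (Keras 1.x style, illustrative shapes):

import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

# Inner model built with the functional API.
inner_in = Input(shape=(16,))
inner_out = Dense(8, activation='relu')(inner_in)
inner = Model(inner_in, inner_out)

# Outer model that calls the inner model on a new tensor,
# the same nesting pattern that fails with Seq2Seq above.
outer_in = Input(shape=(16,))
outer_out = Dense(1)(inner(outer_in))
outer = Model(outer_in, outer_out)

outer.compile(loss='mse', optimizer='rmsprop')
outer.fit(np.random.rand(4, 16), np.random.rand(4, 1), nb_epoch=1)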

farizrahman4u commented 7 years ago

Fixed.

dr-costas commented 7 years ago

Hi. It is not fixed. I tried again with the previously posted code and the missing input error persists.

dr-costas commented 7 years ago

The code is:

encoder = RecurrentContainer()
encoder.add(LSTMCell())

for _ in range():
    encoder.add(Dropout())
    encoder.add(LSTMCell())

input_layer = Input()
input_layer._keras_history[0].supports_masking = True

encoder = Bidirectional(encoder)
encoded = encoder(input_layer)

decoder = RecurrentContainer()
decoder.add(Dropout())
decoder.add(AttentionDecoderCell())

for _ in range():
    decoder.add(Dropout())
    decoder.add(LSTMDecoderCell())

decoded = decoder(encoded)

output = TimeDistributed(Dense())(decoded)

model = Model(input_layer, output)

The error is:

theano.gof.fg.MissingInputError: An input of the graph, used to compute dot(<TensorType(float32, matrix)>, HostFromGpu.0), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.

dr-costas commented 7 years ago

The weird thing is that when it is used as follows:

m = AttentionSeq2Seq(
    input_dim=self._input_nb_features,
    input_length=self._input_length,
    hidden_dim=self._hidden_size_words,
    output_length=self._output_length,
    output_dim=self._hidden_size_words,
    depth=depth
)

model = Sequential()
model.add(m)
model.add(TimeDistributed(Dense()))

model.compile(loss='mse', optimizer='rmsprop')

it works....

That means we do not have any fine-grained control over the layers of the encoder (e.g. a different number of cells in each layer, or residual connections).
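For example, what I would like to be able to write is something like this sketch (illustrative sizes, assuming the recurrentshop API used in the snippets above):

from keras.layers import Dropout
from recurrentshop import RecurrentContainer, LSTMCell

# A different number of cells per encoder layer, which the packaged
# AttentionSeq2Seq constructor does not let you specify.
encoder = RecurrentContainer(return_sequences=True)
encoder.add(LSTMCell(256, batch_input_shape=(None, 128)))
encoder.add(Dropout(0.5))
encoder.add(LSTMCell(128))
encoder.add(Dropout(0.5))
encoder.add(LSTMCell(64))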

phdowling commented 7 years ago

FYI, using this with Seq2Seq models directly works fine for me (apparently also with different depth parameters). I add an embedding and a convolution before the model, and a TimeDistributed Dense with softmax after it. Perhaps there's a deeper issue in recurrentshop that makes your code fail? Did you update both libraries?
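Concretely, the pipeline I mean looks roughly like this (a sketch with made-up sizes, not my actual code):

from keras.layers import Input, Embedding, Convolution1D, Dense, TimeDistributed
from keras.models import Model
from seq2seq import Seq2Seq

# Made-up sizes, for illustration only.
maxlen, vocab_size, embed_dim, nb_filter = 30, 1000, 64, 64
out_maxlen, out_classes, hidden_dim = 30, 1000, 128

tokens = Input(shape=(maxlen,), dtype='int32')
x = Embedding(vocab_size, embed_dim, input_length=maxlen)(tokens)
x = Convolution1D(nb_filter, 3, border_mode='same', activation='relu')(x)
seq = Seq2Seq(input_shape=(maxlen, nb_filter), hidden_dim=hidden_dim,
              output_length=out_maxlen, output_dim=hidden_dim, depth=2)(x)
predicted = TimeDistributed(Dense(out_classes, activation='softmax'))(seq)

model = Model(tokens, predicted)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')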

dr-costas commented 7 years ago

If you mean the code that you posted:

I'm trying to use a Seq2Seq model as follows:

input = Input(shape=(maxlen,))
one_hot = Lambda(
    lambda x: K.one_hot(K.cast(x, dtype="int32"), nb_classes=num_inputs), output_shape=(maxlen, num_inputs)
)(input)
output_seq = Seq2Seq(
    input_shape=(maxlen, num_inputs),
    hidden_dim=hidden_dim,
    output_length=out_maxlen, output_dim=num_inputs,
    depth=2, peek=True
)(one_hot)
predicted = TimeDistributed(Activation("softmax"))(output_seq)
model = Model(input, predicted)
return model

then, yes. This works.

But I tried to reproduce the AttentionSeq2Seq model outside of the seq2seq package in order to have, for example, a different number of cells in each encoder layer. That did not work.

If I create the model from the seq2seq package and then add layers before and/or after it, it works.

farizrahman4u commented 7 years ago

@dr-costas Post your actual code.

dr-costas commented 7 years ago

Hi

I just tried the following code:

    input_length = 10
    batch_size = None
    output_length = 10
    input_features = 32
    lstm_cells = 32
    dropout = .5
    output_dim = 32
    dense_output = 2

    batch_shape = (batch_size, input_length, input_features)

    x = np.random.rand(2, input_length, input_features)
    y = np.random.rand(2, output_length, dense_output)

    encoder = RecurrentContainer(input_length=input_length, return_sequences=True)
    encoder.add(LSTMCell(lstm_cells, batch_input_shape=(batch_size, input_features)))

    for _ in range(1, 3):
        encoder.add(Dropout(dropout))
        encoder.add(LSTMCell(lstm_cells))

    input_layer = Input(batch_shape=batch_shape)
    input_layer._keras_history[0].supports_masking = True

    encoder = Bidirectional(encoder, merge_mode='sum')
    encoded = encoder(input_layer)

    decoder = RecurrentContainer(decode=True, output_length=output_length)
    decoder.add(Dropout(dropout, batch_input_shape=batch_shape))
    decoder.add(AttentionDecoderCell(output_dim=output_dim, hidden_dim=output_dim))

    for _ in range(1, 3):
        decoder.add(Dropout(dropout))
        decoder.add(LSTMDecoderCell(output_dim=output_dim, hidden_dim=output_dim))

    decoded = decoder(encoded)

    output = TimeDistributed(Dense(dense_output))(decoded)

    model = Model(input_layer, output)

    model.compile(loss='mse', optimizer='rmsprop')

    model.fit(x, y)

and it works.

Thnx.

dr-costas commented 7 years ago

Hi,

I used the exact code that I posted above inside a class, but there it does not work.

The code is:


        encoder = RecurrentContainer(input_length=None, return_sequences=True)
        encoder.add(LSTMCell(self._hidden_size_1, batch_input_shape=(None, self._input_nb_features)))

        for _ in range(1, self._depth_encoder):
            encoder.add(Dropout(self._dropout))
            encoder.add(LSTMCell(self._hidden_size_1))

        input_layer = Input(batch_shape=(None, None, self._input_nb_features))
        input_layer._keras_history[0].supports_masking = True

        encoder = Bidirectional(encoder, merge_mode='sum')
        encoded = encoder(input_layer)

        decoder = RecurrentContainer(decode=True, output_length=self._output_length)
        decoder.add(Dropout(self._dropout, batch_input_shape=(None, None, self._input_nb_features)))
        decoder.add(AttentionDecoderCell(output_dim=self._hidden_size_2, hidden_dim=self._hidden_size_2))

        for _ in range(1, self._depth_decoder):
            decoder.add(Dropout(self._dropout))
            decoder.add(LSTMDecoderCell(output_dim=self._hidden_size_2, hidden_dim=self._hidden_size_2))

        decoded = decoder(encoded)

        output = TimeDistributed(Dense(self._output_dim, activation=self._output_activation))(decoded)

        self._model = Model(input_layer, output)

dr-costas commented 7 years ago

The error is the usual missing input error.

dr-costas commented 7 years ago

Following issue #131, I used dropout = 0 and the error went away.

So, can we use seq2seq as above with dropout values greater than 0.0?

farizrahman4u commented 7 years ago

#118

dr-costas commented 7 years ago

The problem is in the fit function, not in the predict function.
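In other words, with a model built as above with dropout > 0 (sketch):

predictions = model.predict(x)   # prediction runs fine
model.fit(x, y, nb_epoch=1)      # training raises the MissingInputError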

dajaj commented 7 years ago

Hi, I have the same problem: I have a working Seq2Seq with the default dropout (so dropout = 0.), but when I try to set the dropout to 0.1, for example, the MissingInputError is raised when I fit the model. Did you find a way to get it working?

dajaj commented 7 years ago

from seq2seq import Seq2Seq
import numpy as np

X = np.random.rand(10,20,128)
y = X

model = Seq2Seq(input_dim=128,output_dim=128,output_length=20,hidden_dim=128,dropout=0.1)
model.compile(loss='mse',optimizer='rmsprop',metrics=['accuracy'])
model.fit(X,y,nb_epoch=5)

MissingInputError: ("An input of the graph, used to compute dot(<TensorType(float32, matrix)>, lstmcell_4_U), was not provided and not given a value.Use the Theano flag exception_verbosity='high',for more information on this error.", <TensorType(float32, matrix)>)

TillBeemelmanns commented 7 years ago

Your script works fine for me. Are you sure that you are not using early stopping? https://github.com/farizrahman4u/seq2seq/issues/118#issuecomment-268564548
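For reference, using early stopping explicitly would look like this (sketch, reusing model, X and y from your snippet above):

from keras.callbacks import EarlyStopping

early = EarlyStopping(monitor='val_loss', patience=2)
model.fit(X, y, nb_epoch=5, validation_split=0.1, callbacks=[early])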

dajaj commented 7 years ago

Is it possible I'm using early stopping without knowing it?