farizrahman4u / seq2seq

Sequence to Sequence Learning with Keras
GNU General Public License v2.0

Inconsistent output shapes #137

Closed eelcovdw closed 7 years ago

eelcovdw commented 7 years ago

Hi all,

I am not terribly experienced with Keras, so I'm not sure whether this is an actual issue or just a Keras quirk. While testing the seq2seq models with the functional API, I came across a (possibly) inconsistent output shape.

My code, a seq2seq model with a time-distributed softmax:

from keras.layers import Input
from keras.models import Model
from keras.layers import Dense
from keras.layers.wrappers import TimeDistributed
from seq2seq.models import Seq2Seq

if __name__ == "__main__":
    input_shape = (16, 32)  # (timesteps, features)
    model_input = Input(shape=input_shape)
    s2s = Seq2Seq(output_dim=32, output_length=16, input_shape=input_shape)
    s2s_output = s2s(model_input)
    # Apply the same Dense(8) softmax at every timestep
    model_output = TimeDistributed(Dense(8, activation='softmax'))(s2s_output)
    model = Model(model_input, model_output)

    s2s.summary()

    model.summary()

Printing the summaries, I get the following output:

s2s summary
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input_2 (InputLayer)             (None, 16, 32)        0                                            
____________________________________________________________________________________________________
timedistributed_1 (TimeDistribute(None, 16, 32)        1056        input_2[0][0]                    
____________________________________________________________________________________________________
recurrentcontainer_1 (RecurrentCo[(None, 32), None, Non8320        timedistributed_1[0][0]          
____________________________________________________________________________________________________
dense_2 (Dense)                  (None, 32)            1056        recurrentcontainer_1[0][0]       
____________________________________________________________________________________________________
recurrentcontainer_2 (RecurrentCo(None, 16, 32)        9376        dense_2[0][0]                    
                                                                   dense_2[0][0]                    
                                                                   recurrentcontainer_1[0][1]       
                                                                   recurrentcontainer_1[0][2]       
====================================================================================================
Total params: 19808
____________________________________________________________________________________________________
model summary
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to                     
====================================================================================================
input_1 (InputLayer)             (None, 16, 32)        0                                            
____________________________________________________________________________________________________
model_1 (Model)                  ((None, 32), 16, 32)  19808       input_1[0][0]                    
____________________________________________________________________________________________________
timedistributed_2 (TimeDistribute((None, 32), 16, 8)   264         model_1[1][0]                    
====================================================================================================
Total params: 20072
____________________________________________________________________________________________________

Looking at the output shape of the seq2seq model: in the first summary (just the s2s container), the output shape of the last layer is (None, 16, 32), as expected. However, when the model is called with the functional API, the output shape of the container is reported as ((None, 32), 16, 32). Where are those extra parentheses and the 32 coming from?
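For reference, here is the shape arithmetic I would expect the second summary to report, sketched in plain NumPy (this is only an illustration of the expected (batch, 16, 8) result of a time-distributed dense softmax, not the Keras internals; all sizes are the ones from the script above):

```python
import numpy as np

batch, timesteps, features, classes = 4, 16, 32, 8

# Stand-in for the seq2seq output: (batch, timesteps, features)
s2s_out = np.random.random((batch, timesteps, features))

# TimeDistributed(Dense(8, softmax)) applies the same weights at
# every timestep: (batch, 16, 32) @ (32, 8) -> (batch, 16, 8)
W = np.random.random((features, classes))
b = np.random.random(classes)
logits = s2s_out @ W + b

# Softmax over the class axis
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

print(probs.shape)  # (4, 16, 8)
```

So the trailing dimensions 16 and 8 in the summary look right; it is only the leading "(None, 32)" that seems off.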

farizrahman4u commented 7 years ago

Is your keras up-to-date ?

eelcovdw commented 7 years ago

I just updated all dependencies (keras, recurrentshop, seq2seq); the summaries give the same output.

I did notice that no errors pop up if I just train against the expected output shape. I added this to the previous script:

from keras.optimizers import SGD
import numpy as np

model.compile(SGD(), loss='categorical_crossentropy')
# Targets use the *expected* output shape (None, 16, 8) and train fine
x = np.random.random((64, 16, 32))
y = np.random.random((64, 16, 8))
print(model.train_on_batch(x, y))