combining readout and teacher_force

I cannot find a way to combine the two examples from the documentation. For example, I want to decode a 13-dimensional state vector s0 into a sequence of length 11, where each element of the sequence is a softmax over a vocabulary of 3 words. I'm using a batch size of 7. This is what I have so far, but it fails in fit due to a shape problem:

import numpy as np
from keras.layers import Input, Dense
from keras.engine import Model
from recurrentshop import RecurrentModel, RecurrentSequential
from recurrentshop.cells import GRUCell

# Decoder: a GRU cell followed by a softmax readout over the 3-word vocabulary
rnn = RecurrentSequential(decode=True, output_length=11, readout='readout_only',
          teacher_force=True, return_sequences=True)
rnn.add(GRUCell(13, input_dim=3))
rnn.add(Dense(3, activation='softmax', input_dim=13))

x = Input((3,), name='x')
y0 = Input((3,), name='y0')
s0 = Input((13,), name='s0')
yt = Input((11, 3), name='yt')
y = rnn(s0, ground_truth=yt, initial_readout=y0)

model = Model([x, y0, yt, s0], y)
model.compile('sgd', 'categorical_crossentropy')

# Dummy data: batch size 7, sequence length 11, vocabulary size 3, state size 13
npyt = np.ones((7, 11, 3))
npx = np.zeros((7, 3))
npy0 = np.ones((7, 3))
nps0 = np.ones((7, 13))
model.fit([npx, npy0, npyt, nps0], npyt)
y2 = model.predict([npx, npy0, npyt, nps0])

I also tried passing s0 via the initial_state parameter of rnn, but that fails even earlier (pop from empty inputs_list in the rnn(...) call).
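
To make the shapes I expect concrete, here is a plain NumPy sketch of the computation I have in mind. It is not recurrentshop's implementation: the GRU cell is replaced by a simple tanh recurrence, and the weights and the `decode` helper are made up for illustration. It assumes that with readout_only the previous output (or, with teacher forcing, the previous ground-truth element) is the only input to each step.

```python
import numpy as np

batch, steps, vocab, state_dim = 7, 11, 3, 13

rng = np.random.default_rng(0)
# Hypothetical weights standing in for the GRU cell and the Dense readout layer.
W_in = rng.normal(size=(vocab, state_dim))
W_rec = rng.normal(size=(state_dim, state_dim))
W_out = rng.normal(size=(state_dim, vocab))


def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def decode(s0, y0, yt=None):
    """Unroll `steps` decoding steps starting from state s0 and readout y0.

    If yt is given, the ground truth is fed back at each step (teacher
    forcing); otherwise the model's own previous output is fed back.
    """
    state, prev = s0, y0
    outputs = []
    for t in range(steps):
        # Simplified recurrence used here in place of a real GRU update.
        state = np.tanh(prev @ W_in + state @ W_rec)
        out = softmax(state @ W_out)
        outputs.append(out)
        prev = yt[:, t] if yt is not None else out
    return np.stack(outputs, axis=1)


s0 = np.ones((batch, state_dim))
y0 = np.ones((batch, vocab))
yt = np.ones((batch, steps, vocab))

y = decode(s0, y0, yt)   # teacher-forced, as in fit
y_free = decode(s0, y0)  # free-running, as in predict
print(y.shape, y_free.shape)
```

So s0 has shape (7, 13), y0 has shape (7, 3), yt has shape (7, 11, 3), and the output should have the same shape as yt.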
