flandolfi / clockwork-rnn

High-level implementation of the ClockworkRNN

Stacking clockwork-rnn layers #1

Closed (vaxherra closed this issue 5 years ago)

vaxherra commented 5 years ago

Hi,

Thank you for your code implementation. I want to stack some of these layers on top of each other, essentially like stacking LSTM layers in Keras, but I am running into dimensionality errors.

For example, my input is 10k samples, each with 100 timesteps and a single (univariate) feature, so X.shape = (10000, 100, 1).

In particular, I am planning to use the CW-RNN to build an autoencoder. My ideal architecture would have, say, a 32-dimensional CW-RNN output (the encoding layer); I would then repeat that vector and output my reconstruction:

from keras.models import Sequential
from keras.layers import RepeatVector, Reshape
# (plus the ClockworkRNN layer provided by this repository)

model = Sequential()
model.add(ClockworkRNN(periods=[1, 2, 4, 8, 16, 32, 64, 128], 
                       units_per_period=64, 
                       input_shape=(None, 1), 
                       output_units=32, 
                       rnn_dtype='CuDNNLSTM'))

model.add(RepeatVector(64))
model.add(Reshape((32,64) ))

model.add(ClockworkRNN(periods=[1, 2, 4, 8, 16, 32, 64, 128], 
                       units_per_period=64, 
                       input_shape=(64, 1), 
                       output_units=100, 
                       rnn_dtype='CuDNNLSTM'))

model.compile(optimizer='adam', loss='mse')
model.summary()

But apparently something goes wrong here, as ClockworkRNN seems to assume an input time dimension of None:

ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 32, 64), (None, None, 64)]

I also noticed that the None in the input shape is necessary: (a) runs correctly, while (b) raises an error.

(a) OK model

model = Sequential()
model.add(ClockworkRNN(periods=[1, 2, 4, 8, 16, 32, 64, 128], 
                       units_per_period=64, 
                       input_shape=(None, 1), 
                       output_units=100, 
                       rnn_dtype='CuDNNLSTM'))

model.add(Reshape((100,-1)))

model.compile(optimizer='adam', loss='mse')
model.summary()

(b) Model raising a ValueError (note the change from None to 100 in input_shape)

model = Sequential()
model.add(ClockworkRNN(periods=[1, 2, 4, 8, 16, 32, 64, 128], 
                       units_per_period=64, 
                       input_shape=(100, 1),  # **notice change from None to 100**
                       output_units=100, 
                       rnn_dtype='CuDNNLSTM'))

model.add(Reshape((100,-1)))

model.compile(optimizer='adam', loss='mse')
model.summary()

ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 100, 1), (None, None, 64)]

Could you please investigate?

flandolfi commented 5 years ago

Hi Robert,

Thanks for pointing this out. I actually stumbled upon this problem a while ago but was too lazy to fix it. Anyway, I have updated the code and it should now work fine; try it out!

On the other hand, I need to warn you that this may be just one of the many problems with this implementation. A few examples:

Regarding your problem: if I understand correctly, you are modelling an autoencoder where the encoder creates the contextual information that is fed to the decoder at each timestep. This can be a good approach, but you could also try to create an encoding from the hidden states of each CW-block of the encoder (no Dense layer) and feed them to the decoder as the initial hidden states of its CW-blocks (in a 1:1 fashion). To do this you would need a method to initialize the hidden states of the cells, which is yet to be implemented.
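
Something along those lines, sketched with plain Keras LSTM layers standing in for the encoder/decoder CW-blocks (since ClockworkRNN does not expose its hidden states yet; the 100/1/32 dimensions are just taken from your example), could look like this:

from keras.models import Model
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense

inputs = Input(shape=(100, 1))

# Encoder: keep the final hidden and cell states instead of a Dense encoding.
_, state_h, state_c = LSTM(32, return_state=True)(inputs)

# Decoder: initialized 1:1 with the encoder states, fed the repeated context.
decoder_in = RepeatVector(100)(state_h)
decoded = LSTM(32, return_sequences=True)(decoder_in,
                                          initial_state=[state_h, state_c])
outputs = TimeDistributed(Dense(1))(decoded)

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')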

Moreover, you said that X.shape = (10000, 100, 1). Since no sample has more than 100 steps, you should not use periods greater than 100 (I would not even use a period of 64). Also keep in mind that if you use an LSTM layer for each CW-block, you can afford larger periods per block (an LSTM has, by definition, more "memory" than a SimpleRNN); e.g., you could use just periods=[1, 10].
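
For instance, something like this keeps even the slowest block ticking several times within a 100-step sample (a sketch; the exact values are up to you):

from keras.models import Sequential
# ClockworkRNN is the layer provided by this repository

model = Sequential()
model.add(ClockworkRNN(periods=[1, 2, 4, 8, 16],   # all well below 100 steps
                       units_per_period=64,
                       input_shape=(None, 1),
                       output_units=32,
                       rnn_dtype='CuDNNLSTM'))
model.compile(optimizer='adam', loss='mse')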

These are just suggestions you did not even ask for, so feel free to ignore them. :)

Have a nice day,

Francesco

vaxherra commented 5 years ago

This is great!

I didn't expect you to reply so quickly, let alone fix the issue. Could you explain how you managed to fix it? Was it related to the fact that the inputs are split by period, while Keras checks whether the input size agrees with the period?

My general problem is that I have very long multivariate recordings (14- or 15-dimensional time series). I "slice" the time series into arbitrary chunks of 100 timesteps, but they could really be of any size. At first I experimented with stateful LSTMs, but those train very slowly, as I need to proceed in batches of 1.

My training dataset has 8,640,000 timepoints in each dimension. I sliced it into chunks of 100 timesteps, giving a shape of (864000, 100, 14), but I am planning to increase the chunk size to 200, perhaps 1000, to capture longer-term dependencies. Increasing the chunk size is computationally expensive for LSTMs, hence I initially stuck with 100.
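
(For reference, the slicing itself is just a NumPy reshape along these lines; the numbers here are illustrative:)

import numpy as np

chunk_size = 100
n_features = 14

# recording: the full multivariate series, shape (n_timepoints, n_features);
# random data is used here just to make the snippet runnable.
recording = np.random.randn(1000000, n_features).astype('float32')

# Truncate to a multiple of chunk_size, then reshape into non-overlapping
# chunks of shape (n_chunks, chunk_size, n_features) for Keras.
n_chunks = recording.shape[0] // chunk_size
X = recording[:n_chunks * chunk_size].reshape(n_chunks, chunk_size, n_features)
print(X.shape)  # (n_chunks, 100, 14)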

My first approach was a 6-layer-deep LSTM encoder, a Dense layer, and a 6-layer LSTM decoder. The results were pretty nice, but the reconstructions were never "perfect". It soon turned out that the reconstruction resembles an "averaged" input signal; this is nicely illustrated in Figure 5 of the CW-RNN paper.

When I stumbled upon the CW-RNN paper I got excited, because it could potentially address my problem at hand: learning a latent space for a multivariate time signal. My signal contains a couple of meaningful spectral bands, so I am planning to tweak the periods to reflect these bands and move toward more interpretable periods.

Admittedly, I still need to dig deeper into the CW-RNN implementation to understand it fully, but it was really nice to stumble upon your implementation on GitHub.

Thanks, Robert.

flandolfi commented 5 years ago

Hi,

I didn't expect you to reply so quickly, let alone fix the issue. Could you explain how you managed to fix it? Was it related to the fact that the inputs are split by period, while Keras checks whether the input size agrees with the period?

Yes. The real problem was the Lambda layers: it seems they could not properly compute their output shape beforehand, returning None for the time dimension. The Concatenate layer then could not concatenate a mix of integer and None dimensions.
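
In general, the Keras-level fix for this kind of problem is to give the Lambda an explicit output_shape so the time dimension can be computed at build time. Roughly (an illustrative sketch, not the actual code of the layer):

from keras.layers import Input, Lambda

period = 4
x = Input(shape=(100, 1))

# Subsample every `period`-th timestep. The callable output_shape computes
# the new time dimension (ceil(T / period)) instead of leaving it as None.
y = Lambda(lambda t: t[:, ::period, :],
           output_shape=lambda s: (s[0],
                                   None if s[1] is None else -(-s[1] // period),
                                   s[2]))(x)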

Regarding your problem:

At first I experimented with stateful LSTMs, but those train very slowly, as I need to proceed in batches of 1.

Batches of 1 because of the different time lengths? We can deal with them in two non-mutually-exclusive ways:

My training dataset has 8,640,000 timepoints in each dimension. I sliced it into chunks of 100 timesteps, giving a shape of (864000, 100, 14), but I am planning to increase the chunk size to 200, perhaps 1000, to capture longer-term dependencies. Increasing the chunk size is computationally expensive for LSTMs, hence I initially stuck with 100.

I have done something similar before. I noticed that training becomes faster and faster as you transform your data from "tall and thin" to "short and fat", i.e., as you summarize each chunk into a few features. These may be produced by non-parametric functions (max, mean, min, std, and so on) or in a parameterized way, using a Conv1D with stride. If you have some domain knowledge about the data you are dealing with, feature engineering can be a real time-saver.
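
For instance, a strided Conv1D in front of the recurrent part turns 100 "thin" timesteps into far fewer, "fatter" ones (a sketch; the filter count and stride are placeholders):

from keras.models import Sequential
from keras.layers import Conv1D, CuDNNLSTM

model = Sequential()
# 100 timesteps x 14 features -> 20 timesteps x 32 learned features.
model.add(Conv1D(filters=32, kernel_size=5, strides=5, activation='relu',
                 input_shape=(100, 14)))
model.add(CuDNNLSTM(64))
model.compile(optimizer='adam', loss='mse')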

My first approach was a 6-layer-deep LSTM encoder, a Dense layer, and a 6-layer LSTM decoder. The results were pretty nice, but the reconstructions were never "perfect".

Have a look also at IndRNNs; they seem promising. I have developed a Keras implementation of them, but you should also find one in tf.contrib. Be aware, in any case, that the first layer of the CW-RNN works with period = 1, i.e., as a plain LSTM/GRU/SimpleRNN. This means that if training a plain LSTM already takes a lot of time, a Clockwork-LSTM will take even more.

It soon turned out that the reconstruction resembles an "averaged" input signal; this is nicely illustrated in Figure 5 of the CW-RNN paper.

You should also try sort_ascending=True in the ClockworkRNN. This makes the slower blocks (the ones with larger periods) take the output of the faster ones as input, as a "summary of the previous episodes", which is the opposite of the original Clockwork-RNN formulation (as in the paper).
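
For example (hypothetical values, same constructor arguments as in your snippets):

from keras.models import Sequential
# ClockworkRNN is the layer provided by this repository

model = Sequential()
model.add(ClockworkRNN(periods=[1, 2, 4, 8, 16],
                       units_per_period=64,
                       input_shape=(None, 14),
                       output_units=32,
                       rnn_dtype='CuDNNLSTM',
                       sort_ascending=True))  # faster blocks feed the slower ones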

My signal contains a couple of meaningful spectral bands, so I am planning to tweak the periods to reflect these bands and move toward more interpretable periods.

You should try to exploit this property using the Fast Fourier Transform (FFT). Three possibilities come to mind:
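
Whichever way you go, a quick way to turn the dominant spectral bands into candidate periods is something like this (a NumPy sketch; the sampling interval and the number of peaks are placeholders):

import numpy as np

# signal: one channel of the recording, shape (n_timepoints,);
# random data here just to make the snippet runnable.
signal = np.random.randn(100000).astype('float32')

# Power spectrum of the demeaned signal.
spectrum = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
freqs = np.fft.rfftfreq(signal.shape[0], d=1.0)  # d = sampling interval

# Take the strongest non-DC frequencies and convert them to periods
# (in timesteps), which could then inform the `periods` argument.
top = np.argsort(spectrum[1:])[-5:] + 1
candidate_periods = np.round(1.0 / freqs[top]).astype(int)
print(sorted(set(candidate_periods.tolist())))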

Sorry if I got too involved in your research. As always, you can ignore my suggestions! :)

Francesco