flandolfi / clockwork-rnn

High-level implementation of the ClockworkRNN

Stacking clockwork-rnn layers #1

Closed (vaxherra closed this issue 5 years ago)

vaxherra commented 5 years ago

Hi,

Thank you for your code implementation. I want to stack some of these layers on top of each other, essentially like stacking LSTM layers in Keras, but I am running into dimensionality errors.

For example, my input is 10k samples, each with 100 timesteps and a single (univariate) feature, so X.shape = (10000, 100, 1).

In particular, I am planning to use the CW-RNN to build an autoencoder. My ideal architecture would have, say, a 32-dimensional CW-RNN output (the encoding layer); I would then repeat that vector and output my reconstruction:

from keras.models import Sequential
from keras.layers import RepeatVector, Reshape
# (plus the ClockworkRNN layer provided by this repository)

model = Sequential()
model.add(ClockworkRNN(periods=[1, 2, 4, 8, 16, 32, 64, 128], 
                       units_per_period=64, 
                       input_shape=(None, 1), 
                       output_units=32, 
                       rnn_dtype='CuDNNLSTM'))

model.add(RepeatVector(64))
model.add(Reshape((32,64) ))

model.add(ClockworkRNN(periods=[1, 2, 4, 8, 16, 32, 64, 128], 
                       units_per_period=64, 
                       input_shape=(64, 1), 
                       output_units=100, 
                       rnn_dtype='CuDNNLSTM'))

model.compile(optimizer='adam', loss='mse')
model.summary()

But apparently something goes wrong here, as ClockworkRNN seems to assume an input time dimension of None:

ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 32, 64), (None, None, 64)]

I also noticed that the None in the input shape is necessary: (a) runs correctly, while (b) raises an error.

(a) OK model

model = Sequential()
model.add(ClockworkRNN(periods=[1, 2, 4, 8, 16, 32, 64, 128], 
                       units_per_period=64, 
                       input_shape=(None, 1), 
                       output_units=100, 
                       rnn_dtype='CuDNNLSTM'))

model.add(Reshape((100,-1)))

model.compile(optimizer='adam', loss='mse')
model.summary()

(b) Model raising a ValueError (note the change from None to 100 in input_shape)

model = Sequential()
model.add(ClockworkRNN(periods=[1, 2, 4, 8, 16, 32, 64, 128], 
                       units_per_period=64, 
                       input_shape=(100, 1),  # **notice change from None to 100**
                       output_units=100, 
                       rnn_dtype='CuDNNLSTM'))

model.add(Reshape((100,-1)))

model.compile(optimizer='adam', loss='mse')
model.summary()

ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 100, 1), (None, None, 64)]

Could you please investigate?

flandolfi commented 5 years ago

Hi Robert,

Thanks for pointing this out. I actually stumbled upon this problem a while ago but was too lazy to fix it. Anyway, I have updated the code and it should now work fine; try it out!

On the other hand, I need to warn you that this may be just one of the many problems with this implementation. A few examples:

Regarding your problem: if I understand correctly, you are modelling an autoencoder where the encoder creates the contextual information that is fed to the decoder at each timestep. This can be a good approach, but you could also try to create an encoding from the hidden states of each CW-block of the encoder (no Dense layer) and feed them to the decoder as the initial hidden states of its CW-blocks (in a 1:1 fashion). To do this you would need a method to initialize the hidden states of the cells, which is yet to be implemented.
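
Something along those lines, sketched with plain Keras LSTM layers standing in for the encoder/decoder CW-blocks (since ClockworkRNN does not expose its hidden states yet; the 100/1/32 dimensions are just taken from your example), could look like this:

from keras.models import Model
from keras.layers import Input, LSTM, RepeatVector, TimeDistributed, Dense

inputs = Input(shape=(100, 1))

# Encoder: keep the final hidden and cell states instead of a Dense encoding.
_, state_h, state_c = LSTM(32, return_state=True)(inputs)

# Decoder: initialized 1:1 with the encoder states, fed the repeated context.
decoder_in = RepeatVector(100)(state_h)
decoded = LSTM(32, return_sequences=True)(decoder_in,
                                          initial_state=[state_h, state_c])
outputs = TimeDistributed(Dense(1))(decoded)

model = Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')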

Moreover, you said that X.shape = (10000, 100, 1). Since no sample has more than 100 steps, you should not use periods greater than 100 (I would not even use a period of 64). Also keep in mind that if you use an LSTM layer for each CW-block, you can afford larger periods per block (an LSTM has, by definition, more "memory" than a SimpleRNN); e.g., you could use just periods=[1, 10].
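
For instance, something like this keeps even the slowest block ticking several times within a 100-step sample (a sketch; the exact values are up to you):

from keras.models import Sequential
# ClockworkRNN is the layer provided by this repository

model = Sequential()
model.add(ClockworkRNN(periods=[1, 2, 4, 8, 16],   # all well below 100 steps
                       units_per_period=64,
                       input_shape=(None, 1),
                       output_units=32,
                       rnn_dtype='CuDNNLSTM'))
model.compile(optimizer='adam', loss='mse')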

These are just suggestions you did not even ask for, so feel free to ignore them. :)

Have a nice day,

Francesco

vaxherra commented 5 years ago

This is great!

I didn't expect you to reply so quickly, let alone fix the issue. Could you explain how you managed to fix it? Was it related to the fact that the inputs are split by period, while Keras checks whether the input size agrees with the period?

My general problem is that I have very long multivariate recordings (14- or 15-dimensional time series). I "slice" the time series into arbitrary chunks of 100 timesteps, but they could really be of any size. At first I experimented with stateful LSTMs, but those train very slowly, as I need to proceed in batches of 1.

My training dataset has 8,640,000 timepoints in each dimension. I sliced it into chunks of 100 timesteps, giving a shape of (864000, 100, 14), but I am planning to increase the chunk size to 200, perhaps 1000, to capture longer-term dependencies. Increasing the chunk size is computationally expensive for LSTMs, hence I initially stuck with 100.
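
(For reference, the slicing itself is just a NumPy reshape along these lines; the numbers here are illustrative:)

import numpy as np

chunk_size = 100
n_features = 14

# recording: the full multivariate series, shape (n_timepoints, n_features);
# random data is used here just to make the snippet runnable.
recording = np.random.randn(1000000, n_features).astype('float32')

# Truncate to a multiple of chunk_size, then reshape into non-overlapping
# chunks of shape (n_chunks, chunk_size, n_features) for Keras.
n_chunks = recording.shape[0] // chunk_size
X = recording[:n_chunks * chunk_size].reshape(n_chunks, chunk_size, n_features)
print(X.shape)  # (n_chunks, 100, 14)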

My first approach was a 6-layer-deep LSTM encoder, a Dense layer, and a 6-layer LSTM decoder. The results were pretty nice, but the reconstructions were never "perfect". It soon turned out that the reconstruction resembles an "averaged" input signal; this is nicely illustrated in Figure 5 of the CW-RNN paper.

When I stumbled upon the CW-RNN paper I got excited, because it could potentially address my problem at hand: learning a latent space for a multivariate time signal. My signal contains a couple of meaningful spectral bands, so I am planning to tweak the periods to reflect these bands and move toward more interpretable periods.

Admittedly, I still need to dig deeper into the CW-RNN implementation to understand it fully, but it was really nice to stumble upon your implementation on GitHub.

Thanks, Robert.

flandolfi commented 5 years ago

Hi,

I didn't expect you to reply so quickly, let alone fix the issue. Could you explain how you managed to fix it? Was it related to the fact that the inputs are split by period, while Keras checks whether the input size agrees with the period?

Yes. The real problem was the Lambda layers: it seems they could not properly compute their output shape beforehand, returning None for the time dimension. The Concatenate layer then could not concatenate a mix of integer and None dimensions.
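
In general, the Keras-level fix for this kind of problem is to give the Lambda an explicit output_shape so the time dimension can be computed at build time. Roughly (an illustrative sketch, not the actual code of the layer):

from keras.layers import Input, Lambda

period = 4
x = Input(shape=(100, 1))

# Subsample every `period`-th timestep. The callable output_shape computes
# the new time dimension (ceil(T / period)) instead of leaving it as None.
y = Lambda(lambda t: t[:, ::period, :],
           output_shape=lambda s: (s[0],
                                   None if s[1] is None else -(-s[1] // period),
                                   s[2]))(x)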

Regarding your problem:

At first I experimented with stateful LSTMs, but those train very slowly, as I need to proceed in batches of 1.

Batches of 1 because of the different time lengths? We can deal with them in two non-mutually-exclusive ways:

My training dataset has 8,640,000 timepoints in each dimension. I sliced it into chunks of 100 timesteps, giving a shape of (864000, 100, 14), but I am planning to increase the chunk size to 200, perhaps 1000, to capture longer-term dependencies. Increasing the chunk size is computationally expensive for LSTMs, hence I initially stuck with 100.

I have done something similar before. I noticed that training becomes faster and faster as you transform your data from "tall and thin" to "short and fat", i.e., as you summarize each chunk into a few features. These may be produced by non-parametric functions (max, mean, min, std, and so on) or in a parameterized way, using a Conv1D with stride. If you have some domain knowledge about the data you are dealing with, feature engineering can be a real time-saver.
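
For instance, a strided Conv1D in front of the recurrent part turns 100 "thin" timesteps into far fewer, "fatter" ones (a sketch; the filter count and stride are placeholders):

from keras.models import Sequential
from keras.layers import Conv1D, CuDNNLSTM

model = Sequential()
# 100 timesteps x 14 features -> 20 timesteps x 32 learned features.
model.add(Conv1D(filters=32, kernel_size=5, strides=5, activation='relu',
                 input_shape=(100, 14)))
model.add(CuDNNLSTM(64))
model.compile(optimizer='adam', loss='mse')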

My first approach was a 6-layer-deep LSTM encoder, a Dense layer, and a 6-layer LSTM decoder. The results were pretty nice, but the reconstructions were never "perfect".

Have a look also at IndRNNs; they seem promising. I have developed a Keras implementation of them, but you should also find one in tf.contrib. Be aware, in any case, that the first layer of the CW-RNN works with period = 1, i.e., as a plain LSTM/GRU/SimpleRNN. This means that if training a plain LSTM already takes a lot of time, a Clockwork-LSTM will take even more.

It soon turned out that the reconstruction resembles an "averaged" input signal; this is nicely illustrated in Figure 5 of the CW-RNN paper.

You should also try sort_ascending=True in the ClockworkRNN. This makes the slower blocks (the ones with larger periods) take the output of the faster ones as input, as a "summary of the previous episodes", which is the opposite of the original Clockwork-RNN formulation (as in the paper).
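
For example (hypothetical values, same constructor arguments as in your snippets):

from keras.models import Sequential
# ClockworkRNN is the layer provided by this repository

model = Sequential()
model.add(ClockworkRNN(periods=[1, 2, 4, 8, 16],
                       units_per_period=64,
                       input_shape=(None, 14),
                       output_units=32,
                       rnn_dtype='CuDNNLSTM',
                       sort_ascending=True))  # faster blocks feed the slower ones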

My signal contains a couple of meaningful spectral bands, so I am planning to tweak the periods to reflect these bands and move toward more interpretable periods.

You should try to exploit this property using the Fast Fourier Transform (FFT). Three possibilities come to mind:
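
Whichever way you go, a quick way to turn the dominant spectral bands into candidate periods is something like this (a NumPy sketch; the sampling interval and the number of peaks are placeholders):

import numpy as np

# signal: one channel of the recording, shape (n_timepoints,);
# random data here just to make the snippet runnable.
signal = np.random.randn(100000).astype('float32')

# Power spectrum of the demeaned signal.
spectrum = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
freqs = np.fft.rfftfreq(signal.shape[0], d=1.0)  # d = sampling interval

# Take the strongest non-DC frequencies and convert them to periods
# (in timesteps), which could then inform the `periods` argument.
top = np.argsort(spectrum[1:])[-5:] + 1
candidate_periods = np.round(1.0 / freqs[top]).astype(int)
print(sorted(set(candidate_periods.tolist())))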

Sorry if I got too involved in your research. As always, you can ignore my suggestions! :)

Francesco