Open svs11 opened 3 months ago
Hi!
I had a look at your model, and just printing the trainable weights for the first LSTM layer, I see
[<tf.Variable 'lstm/lstm_cell/kernel:0' shape=(32, 32) dtype=float32, numpy=
array([[ 0.01692209, -0.22750317, -0.17461008, ..., -0.1579345 ,
-0.21596879, -0.18585742],
[-0.00184596, 0.1575419 , 0.20252898, ..., 0.14971459,
0.17116585, -0.04111549],
[-0.24567568, 0.01723912, -0.15928173, ..., -0.20553797,
0.22376046, -0.24837291],
...,
[-0.20332155, 0.06006312, 0.06557494, ..., 0.10808015,
-0.2113491 , -0.05491558],
[-0.27010858, 0.10658553, -0.13689941, ..., 0.2040728 ,
-0.14297459, 0.2779071 ],
[-0.29793295, -0.13058276, 0.01223576, ..., -0.02761602,
-0.27836597, 0.1290856 ]], dtype=float32)>,
for the kernel weights, which have size 32 × 32 = 1024. So hls4ml is correctly inferring the size of the weight tensor. I think there is a misunderstanding of the expected size here: the number of samples does not affect the size of the weight tensors; see for example https://medium.com/analytics-vidhya/demystifying-lstm-weights-and-biases-dimensions-c47dbd39b30a.
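The shape rule from that article can be sketched in a few lines of Python (the helper function is mine, not part of Keras or hls4ml): the kernel depends only on the per-step feature count and the number of units, never on how many samples or time steps pass through the layer.

```python
def lstm_weight_shapes(input_dim, units):
    """Return (kernel, recurrent_kernel, bias) shapes for a Keras-style LSTM."""
    gates = 4  # input, forget, cell, output
    return (input_dim, gates * units), (units, gates * units), (gates * units,)

# 32 input features, 8 units -> kernel (32, 32), i.e. 1024 weights,
# matching the printout above
kernel, recurrent, bias = lstm_weight_shapes(input_dim=32, units=8)
print(kernel)  # (32, 32)
```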
Thank you for your reply!
After reading the page linked in your response, I see that my terminology might be incorrect. I'm assuming (or rather, I want) the CNN+pooling stack to provide a sequence length of 1 and an embedding dimension of samples × channels. We're building latency-constrained models and can't afford to invoke the LSTM equations multiple times per forward pass. In other words, we want to flatten the tensor feeding the LSTM layer into a single embedding vector. Is this possible?
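In case it helps, here is a sketch of what I have in mind (layer sizes are illustrative, not our real model): inserting a Reshape between the pooling stack and the LSTM collapses the output into a single time step, so the recurrent equations run exactly once.

```python
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Reshape, LSTM

# Illustrative sketch (sizes are made up): collapse the conv+pool output
# of shape (2, 32) into one time step of 2 * 32 = 64 features, so the
# LSTM equations are evaluated exactly once per forward pass.
model = Sequential([
    tf.keras.Input(shape=(4, 1)),
    Conv1D(32, 3, padding='same', activation='relu'),
    MaxPooling1D(pool_size=2),
    Reshape((1, 2 * 32)),  # sequence length 1, embedding dimension 64
    LSTM(8),
])
# The LSTM kernel now has shape (64, 32): every conv output reaches the gates.
print(model.layers[-1].get_weights()[0].shape)  # (64, 32)
```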
Thank you! -Suyash
Hi Suyash,
I don't think something like this is supported in hls4ml at the moment. AFAIK, our implementation keeps the structure of iterating over the time steps to calculate the results. I presume it would be possible to add an optional version that flattens the inputs (Flatten layers are supported) and processes the full calculation in one go. People more expert in the LSTM implementation in hls4ml can correct me, but I think this would require some development.
I’m afraid that hls4ml is not properly flattening the tensor between a conv1d layer and an LSTM layer. For the network generated from the following Keras code:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, LSTM, Flatten, Dense, TimeDistributed

def create_model():
    model = Sequential()
    model.add(Conv1D(8, 3, padding='same', activation='relu', input_shape=(16, 1)))
    model.add(MaxPooling1D(pool_size=2, strides=2, padding='same'))
    model.add(Conv1D(16, 3, padding='same', activation='relu'))
    model.add(MaxPooling1D(pool_size=2, strides=2, padding='same'))
    model.add(Conv1D(32, 3, padding='same', activation='relu'))
    model.add(MaxPooling1D(pool_size=2, strides=2, padding='same'))
    # (LSTM and later layers truncated in the original snippet)

# Create the model
model = create_model()
I see the following generated CPP code:
#include <iostream>
#include "network_64_4_64_2_32_2_32_ru.h"
#include "parameters.h"

void network_64_4_64_2_32_2_32_ru(
    hls::stream<input_t> &conv1d_input,
    hls::stream<result_t> &layer15_out
) {
#ifndef __SYNTHESIS__
#endif
}
The number of kernel weights in the first LSTM layer is expected to be the number of outputs from the last conv1d+pooling layer times the gate dimension, i.e. 2 samples × 32 channels × 4 gates × 8 states = 2048, but the generated CPP instead shows 1024.
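The arithmetic behind the two counts can be spelled out as follows (a sketch; the variable names are mine):

```python
gates, units = 4, 8
time_steps, channels = 2, 32  # output shape of the last conv1d+pool stage

# What hls4ml/Keras allocate: the kernel sees only the 32 per-step features
per_step_weights = channels * (gates * units)                  # 32 * 32

# What I expect with a flattened input: all 2 * 32 = 64 outputs feed the gates
flattened_weights = (time_steps * channels) * (gates * units)  # 64 * 32

print(per_step_weights, flattened_weights)  # 1024 2048
```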
How should I be connecting a pooling layer to an LSTM layer to guarantee that all outputs are conveyed?