Closed: cavus700 closed this issue 5 years ago
Check out how the LSTM layer is initialized in https://github.com/HarryVolek/PyTorch_Speaker_Verification/blob/master/speech_embedder_net.py, specifically `batch_first=True` on line 19. The PyTorch LSTM layer lets you specify that the first dimension of your input is the batch, i.e. the input has shape (batch, seq_len, input_size). I believe it's largely a matter of preference which layout to use.
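A minimal sketch of the difference (the layer sizes here are arbitrary, not the ones used in the repo):

```python
import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
batch, seq_len, input_size, hidden_size = 4, 10, 40, 64

# With batch_first=True the LSTM expects (batch, seq_len, input_size)
# and returns outputs of shape (batch, seq_len, hidden_size).
lstm_bf = nn.LSTM(input_size, hidden_size, batch_first=True)
x_bf = torch.randn(batch, seq_len, input_size)
out_bf, _ = lstm_bf(x_bf)
print(out_bf.shape)  # torch.Size([4, 10, 64])

# With the default batch_first=False the LSTM expects
# (seq_len, batch, input_size) and returns (seq_len, batch, hidden_size).
lstm_sf = nn.LSTM(input_size, hidden_size)
x_sf = torch.randn(seq_len, batch, input_size)
out_sf, _ = lstm_sf(x_sf)
print(out_sf.shape)  # torch.Size([10, 4, 64])
```

Either layout works; the flag just has to match the shape of the tensors you actually feed in.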
Thank you, I overlooked this initialization parameter.
I am not 100% sure, but as I understand it, the input to the LSTM layer should have shape (seq_len, batch, input_size) according to the documentation. The input to the SpeechEmbedder, however, has shape (batch, seq_len, input_size). Is this a mistake, or am I misunderstanding something?