HarryVolek / PyTorch_Speaker_Verification

PyTorch implementation of "Generalized End-to-End Loss for Speaker Verification" by Wan, Li et al.
BSD 3-Clause "New" or "Revised" License

lstm input wrong shape #47

Closed cavus700 closed 5 years ago

cavus700 commented 5 years ago

I am not 100% sure, but as I understand it, the input to an LSTM layer should have the shape (seq_len, batch, input_size) according to the documentation. The input to the SpeechEmbedder, however, has the shape (batch, seq_len, input_size). Is this a mistake, or am I misunderstanding something?

seandickert commented 5 years ago

Check out how the LSTM layer is initialized in https://github.com/HarryVolek/PyTorch_Speaker_Verification/blob/master/speech_embedder_net.py, specifically line 19: batch_first=True. The PyTorch LSTM layer lets you specify that your input has the batch in the first dimension, i.e. (batch, seq_len, input_size). I believe it's largely a matter of preference which layout you use.
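To illustrate: here is a minimal sketch (not the repo's exact code; the sizes below are arbitrary) showing that `nn.LSTM` with `batch_first=True` accepts input shaped (batch, seq_len, input_size) and returns output shaped (batch, seq_len, hidden_size):

```python
import torch
import torch.nn as nn

# Arbitrary example sizes, chosen only for illustration.
batch, seq_len, input_size, hidden_size = 4, 10, 40, 64

# batch_first=True: input/output tensors use (batch, seq_len, feature) layout.
lstm = nn.LSTM(input_size, hidden_size, num_layers=3, batch_first=True)

x = torch.randn(batch, seq_len, input_size)  # (batch, seq_len, input_size)
out, (h_n, c_n) = lstm(x)

print(out.shape)  # (batch, seq_len, hidden_size)
print(h_n.shape)  # (num_layers, batch, hidden_size) -- unaffected by batch_first
```

Note that `batch_first` only changes the layout of the input and output tensors; the hidden and cell states `h_n`/`c_n` keep the (num_layers, batch, hidden_size) shape either way.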

cavus700 commented 5 years ago

Thank you, I overlooked this initialization parameter.