aishoot / LSTM_PIT_Speech_Separation

Two-talker Speech Separation with LSTM/BLSTM by Permutation Invariant Training method.

Preprocessing of Dataset to feed into LSTM #11

Closed divyeshrajpura4114 closed 5 years ago

divyeshrajpura4114 commented 5 years ago

Can you please explain the procedure, or the individual steps, for preprocessing the data before feeding it to the LSTM? I am working on the paper by Zhuo Chen, "Speaker-Independent Speech Separation With Deep Attractor Network", but I am not able to create batches because each audio file has a different number of frames. So how do you handle variable-length input to the LSTM? I know of techniques like sequence padding, but I don't think that would be effective because the difference in the number of frames is large.
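For reference, here is a minimal sketch of the padding approach I mean, with a mask so that padded frames do not count towards the loss. This is only my illustration, not code from this repository: the names `pad_batch` and `masked_mse` are hypothetical, and I am assuming each utterance is a NumPy array of shape (frames, feature_dims).

```python
import numpy as np

def pad_batch(features):
    """Zero-pad a list of (frames, dims) arrays to the longest utterance.

    Returns the padded batch, a 0/1 mask marking real frames, and the
    original lengths. (Hypothetical helper, for illustration only.)
    """
    lengths = np.array([f.shape[0] for f in features])
    max_len = int(lengths.max())
    dims = features[0].shape[1]
    batch = np.zeros((len(features), max_len, dims), dtype=np.float32)
    for i, f in enumerate(features):
        batch[i, : f.shape[0]] = f
    # mask[i, t] is 1.0 for a real frame and 0.0 for a padded one
    mask = (np.arange(max_len)[None, :] < lengths[:, None]).astype(np.float32)
    return batch, mask, lengths

def masked_mse(pred, target, mask):
    """Mean squared error averaged over real (unpadded) frames only."""
    per_frame = ((pred - target) ** 2).mean(axis=-1)
    return float((per_frame * mask).sum() / mask.sum())

# Two utterances with 3 and 5 frames, 4 feature dimensions each
feats = [np.random.rand(3, 4), np.random.rand(5, 4)]
batch, mask, lengths = pad_batch(feats)
```

With very different utterance lengths, most of such a batch is padding, which is the inefficiency I am worried about.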