aishoot / LSTM_PIT_Speech_Separation

Two-talker Speech Separation with LSTM/BLSTM by Permutation Invariant Training method.

Preprocessing of Dataset to feed into LSTM #12

Open divyeshrajpura4114 opened 5 years ago

divyeshrajpura4114 commented 5 years ago

Can you please explain the procedure, or the different steps, used to preprocess the data before feeding it to the LSTM? I am working from the paper by Zhuo Chen, "Speaker-Independent Speech Separation With Deep Attractor Network", but I am not able to create batches because each audio file has a different number of frames. How do you handle variable-length input to the LSTM? I know about techniques like sequence padding, but I don't think that would be effective here, because the difference in frame counts between files is very large.
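One common way to deal with very different utterance lengths is to sort (or bucket) utterances by frame count before batching, so each batch only pads up to the longest utterance in that batch, and to carry a mask so the loss ignores padded frames. A minimal NumPy sketch, independent of this repo's actual pipeline (the function name is hypothetical):

```python
import numpy as np

def make_padded_batches(features, batch_size):
    """Length-sorted padded batching for variable-length utterances.

    features: list of (frames, feat_dim) float arrays.
    Returns a list of (padded, lengths, mask) tuples, where padding
    is only up to the longest utterance within each batch.
    """
    # Sort utterance indices by frame count so similar lengths batch together.
    order = sorted(range(len(features)), key=lambda i: features[i].shape[0])
    batches = []
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        lengths = np.array([features[i].shape[0] for i in idx])
        max_len = lengths.max()
        feat_dim = features[idx[0]].shape[1]
        padded = np.zeros((len(idx), max_len, feat_dim), dtype=np.float32)
        mask = np.zeros((len(idx), max_len), dtype=np.float32)
        for row, i in enumerate(idx):
            t = features[i].shape[0]
            padded[row, :t] = features[i]
            mask[row, :t] = 1.0  # 1 on real frames, 0 on padding
        batches.append((padded, lengths, mask))
    return batches
```

With this layout you multiply the per-frame loss by `mask` and divide by `mask.sum()`, so padded frames contribute nothing to training; the `lengths` array can also feed a framework's dynamic-RNN / packed-sequence API.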

aishoot commented 5 years ago

Hi, you can take a look at these two files: tfrecords_io.py and run_lstm.py.

divyeshrajpura4114 commented 5 years ago

Ok. I will look into that. Thank You...

nagasaibharath commented 5 years ago

If we are able to create our own mixed wav files, is there still any need to compute the SNRs of the audio files?
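When you generate your own mixtures, the SNR is usually something you choose at mixing time rather than measure afterwards: one talker is rescaled relative to the other so the mixture hits a target SNR. A minimal sketch of that idea (the function name is hypothetical, not from this repo):

```python
import numpy as np

def mix_at_snr(target, interferer, snr_db):
    """Mix two signals so the target-to-interferer ratio equals snr_db.

    SNR(dB) = 10*log10(P_target / P_interferer), so the interferer is
    rescaled by sqrt(P_target / (P_interferer * 10^(snr_db/10))).
    """
    # Truncate both signals to a common length before mixing.
    n = min(len(target), len(interferer))
    s = target[:n].astype(np.float64)
    v = interferer[:n].astype(np.float64)
    p_s = np.mean(s ** 2)   # average power of the target talker
    p_v = np.mean(v ** 2)   # average power of the interfering talker
    scale = np.sqrt(p_s / (p_v * 10.0 ** (snr_db / 10.0)))
    return s + scale * v
```

So if the mixing SNRs are drawn and applied when the wav files are created, there is no separate need to estimate SNR from the finished mixtures; the chosen values are already known.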