Closed fjiang9 closed 5 years ago
Yes, actually for the data augmentation experiments (where I am mixing the audio sources online) I needed all the files to have at least 4 s of audio, so I chopped them. In the WSJ case this is not needed. I will try to push a newer and cleaner version of this code after I am done with some deadlines that I have.
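To make the online-mixing setup concrete, here is a minimal sketch of the idea: every source file is assumed to hold at least 4 s of audio, so a random 4 s crop of each source can always be taken before the sources are summed on the fly. The function names and the 8 kHz sample rate are illustrative assumptions, not the repo's actual API.

```python
import numpy as np

SAMPLE_RATE = 8000                    # wsj0-2mix is commonly resampled to 8 kHz
SEGMENT_LEN = 4 * SAMPLE_RATE         # 4 s of audio

def random_crop(wav: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Return a random 4 s crop; assumes len(wav) >= SEGMENT_LEN."""
    start = rng.integers(0, len(wav) - SEGMENT_LEN + 1)
    return wav[start:start + SEGMENT_LEN]

def online_mix(s1: np.ndarray, s2: np.ndarray,
               rng: np.random.Generator) -> np.ndarray:
    """Mix two randomly cropped sources sample by sample."""
    return random_crop(s1, rng) + random_crop(s2, rng)
```

Because the crop offsets are drawn fresh each time, the same pair of source files can yield a different mixture on every epoch, which is what makes this an augmentation.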
Glad that you liked the work and I would be happy to answer other questions, if you have any! :)
Section 3.2.1 refers to the WSJ case only, with no online mixing, so you can use all mixtures after zero-padding.
Sorry for the confusion, but I wanted to match the experiments from other works while also creating the 4 s online mixing procedure described in Section 3.2.2 of the paper.
@etzinis Thank you so much for your kind reply! I am still running the speech separation experiment code. Looking forward to your new release : )
Because, as I said before, I will not be able to put out the new release until I am finished with some urgent things, I would suggest creating the WSJ mixtures with the MATLAB script provided here: http://wham.whisper.ai/README.html and then using my script with wav_timelength=4s https://github.com/etzinis/two_step_mask_learning/blob/master/two_step_mask_learning/utils/preprocess_wsj0mix.py
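For readers following along, the effect of running the preprocessing with wav_timelength=4 can be sketched as below: files are truncated to exactly 4 s, and files shorter than that are dropped. This is a hypothetical illustration of the behavior discussed in this thread, not the script's actual code.

```python
import numpy as np

def to_fixed_length(wav: np.ndarray,
                    sample_rate: int = 8000,
                    wav_timelength: float = 4.0):
    """Truncate wav to wav_timelength seconds.

    Returns None for files shorter than the target length; these are
    the files discarded by the original preprocessing.
    """
    n = int(sample_rate * wav_timelength)
    if len(wav) < n:
        return None
    return wav[:n]
```

Dropping the short files is what produces the slightly smaller sample counts discussed later in this thread.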
I will leave this issue open in order to fix it in the new release.
So I have rechecked what you said, and indeed when using the 'max' folder from the wsj0-2mix dataset the following files are created: Training: 19855, Testing: 2988, Validation: 4980.
This is caused because, as you said, in lines 139-140 I discard the files with a duration shorter than 4 s. However, this amounts to neglecting 0.7% of the training set and 0.4% of the test and validation sets, which I consider negligible compared to the total size of the dataset. Moreover, zeros do not contribute to the SI-SDR loss, so either way I am just making my configuration a tiny bit harder than the initial setup. If you want to use the remaining 0.4% as well, you can simply zero-pad at lines 139-140.
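The zero-padding alternative mentioned above could look roughly like this: instead of discarding files shorter than 4 s, pad them with trailing zeros up to the target length, so every mixture is kept. This is a sketch under the same assumed 8 kHz sample rate; the function name is illustrative.

```python
import numpy as np

def pad_or_truncate(wav: np.ndarray,
                    sample_rate: int = 8000,
                    wav_timelength: float = 4.0) -> np.ndarray:
    """Force wav to exactly wav_timelength seconds."""
    n = int(sample_rate * wav_timelength)
    if len(wav) >= n:
        return wav[:n]
    # Zero-pad short files instead of discarding them; the appended zeros
    # add nothing to the SI-SDR loss, so the training objective is unchanged.
    return np.concatenate([wav, np.zeros(n - len(wav), dtype=wav.dtype)])
```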
I close this issue for now.
I have also added the padding code in the corresponding lines now, so the output distribution of samples will be: Training: 20000, Testing: 3000, Validation: 5000.
Thanks for noticing that @flyjiang92 🍺 😃
@etzinis Thank you so much for your response and the code updating! 👍
Thanks for sharing the code! It's really great work on audio source separation! I have a question about preprocess_wsj0mix.py: since some audio files in wsj0-2mix are shorter than 4 s, after running the code in lines 139-140 those files are discarded during preprocessing. As a result, there are only 17075 mixtures in the training set when using the "min" folder (19885 when using the "max" folder). This does not match the number (20000) mentioned in Section 3.2.1 of the paper. So I was wondering how many samples are finally used in the speech separation experiment in the paper?