Open Sonish-Maharjan-2014 opened 6 months ago
Hello, thank you for your interest in our work. To train the model on a dataset where each utterance contains more than 2 speakers, you can change the number of output channels to 2N for each TF-bin (N speakers, each with 2 numbers for the real and imaginary parts of the STFT coefficients).
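To make the 2N-channel idea concrete, here is a minimal NumPy sketch of turning a real-valued output with 2N channels per TF-bin into N complex spectrograms. It is not taken from the NBSS code: the function name `split_speakers`, the (freq, time, channel) axis order, and the interleaved channel layout [re_1, im_1, re_2, im_2, ...] are all assumptions made for illustration, so check them against the actual model output before relying on this.

```python
import numpy as np


def split_speakers(output: np.ndarray, num_speakers: int) -> np.ndarray:
    """Convert a real array of shape (freq, time, 2*N) to complex (N, freq, time).

    Assumed (hypothetical) channel layout: [re_1, im_1, re_2, im_2, ...].
    """
    n_freq, n_time, n_chan = output.shape
    if n_chan != 2 * num_speakers:
        raise ValueError(
            f"expected {2 * num_speakers} output channels, got {n_chan}"
        )
    # Group the last axis into (speaker, real/imag) pairs.
    pairs = output.reshape(n_freq, n_time, num_speakers, 2)
    # Combine each pair into one complex STFT coefficient.
    spec = pairs[..., 0] + 1j * pairs[..., 1]
    # Move the speaker axis to the front: (N, freq, time).
    return np.moveaxis(spec, -1, 0)


num_speakers = 4
rng = np.random.default_rng(0)
# Placeholder network output: 129 frequency bins, 50 frames, 2N channels.
out = rng.standard_normal((129, 50, 2 * num_speakers))
spec = split_speakers(out, num_speakers)
print(spec.shape)  # (4, 129, 50)
```

With four speakers the output layer would therefore need 8 channels per TF-bin instead of the 4 used for two speakers.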
Thank you for your response. I tried adapting the code for four speakers. I generated room impulse responses (RIRs) for the four speakers and made some adjustments to the code. Unfortunately, I ran into an error towards the end of the process.
Could you help me fix the problem?
You can debug your code to check the shapes of echoics and echoic_i, and the value of needed_lens.
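One possible way to do that check is a small helper that prints the shape of anything array-like and the plain value of everything else. The sketch below is only an illustration: `describe` is a hypothetical helper, and the arrays assigned to `echoics`, `echoic_i`, and `needed_lens` are made-up placeholders, not the real values from the dataset code.

```python
import numpy as np


def describe(name, value):
    """Return a one-line summary: shape and dtype for arrays, repr otherwise."""
    if hasattr(value, "shape") and hasattr(value, "dtype"):
        return f"{name}: shape={tuple(value.shape)}, dtype={value.dtype}"
    return f"{name} = {value!r}"


# Placeholder values only; substitute the real variables at the failing line.
echoics = np.zeros((4, 8, 16000))   # assumed: (speakers, mics, samples)
echoic_i = echoics[0]               # assumed: one speaker's signal
needed_lens = 16000                 # assumed: a target length in samples

for name, value in [
    ("echoics", echoics),
    ("echoic_i", echoic_i),
    ("needed_lens", needed_lens),
]:
    print(describe(name, value))
```

Printing these just before the line that raises the error usually shows which dimension no longer matches after going from two to four speakers.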
Thanks, I will try to debug it.
I trained the model (from the NBSS branch) for 2-speaker separation using the wsj0 dataset. It worked perfectly. But now I want to train the model for more than 2 speakers. What steps should I follow?