Uniform Linear Array is not as good as circular array?

Audio-WestlakeU / NBSS

The official repo of NBC & SpatialNet for multichannel speech separation, denoising, and dereverberation

MIT License

Uniform Linear Array is not as good as circular array? #4

Closed NNPanNPU closed 1 year ago

NNPanNPU commented 1 year ago

I tried the code on a uniform linear array (ULA) with SDR as the loss. However, the training loss is around -7 dB to -8 dB, which is much higher than with the circular array. Are there any possible reasons? Do you have any suggestions for ULAs?

quancs commented 1 year ago

Did you try the SI-SDR loss? How many microphones are there in your experiments? Which RIR set do you use? And what is the speech overlap type in your dataset?
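For readers unfamiliar with the loss mentioned here, the following is a minimal NumPy sketch of SI-SDR for a single-channel signal pair. It is not the implementation used in the NBSS repo; the function names `si_sdr` and `neg_si_sdr_loss` and the `eps` parameter are assumptions made for illustration.

```python
import numpy as np

def si_sdr(estimate, target, eps=1e-8):
    """Scale-invariant SDR in dB for two 1-D signals of equal length."""
    # Remove the mean so a DC offset does not affect the projection.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to find the optimally scaled target.
    alpha = np.dot(estimate, target) / (np.dot(target, target) + eps)
    projection = alpha * target
    residual = estimate - projection
    ratio = (np.sum(projection ** 2) + eps) / (np.sum(residual ** 2) + eps)
    return 10.0 * np.log10(ratio)

def neg_si_sdr_loss(estimate, target):
    """Training loss: negative SI-SDR, so lower is better."""
    return -si_sdr(estimate, target)
```

Because the loss is the negated metric, a loss of -10 dB corresponds to an SI-SDR of +10 dB on the training data, and rescaling the estimate leaves the value unchanged.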

quancs commented 1 year ago

If you can provide your dataset (clean speech and RIRs), we can try training NBC for you.

NNPanNPU commented 1 year ago

Wow, thank you! Here are my experimental settings.

(1) The ULA has 8 microphones with an inter-element spacing of 4 cm.
(2) T60 is randomly selected between 200 and 900 ms.
(3) Room size: x is randomly selected from [2, 12] m, y from [2, 10] m, and z from [3, 4] m.
(4) The training data is from WSJ0. Each clean utterance is contaminated by an interferer (SIRs are the same as in WSJ0-2mix) and white noise (SNR of 20 to 40 dB). The interferer is at least 30 degrees away from the clean speech.
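The settings above can be written down as a small configuration sketch. This is not the script used in the thread: the function names are hypothetical, uniform sampling within each range is an assumption (the thread only says "randomly selected"), and the speed of sound of 343 m/s is an assumed room-temperature value.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value (not stated in the thread)

def ula_positions(num_mics=8, spacing=0.04):
    """Microphone coordinates, shape (num_mics, 3), on the x-axis around the origin."""
    x = (np.arange(num_mics) - (num_mics - 1) / 2) * spacing
    zeros = np.zeros(num_mics)
    return np.stack([x, zeros, zeros], axis=1)

def sample_room(rng):
    """Draw one room configuration from the ranges listed above (uniform: an assumption)."""
    return {
        "size": np.array([rng.uniform(2, 12), rng.uniform(2, 10), rng.uniform(3, 4)]),
        "t60": rng.uniform(0.2, 0.9),    # seconds
        "snr_db": rng.uniform(20, 40),   # white-noise SNR in dB
    }

def aliasing_frequency(spacing=0.04, c=SPEED_OF_SOUND):
    """Frequency at which the mic spacing equals half a wavelength."""
    return c / (2 * spacing)
```

With these numbers the array aperture is 0.28 m, and the half-wavelength condition is met up to about 4.3 kHz. Whether spatial aliasing above that frequency matters here is not discussed in the thread; it is only a quantity one might want to check.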

I tried the SI-SDR loss, which unfortunately ends up around -10 dB. Do you have any experience with this situation? Is the problem related to the interferer's position or to the reverberation time?

Thanks again!

quancs commented 1 year ago

A separation of at least 30 degrees is enough to distinguish different speakers in our experiments.
The T60 range you used is also fine for NBC to separate (in our paper we use T60 selected from 0.1 to 1.0 s).
We have not tried denoising and separation at the same time. But we have done research on narrow-band denoising and obtained good performance, so the noise is probably not the problem. In any case, you can try removing the noise to check whether it is the reason.
The ULA might be the reason. Replacing the ULA with a circular array would show whether it is.
There may also be other reasons. You can send me your scripts for generating the RIRs and for generating the datasets or noise, and I can then check whether anything else is involved.

quancs commented 1 year ago

Oh, one more question: what overlap ways do you use for the speech in your experiments, in both training and testing? The overlap ways need to be the same for training and testing, or those used in training should roughly cover those used in testing.

NNPanNPU commented 1 year ago

Thanks for your answer and patience. I tried the code on a circular array and a half-spherical array. Both of them work well, so I guess the ULA might be the reason. It is very weird, though. BTW, what do you mean by "overlap ways"?

quancs commented 1 year ago

Great. We did not know before that NBC does not perform well on a ULA. Thank you for finding that. Our upcoming work might not have this problem. NBC may perform poorly on a ULA because a ULA does not provide as much spatial information as a circular or half-spherical array, which is fatal for a narrow-band method that relies entirely on spatial information to separate.
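One well-known way in which a linear array carries less spatial information is its mirror ambiguity: under a far-field, free-field model, two directions that are symmetric about the array axis produce identical inter-microphone phases. The sketch below illustrates this with steering vectors. It is not taken from the repo and is not confirmed in this thread as the cause of the result; the circular-array radius of 5 cm, the test frequency of 1 kHz, and all function names are assumptions.

```python
import numpy as np

C = 343.0  # speed of sound in m/s, assumed

def steering_vector(mic_xy, azimuth_deg, freq_hz, c=C):
    """Far-field steering vector for a plane wave from `azimuth_deg` in the x-y plane."""
    theta = np.deg2rad(azimuth_deg)
    direction = np.array([np.cos(theta), np.sin(theta)])
    delays = mic_xy @ direction / c  # relative delay per microphone
    return np.exp(2j * np.pi * freq_hz * delays)

def ula_xy(num_mics=8, spacing=0.04):
    """ULA along the x-axis, matching the 8-mic, 4 cm setup in the thread."""
    x = (np.arange(num_mics) - (num_mics - 1) / 2) * spacing
    return np.stack([x, np.zeros(num_mics)], axis=1)

def uca_xy(num_mics=8, radius=0.05):
    """Uniform circular array; the radius is a hypothetical value."""
    ang = 2 * np.pi * np.arange(num_mics) / num_mics
    return radius * np.stack([np.cos(ang), np.sin(ang)], axis=1)

def similarity(a, b):
    """Normalised inner-product magnitude; 1.0 means the directions look identical."""
    return np.abs(np.vdot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b))

freq = 1000.0
# Compare two directions mirrored about the x-axis: +60 and -60 degrees.
ula_sim = similarity(steering_vector(ula_xy(), 60, freq), steering_vector(ula_xy(), -60, freq))
uca_sim = similarity(steering_vector(uca_xy(), 60, freq), steering_vector(uca_xy(), -60, freq))
print(f"ULA similarity: {ula_sim:.3f}, circular similarity: {uca_sim:.3f}")
```

For the ULA the two steering vectors coincide, so the similarity is 1, while the circular array gives a value clearly below 1. In a reverberant room the reflections break this symmetry to some extent, so the sketch only shows the direct-path part of the picture.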

quancs commented 1 year ago

Below are the four overlap ways for one speech pair that we consider in our code.
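The figure from the original comment is not reproduced here, so the four ways themselves are not listed. As a generic illustration of what an overlap way controls, and not the repo's actual definitions, the sketch below mixes two utterances with a chosen start offset and reports how much of the mixture contains both speakers. The function name `mix_with_offset` and its parameters are hypothetical.

```python
import numpy as np

def mix_with_offset(s1, s2, offset):
    """Mix two 1-D signals, with s2 starting `offset` samples after s1 (offset >= 0).

    Returns the mixture and the fraction of the mixture where both signals are active.
    """
    length = max(len(s1), offset + len(s2))
    mix = np.zeros(length)
    mix[:len(s1)] += s1
    mix[offset:offset + len(s2)] += s2
    # Overlap region runs from `offset` to whichever signal ends first.
    overlap = max(0, min(len(s1), offset + len(s2)) - offset)
    return mix, overlap / length
```

An offset of zero with equal-length signals gives a fully overlapped mixture, while an offset equal to the length of the first signal gives no overlap at all. A model trained only on one of these extremes may not generalise to the other, which is the point of keeping the overlap ways consistent between training and testing.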

quancs commented 1 year ago

I'm closing this issue now. If you have more questions, you are welcome to reopen it. Thank you for your attention to our work. ^_^

quancs commented 1 year ago

@NNPanNPU Hello, we have revised NBC (NBC2) in this repo. It works well even with two microphones, so it should be OK with ULA.

NNPanNPU commented 1 year ago

Thanks! This is a really nice work.