why use the reverberated speech signal as the training target

Audio-WestlakeU / NBSS

The official repo of NBC & SpatialNet for multichannel speech separation, denoising, and dereverberation

MIT License


why use the reverberated speech signal as the training target #16

Open flytair opened 1 year ago

flytair commented 1 year ago

Hi, this is a great project; thanks for your effort. While reading the code, I found that the training target signal is the reverberated speech (https://github.com/Audio-WestlakeU/NBSS/blob/af66db92bb9d6f72f7100d613d3df38c40b10b09/data_loaders/ss_semi_online_dataset.py#L294C27-L294C27). I wonder why clean speech is not used as the training target instead, since the model would then not only separate the speakers but also remove reverberation and even noise.

quancs commented 1 year ago

Please check sms_wsj_plus.py, which is the latest dataset for joint speech separation, denoising, and dereverberation. The code you referred to is old and is not used in SpatialNet.
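To make the two target choices from the question concrete, here is a minimal sketch, not code from the repository. It assumes a single source and a single microphone, and it assumes that a "dereverberated" target can be approximated by keeping only the RIR samples up to a short window after the direct-path peak. The function name `make_targets` and the `direct_ms` parameter are hypothetical.

```python
import numpy as np


def make_targets(clean, rir, fs=16000, direct_ms=2.0):
    """Return (reverberant_target, direct_path_target) for one source and one mic.

    Assumption: the direct path is the largest-magnitude RIR sample, and
    everything later than `direct_ms` after it counts as reverberation.
    """
    peak = int(np.argmax(np.abs(rir)))
    keep = peak + int(direct_ms * 1e-3 * fs) + 1

    # Truncated RIR: direct path only, later reflections zeroed out.
    rir_direct = np.zeros_like(rir)
    rir_direct[:keep] = rir[:keep]

    # Reverberant target: clean speech convolved with the full RIR.
    reverberant = np.convolve(clean, rir)[: len(clean)]
    # Dereverberated target: clean speech convolved with the direct path only.
    direct = np.convolve(clean, rir_direct)[: len(clean)]
    return reverberant, direct


# Toy example with a synthetic, exponentially decaying RIR.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
rir = rng.standard_normal(4000) * np.exp(-np.arange(4000) / 800.0)
rir[0] = 5.0  # make the direct path the dominant tap
reverberant, direct = make_targets(clean, rir)
```

Training against `reverberant` asks the model only to separate the sources, while training against `direct` also asks it to dereverberate.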

flytair commented 1 year ago

Thanks for your response! I have two more questions about the sms_wsj_plus dataset, in which the speech signals of the dataset itself are used as the babble noise source:

https://github.com/Audio-WestlakeU/NBSS/blob/e988a6ec845b6153910bbd106059a50b0b2c4a09/data_loaders/sms_wsj_plus.py#L95C9-L95C115

self.noises = list(set(original_sources)) # take the speech signal in this dataset as babble noise source

Since the babble noise is speech and the network's targets are also speech, how does the model know which sources are the targets: the babble or the other speech?
Since the babble noise is a directional source, does it need to be convolved with the RIRs?

Thanks!

quancs commented 1 year ago

@flytair

Since the babble noise is speech and the network's targets are also speech, how does the model know which sources are the targets: the babble or the other speech?

The babble noise is diffuse, while the target speech signals are directional. That is the key clue the model learns to use to distinguish them.

Since the babble noise is a directional source, does it need to be convolved with the RIRs?

The babble noise is diffuse, not directional, so it doesn't need to be convolved with RIRs. We use the method implemented in https://github.com/Audio-WestlakeU/NBSS/blob/main/data_loaders/utils/diffuse_noise.py to make it diffuse.
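For readers unfamiliar with the idea, the following is a rough sketch of one common way to make multichannel noise diffuse. It is not necessarily what diffuse_noise.py does. It assumes a spherically isotropic sound field, whose coherence between two microphones at distance d is sin(2πfd/c)/(2πfd/c), and it mixes independent noise signals per STFT bin with the matrix square root of that coherence matrix. The function name `make_diffuse` and all of its parameters are hypothetical.

```python
import numpy as np
from scipy.signal import stft, istft


def make_diffuse(noise, mic_pos, fs=16000, nperseg=512, c=343.0):
    """Mix M independent noise signals so that they follow a diffuse-field coherence.

    noise:   (M, T) mutually independent signals, e.g. M different babble clips
    mic_pos: (M, 3) microphone coordinates in metres
    Assumption: spherically isotropic field, sound speed c in m/s.
    """
    M, T = noise.shape
    freqs, _, N = stft(noise, fs=fs, nperseg=nperseg)  # N: (M, F, frames)

    # Pairwise microphone distances, shape (M, M).
    dist = np.linalg.norm(mic_pos[:, None] - mic_pos[None], axis=-1)

    # Target coherence per frequency, shape (F, M, M).
    # np.sinc(x) = sin(pi x) / (pi x), hence the argument 2 f d / c.
    gamma = np.sinc(2.0 * freqs[:, None, None] * dist[None] / c)

    # Symmetric matrix square root of gamma via eigendecomposition;
    # clipping guards against tiny negative eigenvalues (e.g. at f = 0).
    w, v = np.linalg.eigh(gamma)
    w = np.clip(w, 0.0, None)
    mix = (v * np.sqrt(w)[:, None, :]) @ np.swapaxes(v, -1, -2)  # (F, M, M)

    # Apply the mixing matrix in every frequency bin and go back to time domain.
    X = np.einsum("fij,jft->ift", mix, N)
    _, x = istft(X, fs=fs, nperseg=nperseg)
    return x[:, :T]
```

Because every output channel is a mixture of all input signals with no single direction of arrival, no RIR convolution is involved, which matches the point above that diffuse noise is handled differently from the directional target speech.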

flytair commented 1 year ago

Thanks for your response! Do you think it is reasonable to use WHAM! noise as the babble noise in the sms_wsj_plus dataset?