Audio-WestlakeU / NBSS

The official repo of NBC & SpatialNet for multichannel speech separation, denoising, and dereverberation
MIT License

Why use the reverberant speech signal as the training target? #16

Open flytair opened 9 months ago

flytair commented 9 months ago

Hi, this is a great project, thanks for your effort. While reading the code, I found that the training target signal is reverberant speech (https://github.com/Audio-WestlakeU/NBSS/blob/af66db92bb9d6f72f7100d613d3df38c40b10b09/data_loaders/ss_semi_online_dataset.py#L294C27-L294C27). I wonder why clean speech is not used as the training target instead, since the network would then not only separate the speakers but also remove reverberation and even noise.

quancs commented 9 months ago

Please check sms_wsj_plus.py, which is the latest dataset for joint speech separation, denoising, and dereverberation. The code you referred to is old and is not used in SpatialNet.
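To make the difference between the two kinds of target concrete, here is a minimal sketch, assuming a single source, a single microphone, and a known room impulse response (RIR). The reverberant image convolves the clean speech with the full RIR, while a dereverberation target keeps only the direct path. Approximating the direct path by a short window around the strongest RIR peak, the 2.5 ms half-width, and the name `make_targets` are all assumptions for illustration; this is not taken from sms_wsj_plus.py.

```python
import numpy as np
from scipy.signal import fftconvolve


def make_targets(clean, rir, fs, direct_half_ms=2.5):
    """Return (reverberant_image, direct_path_target) for one source at one mic.

    Hypothetical helper. Assumption: the direct path is approximated by keeping
    only a short window of the RIR around its strongest peak; the 2.5 ms
    half-width is an illustrative choice, not a value from the NBSS code.
    """
    peak = int(np.argmax(np.abs(rir)))             # index of the direct-path peak
    half = int(round(direct_half_ms * 1e-3 * fs))  # window half-width in samples
    lo, hi = max(0, peak - half), peak + half + 1
    direct_rir = np.zeros_like(rir)
    direct_rir[lo:hi] = rir[lo:hi]                 # truncated RIR: direct path only

    n = len(clean)
    reverberant = fftconvolve(clean, rir)[:n]      # reverberant image (old-style target)
    direct = fftconvolve(clean, direct_rir)[:n]    # dereverberated target
    return reverberant, direct
```

With a synthetic RIR made of a unit peak followed by a decaying random tail, the two outputs differ only by the reverberation tail, so training against `direct` asks the network to dereverberate as well as separate.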

flytair commented 8 months ago

Thanks for your response! I have two more questions about the sms_wsj_plus dataset, in which the speech signals of the dataset are also used as the babble noise source (https://github.com/Audio-WestlakeU/NBSS/blob/e988a6ec845b6153910bbd106059a50b0b2c4a09/data_loaders/sms_wsj_plus.py#L95C9-L95C115): `self.noises = list(set(original_sources)) # take the speech signal in this dataset as babble noise source`

  1. Since the babble noise is speech and the network's targets are also speech, how can the model know which sources are the targets: the babble or the other speech?
  2. Since the babble noise is a directional source, does it need to be convolved with the RIRs?

Thanks!
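Babble noise is essentially several talkers summed together, which is why, taken alone, it looks just like the target speech. A hypothetical sketch of building one single-channel babble signal from a pool of utterances follows; the helper name `make_babble`, the talker count, the random cropping, and the unit-RMS scaling are assumptions, not details taken from sms_wsj_plus.py (which, per the line quoted above, de-duplicates its source list with `list(set(...))`).

```python
import numpy as np


def make_babble(pool, n_talkers, length, rng):
    """Sum n_talkers distinct utterances from `pool` into one babble signal.

    Hypothetical helper: `pool` is a list of 1-D speech arrays. Each selected
    utterance is randomly cropped (or zero-padded) to `length` samples and
    scaled to unit RMS before summing.
    """
    # pick distinct utterances by index (arrays are unhashable, so no set())
    chosen = rng.choice(len(pool), size=n_talkers, replace=False)
    babble = np.zeros(length)
    for i in chosen:
        s = np.asarray(pool[i], dtype=float)
        if len(s) > length:  # random crop of long utterances
            start = rng.integers(0, len(s) - length + 1)
            s = s[start:start + length]
        seg = np.zeros(length)
        seg[:len(s)] = s     # zero-pad short utterances
        rms = np.sqrt(np.mean(seg ** 2))
        if rms > 0:
            seg /= rms
        babble += seg
    return babble
```

The result is still a single-channel signal; what distinguishes it from the target speakers is how it is spatialized across the microphones, which is what the maintainer's reply addresses.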

quancs commented 8 months ago

@flytair

  1. Since the babble noise is speech and the network's targets are also speech, how can the model know which sources are the targets: the babble or the other speech?

The babble noise is diffuse, while the target speech signals are directional; this spatial difference is the key cue the model learns to use to distinguish them.
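One way to see why "diffuse vs. directional" is a usable cue is to look at inter-microphone coherence. For a single directional source in a free field, two microphone signals are (up to a delay) copies of each other, so their magnitude-squared coherence stays close to 1 at all frequencies. For an ideal spherically isotropic (diffuse) field, the coherence between two microphones at distance d is sin(2πfd/c)/(2πfd/c), which decays with frequency. The sketch below only illustrates this property under simplifying assumptions (free field, integer-sample delay, c = 343 m/s, hypothetical helper names); it says nothing about how SpatialNet uses the cue internally.

```python
import numpy as np
from scipy.signal import coherence

C = 343.0  # speed of sound in m/s (assumed)


def diffuse_coherence(freqs, mic_dist, c=C):
    """Theoretical coherence of a spherically isotropic (diffuse) field."""
    # np.sinc(x) = sin(pi x) / (pi x), so x = 2 f d / c gives sin(2 pi f d / c) / (2 pi f d / c)
    return np.sinc(2 * freqs * mic_dist / c)


def directional_msc(fs=16000, delay=3, n=16000 * 4, seed=0):
    """Estimated magnitude-squared coherence for one directional source.

    Simplified free-field model: mic 2 receives the same signal as mic 1,
    delayed by an integer number of samples.
    """
    rng = np.random.default_rng(seed)
    s = rng.standard_normal(n + delay)
    x1, x2 = s[delay:], s[:n]  # x2 is x1 delayed by `delay` samples
    f, msc = coherence(x1, x2, fs=fs, nperseg=512)
    return f, msc
```

For example, with microphones 5 cm apart the diffuse-field coherence reaches its first zero at f = c/(2d) = 3430 Hz, while the delayed-copy pair stays near 1 across the band.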

  2. Since the babble noise is a directional source, does it need to be convolved with the RIRs?

The babble noise is diffuse, not directional, so it doesn't need to be convolved with RIRs. Instead, we use the method implemented in https://github.com/Audio-WestlakeU/NBSS/blob/main/data_loaders/utils/diffuse_noise.py to make it diffuse.
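A common way to make a set of independent single-channel noises spatially diffuse is to impose the theoretical diffuse-field coherence in the STFT domain, in the spirit of the spatial-coherence-constraint method by Habets et al. The sketch below is only an illustration and is not a reproduction of diffuse_noise.py; the function name `make_diffuse`, the spherically isotropic model, c = 343 m/s, and the eigendecomposition-based mixing are all assumptions.

```python
import numpy as np
from scipy.signal import stft, istft


def make_diffuse(noise_channels, mic_pos, fs, c=343.0, nfft=256):
    """Mix independent noises so their inter-mic coherence matches a diffuse field.

    Hypothetical helper.
    noise_channels: (M, T) mutually independent noise signals of equal power.
    mic_pos: (M, 3) microphone coordinates in metres.
    """
    M, T = noise_channels.shape
    f, _, N = stft(noise_channels, fs=fs, nperseg=nfft)  # N: (M, F, frames)
    dist = np.linalg.norm(mic_pos[:, None, :] - mic_pos[None, :, :], axis=-1)
    X = np.empty_like(N)
    for k, fk in enumerate(f):
        gamma = np.sinc(2 * fk * dist / c)  # target coherence matrix, (M, M)
        # gamma = V diag(w) V^T; clip tiny negative eigenvalues from round-off
        w, V = np.linalg.eigh(gamma)
        mix = V @ np.diag(np.sqrt(np.clip(w, 0, None)))  # mix @ mix.T == gamma
        X[:, k, :] = mix @ N[:, k, :]  # impose the coherence on this frequency bin
    _, x = istft(X, fs=fs, nperseg=nfft)
    return x[:, :T]
```

Using an eigendecomposition rather than a Cholesky factorization avoids failures when the coherence matrix is only positive semidefinite, e.g. at DC, where every entry equals 1.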

flytair commented 8 months ago

Thanks for your response! Do you think it is reasonable to use WHAM! noise as the babble noise in the sms_wsj_plus dataset?