facebookresearch / WavAugment

A library for speech data augmentation in time-domain
MIT License

Additive Noise (with MUSAN) #19

Open JoungheeKim opened 3 years ago

JoungheeKim commented 3 years ago

What a wonderful API. Thanks for giving us such a good tool for augmenting waveforms.

I was wondering how to add MUSAN additive noise. The WavAugment paper says that additive-noise augmentation with MUSAN is beneficial, and I found some hints in the Jupyter example that this API is meant to work with MUSAN noise.

Can you give me a concrete example of applying MUSAN noise with the [80, 240] Hz range mentioned in the paper?

JoungheeKim commented 3 years ago

Can anyone give me some examples of adding MUSAN additive noise to a waveform?

v-nhandt21 commented 2 years ago

I am still looking for a full implementation with MUSAN. I just wonder whether EffectChain handles the difference in length between the noise and the input audio.
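On the length question: the AdditiveNoise code quoted later in this thread asserts that the noise and the signal have the same number of elements, so the noise generator has to return a tensor of matching length. Below is a minimal sketch of one way to do that, assuming mono audio and using NumPy arrays in place of torch tensors to keep it self-contained. The helper name `fit_noise_to_length` is hypothetical and not part of WavAugment.

```python
import numpy as np

def fit_noise_to_length(noise, target_len, rng):
    """Return a 1-D float32 array of length target_len built from a 1-D noise clip.

    Hypothetical helper, not a WavAugment function.
    """
    noise = np.asarray(noise, dtype=np.float32)
    out = np.zeros(target_len, dtype=np.float32)
    if noise.shape[0] >= target_len:
        # Long clip: take a random window of exactly target_len samples.
        start = rng.integers(0, noise.shape[0] - target_len + 1)
        out[:] = noise[start:start + target_len]
    else:
        # Short clip: drop the whole clip at a random offset; the rest stays silent.
        start = rng.integers(0, target_len - noise.shape[0] + 1)
        out[start:start + noise.shape[0]] = noise
    return out

rng = np.random.default_rng(0)
long_clip = rng.standard_normal(48000)   # stand-in for a long MUSAN clip
short_clip = rng.standard_normal(4000)   # stand-in for a short MUSAN clip
print(fit_noise_to_length(long_clip, 16000, rng).shape)   # (16000,)
print(fit_noise_to_length(short_clip, 16000, rng).shape)  # (16000,)
```

Zero-padding a short clip keeps the noise localized in time; tiling the clip until it fills the target length is an alternative if continuous background noise is wanted.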


v-nhandt21 commented 2 years ago

My implementation in process_file.py for randomly adding MUSAN noise:

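import glob

import numpy as np
import torch
import torchaudio

class RandomNoise:

     def __init__(self, x_size):
          # Collect the MUSAN clips once; x_size is the target length in samples
          self.noise_list = glob.glob("MUSAN/*/*.wav")
          self.x_size = x_size

     @staticmethod
     def noise_generator(x_size, noise_path):
          # Load the noise clip as a (channels, samples) tensor
          noise = torchaudio.load(noise_path)[0]
          # Output buffer; `x` is not defined in this scope, so size it from x_size
          mask = torch.zeros(1, x_size)
          if noise.size(1) >= x_size:
               # Noise is long enough: crop a random window of x_size samples
               # (+1 because the upper bound of randint is exclusive)
               start_index = np.random.randint(0, noise.size(1) - x_size + 1)
               end_index = start_index + x_size
               mask[0] = noise[0][start_index:end_index]
          else:
               # Noise is shorter: place between 3/4 and all of it at a random offset
               start_index = np.random.randint(0, x_size-noise.size(1))
               end_index = start_index + np.random.randint( int(noise.size(1)*3/4), noise.size(1))
               mask[0][start_index:end_index] = noise[0][0:end_index-start_index]
          return mask

     def __call__(self):
          # Pick a random clip and an SNR from a triangular distribution (left=5, mode=7, right=20)
          noise_path = np.random.choice(self.noise_list)
          snr = np.random.triangular(5, 7, 20)

          return self.noise_generator, snr, self.x_size, noise_path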

# Where the effect chain is built in process_file.py:
if effect['noise']:
          noise_generator, snr, x_size, noise_path = RandomNoise(x_size)()
          chain = chain.additive_noise(noise_generator, snr, x_size, noise_path)

In WavAugment/augment/effects.py, I hardcoded it like this:

    # Inside class EffectChain:
    def additive_noise(self, noise_generator: Callable, snr: float, x_size, noise_path):
        self._chain.append(AdditiveNoise(
            noise_generator=noise_generator, snr=snr, x_size=x_size, noise_path=noise_path))
        return self

class AdditiveNoise:
    def __init__(self, noise_generator: Callable, snr: float, x_size, noise_path):
        self.noise_generator = noise_generator
        self.x_size = x_size
        self.noise_path = noise_path
        self.snr = snr

        # Convert the SNR in dB to a linear ratio r = 10^(snr/10),
        # then to a mixing weight in (0, 1)
        r = np.exp(snr * np.log(10) / 10)
        self.coeff = r / (1.0 + r)

    def __call__(self, x, src_info, dst_info):
        # Draw a noise tensor of x_size samples from the chosen file
        noise_instance = self.noise_generator(self.x_size, self.noise_path)
        assert noise_instance.numel() == x.numel(
        ), 'Noise and signal shapes are incompatible'

        # Weighted sum of the signal and the noise
        noised = self.coeff * x + (1.0 - self.coeff) * noise_instance.view_as(x)
        return noised, src_info, dst_info
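
To make the `snr` argument easier to reason about, here is a small sketch of the weighting that the `AdditiveNoise` code above applies, written with NumPy only. The function names `snr_to_coeff` and `mix` are hypothetical and exist only for illustration. Note that the code above never measures the power of the signal or the noise, so the level actually realized in the output depends on how loud the two inputs already are; the `snr` value only fixes the ratio of the two mixing weights, coeff / (1 - coeff) = 10^(snr/10).

```python
import numpy as np

def snr_to_coeff(snr_db):
    """Mixing weight applied to the clean signal (hypothetical helper)."""
    r = 10.0 ** (snr_db / 10.0)  # same value as np.exp(snr_db * np.log(10) / 10)
    return r / (1.0 + r)

def mix(signal, noise, snr_db):
    """Weighted sum of signal and noise, mirroring the formula in AdditiveNoise."""
    c = snr_to_coeff(snr_db)
    return c * signal + (1.0 - c) * noise

for snr_db in (0, 10, 20):
    print(snr_db, round(snr_to_coeff(snr_db), 4))
# 0 0.5
# 10 0.9091
# 20 0.9901
```

With `snr = np.random.triangular(5, 7, 20)` as in the comment above, the signal weight stays between roughly 0.76 and 0.99, so the speech always dominates the mix.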