chrisdonahue / wavegan

WaveGAN: Learn to synthesize raw audio with generative adversarial networks
MIT License

WaveGAN: Audio synthesis counter example where it fails for Broadband signals (Ambient noise) #88

Open hinash88 opened 4 years ago

hinash88 commented 4 years ago

I have been trying to understand the WaveGAN paper.

1- WaveGAN: One shortcoming of WaveGAN mentioned in the link is that it cannot handle higher frequencies. I trained WaveGAN on underwater ship engine audio, which contains high-frequency components, as you can see in the attached FFT of a real motorboat sound (Figure 1). Those components also appear in the GAN-generated audio (Figure 2). Can you suggest a counterexample with high frequencies that WaveGAN cannot handle?

[Figure 1: Original Motorboat]
[Figure 2: WaveGAN_motorboat]

2- If WaveGAN cannot handle high-frequency components, how can we explain technically why it fails to incorporate them?

Your help will be highly valued.
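For anyone who wants to reproduce the FFT comparison described in point 1, here is a minimal sketch using NumPy. The function names, the Hann window, and the idea of summarizing the spectrum as a "fraction of energy above a cutoff" are assumptions made for illustration; they are not taken from the WaveGAN code or from the plots attached to this issue.

```python
import numpy as np

def magnitude_spectrum(signal, sample_rate):
    """Return (frequencies in Hz, FFT magnitude) for a real-valued 1-D signal."""
    # A Hann window reduces spectral leakage from the clip boundaries.
    windowed = signal * np.hanning(len(signal))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return freqs, magnitude

def high_band_energy_fraction(signal, sample_rate, cutoff_hz):
    """Fraction of spectral energy at or above cutoff_hz (hypothetical helper)."""
    freqs, magnitude = magnitude_spectrum(signal, sample_rate)
    power = magnitude ** 2
    return power[freqs >= cutoff_hz].sum() / power.sum()
```

Computing `high_band_energy_fraction` with the same cutoff on a real clip and on a generated clip gives one number per clip to compare, which may be easier to track across training checkpoints than comparing the two plots by eye.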

hinash88 commented 3 years ago

To validate the high-frequency problem, I created a simple sinusoidal dataset in which each clip contains a single frequency (no harmonics) between 4000 Hz and 8000 Hz. To my surprise, WaveGAN again does a good job of mimicking the original data. So I tried WaveGAN on another dataset, a broadband signal: underwater ambient noise containing rain drops, water splashing, and breaking waves. It totally fails to converge and merely generates noise. I therefore concluded that WaveGAN performs well on narrowband signals with tonal content (sharp sounds, drums, goat, bird) but does not perform well on broadband signals such as ambient noise. I would like to explore why WaveGAN (a CNN-based network) behaves this way.
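The sinusoidal experiment above could be set up roughly as in the sketch below, which also includes a spectral flatness measure as one possible way to quantify "narrowband" versus "broadband". Everything here is an assumption for illustration: the thread does not state the sample rate, clip length, or number of clips, so `sample_rate`, `n_samples`, and `n_clips` are placeholders, and `make_sine_dataset` and `spectral_flatness` are hypothetical helper names, not part of WaveGAN.

```python
import numpy as np

def make_sine_dataset(n_clips, n_samples, sample_rate,
                      f_lo=4000.0, f_hi=8000.0, seed=0):
    """Generate pure tones with random frequency in [f_lo, f_hi) and random phase.

    sample_rate must exceed 2 * f_hi, otherwise the tones alias.
    """
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(f_lo, f_hi, size=n_clips)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=n_clips)
    t = np.arange(n_samples) / sample_rate
    clips = np.sin(2.0 * np.pi * freqs[:, None] * t[None, :] + phases[:, None])
    return clips.astype(np.float32), freqs

def spectral_flatness(signal, eps=1e-12):
    """Geometric mean over arithmetic mean of the power spectrum.

    Values near 0 indicate a tonal (narrowband) signal; white noise
    gives a much larger value.
    """
    windowed = signal * np.hanning(len(signal))
    power = np.abs(np.fft.rfft(windowed)) ** 2 + eps  # eps avoids log(0)
    return float(np.exp(np.mean(np.log(power))) / np.mean(power))
```

For example, `make_sine_dataset(1000, 16384, 44100)` would produce 1000 clips of 16384 samples each (both values are assumptions). Running `spectral_flatness` on clips from the tonal datasets and from the ambient-noise dataset would put a number on how different the two kinds of data are, though it says nothing by itself about why training fails on the broadband case.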

chrisdonahue commented 3 years ago

That sounds like a reasonable conclusion given your experiments. We mostly only experimented with WaveGAN on datasets with sparse, narrow-band information (e.g. bird songs).

I am not sure why it wouldn't work for wideband signals; sounds like a good research question! Maybe the discriminator has a difficult time distinguishing between the initially noisy signals produced by WaveGAN and the real data, so it's not giving good signal to the generator? Maybe the generator has the wrong inductive bias for producing wideband signals? Not clear to me :)

hinash88 commented 3 years ago

Thank you for your reply! ^.^