chrisdonahue / wavegan

WaveGAN: Learn to synthesize raw audio with generative adversarial networks

Learned post-processing filters #48

Open adrienchaton opened 5 years ago

adrienchaton commented 5 years ago

Hello Chris,

I read your paper, which I liked very much for its in-depth analysis and the clarity of its discussions. The idea of using a learnable filter to adaptively remove artifacts is interesting; however, I cannot find it in the WaveGAN code.

Did I miss it? Or could you point me to a code example for this part, please?

Thanks !

adrienchaton commented 5 years ago

I am working on a raw audio waveform model, mainly using 1-D (temporal) convolutions. After the last convolution of the decoder, I add a single-channel convolution with a large kernel (e.g. width 513, with padding to keep the same output size) and stride/dilation = 1, as in the sketch below.
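For concreteness, here is a minimal sketch of that layer (PyTorch here; the class name `PostFilter` and the dummy 16384-sample input are only illustrative, not taken from any existing code):

```python
import torch
import torch.nn as nn

class PostFilter(nn.Module):
    """Single-channel 1-D convolution acting as a learnable FIR post-filter."""
    def __init__(self, kernel_size=513):
        super().__init__()
        # stride=1, dilation=1; padding of kernel_size//2 keeps the output length unchanged.
        self.conv = nn.Conv1d(1, 1, kernel_size, stride=1, dilation=1,
                              padding=kernel_size // 2, bias=False)

    def forward(self, x):
        # x: (batch, 1, num_samples) raw waveform from the decoder's last convolution.
        return self.conv(x)

# Usage: append after the decoder and train it jointly with the rest of the model.
post_filter = PostFilter(kernel_size=513)
waveform = torch.randn(4, 1, 16384)   # dummy decoder output
filtered = post_filter(waveform)      # same shape: (4, 1, 16384)
```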

I am not sure whether that is similar to what you describe as a learnable post-processing filter. In that case, would it be some sort of FIR filter? Or did you implement something more specific than an additional convolution layer? If so, I would be grateful for some details about your method!

Best

chrisdonahue commented 5 years ago

Hi there. The learnable filter is disabled by default but can be enabled by passing the command-line argument --wavegan_genr_pp to the training script. See the references to that flag in the training script to see how we implemented it.
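A sketch of what that invocation might look like (the script name and the other arguments follow the repo's usual training setup and may differ for your configuration; only the --wavegan_genr_pp flag is the point here):

```
python train_wavegan.py train ./train \
  --data_dir ./data/my_dataset \
  --wavegan_genr_pp
```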

After the last convolution of the decoder, I add a single-channel convolution with a large kernel (e.g. width 513, with padding to keep the same output size) and stride/dilation = 1.

This sounds identical to what I am doing.

In that case, would it be some sort of FIR filter?

Yes, it is a learned stationary FIR filter with many taps.
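To make the FIR interpretation concrete: the kernel of such a single-channel convolution is simply a vector of filter taps, so its frequency response can be inspected directly. A minimal sketch, assuming the hypothetical `post_filter` module from the earlier comment and a 16 kHz sample rate (both assumptions, not from the original thread):

```python
import numpy as np
from scipy import signal

# The convolution weights are the FIR taps of the learned stationary filter.
taps = post_filter.conv.weight.detach().cpu().numpy().reshape(-1)

# Magnitude response of the learned filter (assumed 16 kHz audio).
freqs, response = signal.freqz(taps, worN=2048, fs=16000)
magnitude_db = 20 * np.log10(np.abs(response) + 1e-12)
print(freqs[:5], magnitude_db[:5])
```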

adrienchaton commented 5 years ago

Thanks, Chris, for your answer and for confirming this! It also seems beneficial for my application.

Best of luck with your future work; it was a really good paper, in my opinion.