Open adrienchaton opened 5 years ago
I work on a raw audio waveform model that mainly uses 1D (temporal) convolutions. After the last convolution of the decoder, I add a single-channel convolution with a large kernel (e.g. a width of 513, with padding to keep the output the same size) and stride/dilation = 1.
I am not sure whether this is similar to what you describe as a learnable post-processing filter. If so, would it be some sort of FIR filter? Or did you implement something more specific than an additional convolution layer? If so, I would be thankful for some details about your method!
Best
Hi there. The learnable filter is disabled by default, but it can be enabled by passing the command-line argument --wavegan_genr_pp to the training script. Search the training script for references to that argument to see how we implemented it.
> After the last convolution of the decoder, I add up a single channel convolution with a large kernel (eg. width of 513 with padding for keeping the same output size) and stride/dilation = 1.
This sounds identical to what I am doing.
> In that case it would be some sort of FIR filter ?
Yes, it is a learned stationary FIR filter with many taps.
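For readers following along, here is a minimal NumPy sketch of why a single-channel convolution with a large kernel, stride 1, and "same" padding is just an FIR filter. It is not the WaveGAN implementation: the helper name `apply_fir`, the signal length, and both sets of tap values are assumptions chosen for illustration. In a real model the taps would be a trainable weight updated by the optimizer.

```python
import numpy as np

def apply_fir(signal, taps):
    """Filter a 1-D signal with an odd-length FIR kernel, keeping the input length."""
    if len(taps) % 2 != 1:
        raise ValueError("use an odd number of taps so the padding is symmetric")
    pad = len(taps) // 2                 # 256 zeros on each side for 513 taps
    padded = np.pad(signal, pad)         # zero padding, as in a 'same' conv layer
    # 'valid' on the padded signal yields exactly len(signal) output samples
    return np.convolve(padded, taps, mode="valid")

rng = np.random.default_rng(0)
x = rng.standard_normal(16384)           # stand-in for a generated waveform

# Hypothetical initial state: a unit impulse at the center tap,
# which makes the filter an identity mapping.
identity = np.zeros(513)
identity[256] = 1.0
y = apply_fir(x, identity)

# Hypothetical "learned" state: a moving average, i.e. a crude low-pass filter.
smooth = np.full(513, 1.0 / 513)
z = apply_fir(x, smooth)

print(y.shape == x.shape, np.allclose(y, x))
print(z.std() < x.std())
```

Because the same taps are applied at every time step, the filter is stationary. Note that most deep learning frameworks compute cross-correlation rather than true convolution in their convolution layers; for learned taps this only means the kernel is stored time-reversed.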
Thanks, Chris, for your answer and for confirming this! It also seems beneficial for my application.
Best of luck with your continued work. It was a really good article, in my opinion.
Hello Chris,
I read your paper and liked it very much for its in-depth analysis and the clarity of its discussion. The idea of using a learnable filter to adaptively remove artifacts is interesting; however, I cannot find it in the WaveGAN code.
Did I miss it? Or would you have a code example for this part, please?
Thanks!