chrisdonahue / wavegan

WaveGAN: Learn to synthesize raw audio with generative adversarial networks
MIT License
1.32k stars 282 forks source link

Last layer of the generator in the CNN (size 16384) #103

Open spagliarini opened 3 years ago

spagliarini commented 3 years ago

Hi!

I would like to clean up a doubt I have. Is the fact that the last layer of the generator has dimension 16384 related to CNN constraints or is it important that the dimension is a bit more than one would like to obtain? Here, 1 s.

Thank you in advance!!!

chrisdonahue commented 3 years ago

Ah, good question. The last layer of the generator has dimension 16384 simply because it is a power of four; each of the five layers of the generator increases the number of timesteps by a factor of four, starting from 16 (arbitrary choice).

This output length could represent any amount of time depending on sampling rate, but 16kHz is a common sampling rate in speech processing and conveniently works out to around one second of generated audio (our goal), so we went with that.