chrisdonahue / wavegan

WaveGAN: Learn to synthesize raw audio with generative adversarial networks
MIT License

dimension of latent vector #65

Open spagliarini opened 4 years ago

spagliarini commented 4 years ago

Hi again,

I'm also curious about the dimension of the latent vector, which I see is 100 in the provided code. If it makes sense, did you try changing it to see what happens? For example, did you check whether there is a lower bound on the dimension below which WaveGAN can no longer be trained to give good results?

Thanks, and looking forward to discussing this point!

chrisdonahue commented 4 years ago

The number is quite arbitrary. We chose the number 100 (and indeed made all of our hyperparameter decisions) to align with the DCGAN work.

I have trained WaveGANs with lower dimensionality (e.g. 25, 50) and it doesn't seem to affect things. I imagine that if you reduce it to an extreme value (e.g. 1 or 2), weird things might start happening. Not sure though! Let me know what happens if you run this.
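
For anyone who wants to run this experiment, here is a minimal sketch of such a sweep, assuming the `--wavegan_latent_dim` flag in `train_wavegan.py` (verify the flag name and data arguments against your checkout; the dataset path is hypothetical):

```python
# Sketch: train one WaveGAN per latent dimensionality.
# Assumes train_wavegan.py exposes --wavegan_latent_dim and --data_dir;
# verify both against your checkout. The dataset path is hypothetical.
import subprocess

for latent_dim in [100, 50, 25, 12, 6, 3]:
    subprocess.run([
        "python", "train_wavegan.py", "train",
        f"./train_z{latent_dim}",          # one train dir per run
        "--data_dir", "./data/canary",     # hypothetical dataset path
        "--wavegan_latent_dim", str(latent_dim),
    ], check=True)
```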

spagliarini commented 4 years ago

I tried running the training with latent space dimensions of 100, 50, 25, 12, 6, and 3 on a dataset of canary songs. I obtained reasonable preliminary results down to 12 and 6, but with 3 the generated sound is noisy and "electric". I say preliminary because I only checked the preview audio and did not evaluate the generations quantitatively (e.g., by computing the inception score or other metrics).
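
As a starting point for evaluating beyond the preview audio, a rough sketch of sampling a batch from a trained checkpoint, loosely following the generation example in the repo README (the tensor names `z:0` and `G_z:0` come from there; the paths are placeholders, and `latent_dim` must match the value used at training time):

```python
# Sketch: synthesize a batch of audio from a trained WaveGAN checkpoint
# for listening tests or later scoring. Tensor names follow the README's
# generation example; paths and latent_dim are placeholders.
import numpy as np
import tensorflow as tf

latent_dim = 12  # must match the dimension the model was trained with

tf.reset_default_graph()
saver = tf.train.import_meta_graph('infer.meta')
graph = tf.get_default_graph()
sess = tf.InteractiveSession()
saver.restore(sess, 'model.ckpt')

# 50 uniform latent vectors in [-1, 1), as in the README example
_z = (np.random.rand(50, latent_dim) * 2.) - 1.

z = graph.get_tensor_by_name('z:0')
G_z = graph.get_tensor_by_name('G_z:0')
audio = sess.run(G_z, {z: _z})  # one 16 kHz waveform per latent vector
```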

spagliarini commented 3 years ago

Hi again, I was thinking more about the possibility of having a low-dimensional latent space. Indeed, I successfully trained a low-dimensional WaveGAN on my canary dataset, and it raised some questions. For instance, does a low-dimensional (e.g. 6 or 3) latent space reduce the computational cost of GAN training? If so, it might be convenient to use such a low-dimensional latent space, since it would also make exploring the latent space easier.
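
To make the exploration point concrete, a toy sketch of why low dimensionality helps: a coarse 10-step grid over a 3-D latent space is only 10^3 points, whereas the same grid over 100 dimensions would be 10^100 points (the `generate` call below is a hypothetical stand-in for a trained generator):

```python
# Toy sketch: exhaustive coarse-grid exploration of a 3-D latent space.
# The same 10-step grid in 100 dimensions would have 10^100 points.
import itertools
import numpy as np

steps = np.linspace(-1.0, 1.0, 10)                         # 10 values per axis
grid = np.array(list(itertools.product(steps, repeat=3)))  # shape (1000, 3)
print(grid.shape)
# audio = generate(grid)  # hypothetical call to a trained 3-D WaveGAN
```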

chrisdonahue commented 3 years ago

Unfortunately, I think the computational savings of a low-dimensional latent space are marginal. It only saves parameters at the first layer of the generator, i.e., in a single matrix multiply. I agree that a low-dimensional latent space will certainly be more user-exploration-friendly, though!
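
A back-of-envelope count makes the "marginal" claim concrete. This sketch uses the layer sizes from the WaveGAN paper's architecture table (a dense layer from the latent vector to 16 * 16d units with d = 64); treat these numbers as assumptions if your configuration differs:

```python
# Parameters in the generator's first (dense) layer, the only layer whose
# size depends on the latent dimension. Output size 16 * 16d with d = 64
# follows the WaveGAN paper's architecture table.
d = 64
out_units = 16 * 16 * d  # 16384 units, reshaped to (16, 16d)
for latent_dim in (100, 50, 25, 12, 6, 3):
    params = latent_dim * out_units + out_units  # weights + biases
    print(f"latent_dim={latent_dim:3d}: {params:>9,} first-layer parameters")
# latent_dim=100 -> ~1.65M params; latent_dim=3 -> ~0.07M. The transposed-
# convolution stack, which dominates the generator's parameter count,
# is unchanged, so the overall savings are small.
```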