chrisdonahue / wavegan

WaveGAN: Learn to synthesize raw audio with generative adversarial networks

Recommended number of training steps to achieve example results #63

Open moih opened 4 years ago

moih commented 4 years ago

Hi!

I'm really excited about this implementation of generative audio and have just started training on my GTX 1060 gaming laptop.

My focus is to train several WaveGAN models on datasets of folk music from Africa and Latin America.

I'm also really interested in applications for live music: I'll be generating sounds offline (maybe also in real time?) for a spatial piece, and I'd like to know whether I should use a cloud service to train my different WaveGAN checkpoints.

Roughly how many training steps did the example models need before they started producing the posted results?

Thanks, and looking forward to seeing how this develops!

Update: I'm at roughly step 20k and the results are coming through quite nicely! Will post some audio soon

chrisdonahue commented 4 years ago

Thanks for your interest and sorry for the delay! I would love to hear some sound examples when you have the time.

You can probably run the generation pretty quickly on a regular laptop; you shouldn't need to use a cloud service unless your application requires generating a ton of content. You can get an idea of how fast the model would run on your laptop by going to this web demo and pressing "Change" on one of the sounds: https://chrisdonahue.com/wavegan/

One thing you can do is take the trained WaveGAN and generate a ton of sounds from it offline (e.g. 100k). Then you can just take a random sample from that set in your real-time application.
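In rough terms, the offline generation might look like this (a minimal sketch assuming TensorFlow 1.x and the `infer.meta` graph exported during training; the checkpoint path, output directory, latent dimension, and counts below are placeholders to adjust for your run):

```python
import os
import numpy as np
import tensorflow as tf
from scipy.io.wavfile import write as wavwrite

# Load the exported inference graph and a trained checkpoint (placeholder paths)
tf.reset_default_graph()
saver = tf.train.import_meta_graph('train_dir/infer/infer.meta')
graph = tf.get_default_graph()
sess = tf.InteractiveSession()
saver.restore(sess, 'train_dir/model.ckpt-30000')

z = graph.get_tensor_by_name('z:0')      # latent input, shape [None, 100]
G_z = graph.get_tensor_by_name('G_z:0')  # generated audio, shape [None, 16384, 1]

os.makedirs('generated', exist_ok=True)
n_total, batch_size, fs = 100000, 64, 16000
for i in range(0, n_total, batch_size):
    # Latent vectors drawn uniformly from [-1, 1]
    _z = (np.random.rand(batch_size, 100) * 2.) - 1.
    _G_z = sess.run(G_z, {z: _z})[:, :, 0]
    for j, audio in enumerate(_G_z):
        wavwrite(os.path.join('generated', 'gen_{:06d}.wav'.format(i + j)), fs, audio)
```

The real-time piece can then simply pick files at random (e.g. `random.choice(os.listdir('generated'))`) instead of running the model live.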

We trained the models for the posted examples to between 100k and 200k steps, but yes, we observed that even after only 10-20k steps the model was producing reasonable results.

Best of luck and let me know if you have more questions!

moih commented 4 years ago

Thanks for your reply @chrisdonahue.

Here are some of the results from my WaveGAN training experiment.

Here is the dataset, for reference: https://www.youtube.com/watch?v=wXV39pybgJU

And here are my results (stopped at checkpoint 30k or so): https://www.dropbox.com/sh/hp4jk6d7gzuy2qz/AAAfgWpGnuh30LI7EwEAfa8ya?dl=0

IMO, they are of pretty good quality and useful as samples for further processing in an electronic music production workflow.

Currently I'm experimenting with mixing heterogeneous datasets, meaning that I use very different .wavs as the training data and see whether the model actually "mixes" them in its output as it learns from them.

Another issue I'm having is actually loading my own checkpoints with the example code published in the Jupyter notebook.

Currently I am generating samples one by one using a generator script someone else posted in another issue thread, which is kind of tedious.

Thanks again!

UPDATE: I managed to make a .py script for interpolating between latent-space vectors, as in the Google Colab notebook. Here is one interpolation using the checkpoints generated above: https://soundcloud.com/h-e-x-o-r-c-i-s-m-o-s/espacio_latente?fbclid=IwAR239cENr7yFQQq7Xi8CaOar8_H1k2_yHi7pOiwSQ5QYrM_iGrXdVwMyo-k
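For anyone curious, the core of the interpolation is roughly this (a minimal sketch assuming the same TensorFlow 1.x `infer.meta` graph, a 100-dimensional latent space, and placeholder checkpoint paths; it does simple linear interpolation between two random endpoints):

```python
import numpy as np
import tensorflow as tf
from scipy.io.wavfile import write as wavwrite

# Load the exported inference graph and a trained checkpoint (placeholder paths)
tf.reset_default_graph()
saver = tf.train.import_meta_graph('train_dir/infer/infer.meta')
graph = tf.get_default_graph()
sess = tf.InteractiveSession()
saver.restore(sess, 'train_dir/model.ckpt-30000')

z = graph.get_tensor_by_name('z:0')      # latent input, shape [None, 100]
G_z = graph.get_tensor_by_name('G_z:0')  # generated audio, shape [None, 16384, 1]

# Two random endpoints in latent space and evenly spaced steps between them
z_a = (np.random.rand(100) * 2.) - 1.
z_b = (np.random.rand(100) * 2.) - 1.
steps = 16
_z = np.stack([z_a + (z_b - z_a) * t for t in np.linspace(0., 1., steps)])

# Generate all steps in one batch and concatenate them into a single WAV
_G_z = sess.run(G_z, {z: _z})[:, :, 0]
wavwrite('interpolation.wav', 16000, _G_z.reshape(-1))
```

Linear interpolation is the simplest choice here; spherical interpolation between the endpoints is a common alternative.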