chrisdonahue / wavegan

WaveGAN: Learn to synthesize raw audio with generative adversarial networks
MIT License

Length of recordings in the case of the birds dataset #67

Open spagliarini opened 4 years ago

spagliarini commented 4 years ago

Hi,

For the birds dataset, I read that the total length is 12.2 hours, but I'm interested in the characteristics of the individual recordings used for training. Do they each have a duration of 1 second, as in the speech dataset? Or are they pre-processed differently?

Thank you for your help!

chrisdonahue commented 4 years ago

The raw recordings I used were up to several minutes long. I used a spectral energy heuristic to extract, from each raw recording, the few 1.5-second chunks with the most energy. I used these as the training examples (with the random shuffle flag enabled, so that there would be a bit of phase jitter each time they were presented to the GAN).
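For readers who want to try something similar, here is a minimal sketch of one possible energy-based chunk selection. It is not the code used for the paper: the function name `top_energy_chunks`, the frame and hop sizes, the Hann window, and the greedy non-overlap rule are all assumptions made for illustration.

```python
import numpy as np

def top_energy_chunks(audio, sample_rate, chunk_seconds=1.5, num_chunks=3,
                      frame_len=1024, hop=512):
    """Return sorted (start, end) sample indices of high-energy chunks.

    Hypothetical helper: every parameter default here is an assumption,
    not a value taken from the WaveGAN repository.
    """
    chunk_len = int(round(chunk_seconds * sample_rate))
    if len(audio) < max(chunk_len, frame_len):
        return []  # recording too short to yield a single chunk

    # Spectral energy per frame: sum of squared FFT magnitudes.
    num_frames = 1 + (len(audio) - frame_len) // hop
    window = np.hanning(frame_len)
    energy = np.empty(num_frames)
    for i in range(num_frames):
        frame = audio[i * hop:i * hop + frame_len] * window
        energy[i] = np.sum(np.abs(np.fft.rfft(frame)) ** 2)

    # Energy of every candidate chunk, computed with a cumulative sum.
    frames_per_chunk = max(1, (chunk_len - frame_len) // hop + 1)
    csum = np.concatenate(([0.0], np.cumsum(energy)))
    chunk_energy = csum[frames_per_chunk:] - csum[:-frames_per_chunk]

    # Greedily keep the strongest chunks that do not overlap each other.
    picked = []
    for f in np.argsort(chunk_energy)[::-1]:
        start = int(f) * hop
        end = start + chunk_len
        if end > len(audio):
            continue
        if all(end <= s or start >= e for s, e in picked):
            picked.append((start, end))
            if len(picked) == num_chunks:
                break
    return sorted(picked)
```

Calling `top_energy_chunks(audio, 16000)` on a mono float array would return up to three `(start, end)` pairs, which can then be used to slice `audio[start:end]` into training examples.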

spagliarini commented 4 years ago

So far, I have tried training WaveGAN on two different datasets:

chrisdonahue commented 4 years ago

I have not tried training WaveGAN on slices shorter than 16384 samples. For me, one-second clips were already verging on unsatisfyingly short. Good to hear that it results in more stable training, however!
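To relate the 16384-sample slice length to the one-second clips mentioned above, the conversion is a single division. The snippet below is only a sketch, and the 16 kHz sample rate is an assumption; substitute the rate of your own data.

```python
SLICE_LEN = 16384     # samples per training slice, as mentioned above
SAMPLE_RATE = 16000   # Hz; assumed here, change to match your dataset

def slice_duration_seconds(num_samples, sample_rate):
    """Convert a slice length in samples to a duration in seconds."""
    return num_samples / sample_rate

duration = slice_duration_seconds(SLICE_LEN, SAMPLE_RATE)
print(f"{duration:.3f} s")  # prints "1.024 s" under the assumed rate
```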