chrisdonahue / wavegan

WaveGAN: Learn to synthesize raw audio with generative adversarial networks
MIT License

G_loss and D_loss #64

Open spagliarini opened 4 years ago

spagliarini commented 4 years ago

Hi Chris,

I'm training the wavegan on a dataset of birdsong (specifically, single syllables extracted from the recordings, which means a lot of very short audio files).

I'm trying to look at the loss values and compare mine with the ones you provided as examples in #56. I obtain something really different and strange: after a certain number of steps (this number depends on how much data I use; the more data I have, the larger it gets), the loss reaches values on the order of 1e+3 and oscillates between negative and positive values. This happens for both G_loss and D_loss. In any case, before this happens I never see anything analogous to the example.

Any idea why this is happening? Could it be due to the amount of data I am using (too small a dataset?) or to some parameter I should change because the dataset is composed of short audio files? Or "overtraining"?

Thanks for your time!

chrisdonahue commented 4 years ago

I've not seen this particular failure mode, actually. It likely has to do with the size of the dataset. What does your data look like (e.g. how many files, how long is each)? I can possibly give some suggestions on how to configure the data parameters.

spagliarini commented 4 years ago

I have recordings of single syllables; each recording lasts 250 ms, but the syllable itself can be shorter (so on average there is about 150 ms of sound, at the beginning of the recording).

I have up to 70,000 syllables, which in terms of hours is not much (not like the ~12 hours you used in the paper). Indeed, counting an average duration of 150 ms per syllable and using all the recordings, that makes roughly 3 hours.
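For reference, a quick back-of-the-envelope check of that ~3 hour figure (all numbers taken from this thread):

```python
# Quick check of the dataset-size estimate above (values from this thread).
n_syllables = 70_000      # number of single-syllable recordings
avg_syllable_s = 0.150    # ~150 ms of actual sound per 250 ms clip
total_hours = n_syllables * avg_syllable_s / 3600
print(f"~{total_hours:.1f} hours of audio")  # ~2.9 hours
```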

I just noticed in #63 that you also obtained good results after ~20k steps, so I could probably stop the training even earlier with a smaller dataset, and maybe this is one contributing factor to the weird dynamics of the loss.

chrisdonahue commented 4 years ago

What's the sample rate? Make sure you're using --data_pad_end and --data_first_slice, otherwise you might be ignoring a lot of your training data.

spagliarini commented 4 years ago

Thank you. The sample rate is 16000 Hz. I was already using --data_first_slice, but not --data_pad_end. I'm trying now with this option set to True. I'll let you know if anything changes in the loss.

chrisdonahue commented 4 years ago

Hopefully --data_pad_end improves things. Without this flag, the data loader drops any audio file that is shorter than the window length (which is probably the case for most of your audio files). I should probably set this to True by default...
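To illustrate why this matters for 250 ms clips at 16 kHz: such a clip is only 4000 samples, well below a ~1 s analysis window. The sketch below is illustrative only (not the actual WaveGAN data loader); the 16384-sample window is the default slice length used in the paper:

```python
# Illustrative sketch of the slicing behaviour (not the real WaveGAN loader).
import numpy as np

SLICE_LEN = 16384  # default WaveGAN window length in samples (~1 s at 16 kHz)

def slices(audio, pad_end=False):
    """Return full-length windows; without padding, short files yield none."""
    if pad_end and len(audio) < SLICE_LEN:
        audio = np.pad(audio, (0, SLICE_LEN - len(audio)), mode="constant")
    n = len(audio) // SLICE_LEN  # any remainder shorter than a window is dropped
    return [audio[i * SLICE_LEN:(i + 1) * SLICE_LEN] for i in range(n)]

clip = np.zeros(4000)             # a 250 ms syllable recording at 16 kHz
print(len(slices(clip)))          # 0 -> file effectively discarded
print(len(slices(clip, True)))    # 1 -> kept after zero-padding
```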

spagliarini commented 4 years ago

I can see improvements using this option, but the results are still not comparable with the ones you shared in #56. I still observe the same dynamics in the loss, but without reaching the order of 1e+3.

I am now trying to play with the value of lambda, since I read that this hyperparameter can influence the performance of the model. Did you try varying it in your experiments?

Also, I noticed that in the examples you provided, the loss ends up at roughly the same value for all the datasets. Does this value have any particular meaning?

chrisdonahue commented 4 years ago

I have never tried varying the lambda value; I kept it fixed as in the original WGAN-GP paper. It's definitely possible that it could affect things.
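For anyone else reading this thread: lambda here is the gradient-penalty coefficient from WGAN-GP (Gulrajani et al., 2017), which that paper fixes at 10. A minimal sketch of where it enters the losses (illustrative TensorFlow 1.x code, not the exact WaveGAN implementation; names are placeholders):

```python
# Sketch of the standard WGAN-GP objective; LAMBDA scales the gradient penalty.
import tensorflow as tf

LAMBDA = 10.0  # gradient penalty coefficient discussed above

def wgan_gp_losses(D, x_real, x_fake):
    """x_real, x_fake: [batch, samples, 1] audio; D: the critic network."""
    d_real = D(x_real)
    d_fake = D(x_fake)

    # Interpolate between real and generated waveforms.
    alpha = tf.random_uniform([tf.shape(x_real)[0], 1, 1], 0.0, 1.0)
    x_hat = alpha * x_real + (1.0 - alpha) * x_fake

    # Penalize deviation of the critic's gradient norm from 1.
    grads = tf.gradients(D(x_hat), [x_hat])[0]
    slopes = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2]))
    gp = tf.reduce_mean((slopes - 1.0) ** 2)

    d_loss = tf.reduce_mean(d_fake) - tf.reduce_mean(d_real) + LAMBDA * gp
    g_loss = -tf.reduce_mean(d_fake)
    return d_loss, g_loss
```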

I wish I had a more satisfying answer, but your guess is as good as mine regarding the loss value that the experiments seem to converge to :/

spagliarini commented 4 years ago

I tried varying lambda because I read a paper about regularization in which lambda is mentioned as a parameter that acts as a regularizer.

As for my problem of the losses taking values very different from the ones in your examples: varying lambda does reduce the oscillation (it keeps the loss values in a smaller range), but it does not fix the convergence or the overall dynamics of the losses.