chrisdonahue / wavegan

WaveGAN: Learn to synthesize raw audio with generative adversarial networks
MIT License

Custom Drum Datasets #91

Open · thenapking opened this issue 4 years ago

thenapking commented 4 years ago

I assembled a dataset of approximately 9,000 bass drum samples from my own recordings, about 250 MB in total. This is much larger than the drum training set you used, but less varied, because it's just one type of drum. The samples are very short: most last around 250 ms.

I trained for around 20,000 steps with the following options: --data_first_slice --data_pad_end --data_fast_wav --wavegan_dim 32 --data_num_channels 1 --data_sample_rate=22050

After 5,000-10,000 steps the output was good. The sounds were recognisably bass drums, but had too much random high-frequency noise. Unfortunately, by 20,000 steps the output was mainly noise and had lost the bass drum character, so I stopped training; this had taken roughly 24 hours over a couple of days on a Google Colab instance.

I then tried adding the options --data_slice_len=16384 --wavegan_batchnorm --data_normalize, but this made the situation even worse (although it was much quicker).

I've considered rewriting your code to allow a data_slice_len of 8192, which would suit my dataset better, but I'm concerned that the dataset itself is the problem, given how good your results were. Unfortunately this isn't so much an issue as a request for advice from @chrisdonahue and others who have used this project. I see that others have raised issues with small datasets (#77), high-frequency components (#88), etc. I'm a noob at this and don't yet understand how to interpret the scores and graphs the training outputs, but when I do I will add more detail to this issue.
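For reference, here is a rough sanity check of the dataset I ran before training (not anything from the repo, just a sketch using scipy, which is also what --data_fast_wav uses under the hood). The directory path and the 16384-sample slice length at 22050 Hz are assumptions based on the settings above:

```python
# Rough dataset sanity check: count files, sample rates, and durations,
# and estimate how much of each 16384-sample slice would just be padding.
# Paths and values are illustrative; adjust to your own setup.
import glob
import numpy as np
from scipy.io import wavfile

DATA_DIR = "./data/bass_drums"   # assumed location of the ~9,000 WAVs
SLICE_LEN = 16384                # --data_slice_len
TARGET_SR = 22050                # --data_sample_rate

durations = []
rates = set()
for path in glob.glob(DATA_DIR + "/*.wav"):
    sr, audio = wavfile.read(path)
    rates.add(sr)
    durations.append(len(audio) / float(sr))

durations = np.array(durations)
print("files:", len(durations))
print("sample rates found:", sorted(rates))
print("duration (s): min %.3f / median %.3f / max %.3f"
      % (durations.min(), np.median(durations), durations.max()))

# Fraction of a slice that is real audio (vs. zero padding) when
# --data_first_slice --data_pad_end is used.
slice_seconds = SLICE_LEN / float(TARGET_SR)   # roughly 0.74 s
print("median slice occupancy: %.0f%%"
      % (100.0 * min(1.0, np.median(durations) / slice_seconds)))
```

With 250 ms kicks and a 16384-sample slice at 22050 Hz (about 0.74 s), roughly two thirds of every training example is zero padding, which may be relevant to how the output degrades.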

chrisdonahue commented 4 years ago

Hey there. Sorry you're not getting the results you want. It's possible that --wavegan_dim 32 is the culprit; this will result in a model with far fewer parameters than the models we trained in the paper. Is there a particular reason you chose to reduce the size of the model?
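As a rough illustration of how much the dim flag matters (a back-of-the-envelope sketch based on the 16384-sample generator architecture described in the paper: length-25 kernels, channel widths that are multiples of d; biases and batchnorm ignored, so treat the numbers as approximate):

```python
# Rough estimate of how generator parameter count scales with --wavegan_dim (d).
# Layer widths follow the paper's 16384-sample generator:
# dense 100 -> 16 * 16d, then transposed convs 16d -> 8d -> 4d -> 2d -> d -> 1,
# each with a length-25 kernel.
KERNEL = 25
Z_DIM = 100

def approx_generator_params(d):
    dense = Z_DIM * (16 * 16 * d)
    widths = [16 * d, 8 * d, 4 * d, 2 * d, d, 1]
    convs = sum(KERNEL * cin * cout
                for cin, cout in zip(widths[:-1], widths[1:]))
    return dense + convs

for d in (32, 64):
    print("dim=%d -> roughly %.1fM generator parameters"
          % (d, approx_generator_params(d) / 1e6))
```

Since the conv parameter count scales roughly with d squared, dim=32 gives you on the order of a quarter of the capacity of the default dim=64. If the run fits in Colab memory, leaving --wavegan_dim at its default is probably the first thing to try.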

thenapking commented 4 years ago

Hi @chrisdonahue, thanks for your reply! There wasn't a particular reason for specifying 32 dimensions. I've since tried with the following options: --data_first_slice --data_pad_end --data_fast_wav --wavegan_genr_pp --data_sample_rate=22050 --data_slice_len=16384 --data_normalize

I found batch normalisation was making the situation worse, but I still don't get audibly good results. By 3,000 steps I actually had some fairly good results with good high-frequency definition, but by 20,000 steps these had all disappeared and the situation was getting worse. The inception score was around 1, and the loss functions looked weird when I graphed them. I will try --wavegan_disc_phaseshuffle 2, though.

I guess my real question is: how do I assess the dataset? There's more variety in your set, since it includes sounds from multiple drums. How important is that variety? Or is it the size of the set that matters, especially if there is less variety?

(Attached: discriminator loss graph.)
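One cheap way to put a number on the "how varied is my set" question (a rough heuristic of my own, not anything from the repo) is to compute a simple per-file feature such as the spectral centroid and look at its spread across the dataset; a single drum recorded the same way every time will usually cluster far more tightly than the multi-drum set used in the paper. A sketch, assuming mono or stereo WAVs in ./data/bass_drums:

```python
# Crude variety check: spectral centroid per file, then mean / std across the set.
# A very tight distribution suggests there is little variation for the GAN to
# learn; paths and interpretation thresholds here are illustrative only.
import glob
import numpy as np
from scipy.io import wavfile

centroids = []
for path in glob.glob("./data/bass_drums/*.wav"):
    sr, audio = wavfile.read(path)
    if audio.ndim > 1:                      # mix down to mono if needed
        audio = audio.mean(axis=1)
    audio = audio.astype(np.float64)
    spectrum = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sr)
    if spectrum.sum() > 0:
        centroids.append((freqs * spectrum).sum() / spectrum.sum())

centroids = np.array(centroids)
print("spectral centroid: mean %.0f Hz, std %.0f Hz, cv %.2f"
      % (centroids.mean(), centroids.std(), centroids.std() / centroids.mean()))
```

A coefficient of variation close to zero would mean the 9,000 files are effectively near-duplicates, in which case the size of the set probably matters much less than its variety.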