Harmonai-org / sample-generator

Tools to train a generative model on arbitrary audio samples
MIT License
1.08k stars, 175 forks

Curious about issues encountered during training #9

Closed by nlml 1 year ago

nlml commented 1 year ago

Hi there

Cool project! I was just watching your interview with Weights & Biases.

I tried something similar with StyleGAN back in 2018 or so. I basically changed 2D to 1D everywhere, and that was about it. I trained on raw waveforms of around 10,000 kick drum samples at, I believe, 44.1 kHz.

The results sounded pretty good, but I always got high-frequency artifacts that sounded like a very light bitcrusher effect. I always thought they were a by-product of the convolution and upsampling layers. Your results don't seem to have this problem. Did you encounter anything like this (or any other problems), and if so, how did you overcome it?
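To make the upsampling hypothesis concrete, here is a minimal toy sketch. It is not the 2018 StyleGAN code or anything from this repository. It assumes the simplest possible 2x upsampling step (repeating each sample, i.e. nearest-neighbor) and a hypothetical sine-wave input, then measures how much spectral energy ends up above the original Nyquist frequency. All function names are made up for illustration.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Naive O(N^2) DFT magnitudes for bins 0..N/2 (fine for short toy signals)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)))
        for k in range(n // 2 + 1)
    ]


def upsample_nearest(signal, factor=2):
    """Repeat every sample `factor` times (sample-and-hold)."""
    return [x for x in signal for _ in range(factor)]


def high_band_fraction(signal):
    """Fraction of spectral energy above a quarter of the sample rate.

    For a signal that was upsampled 2x, that cutoff is the original Nyquist
    frequency, so anything above it was created by the upsampling step.
    """
    energy = [m * m for m in dft_magnitudes(signal)]
    cutoff = len(signal) // 4
    return sum(energy[cutoff + 1:]) / sum(energy)


# Toy input (assumption): 16 cycles of a sine over 64 samples.
low_rate = [math.sin(2 * math.pi * 16 * t / 64) for t in range(64)]
# Reference: the same sine sampled directly at twice the rate.
reference = [math.sin(2 * math.pi * 16 * t / 128) for t in range(128)]

print("nearest-neighbor:", high_band_fraction(upsample_nearest(low_rate)))
print("ideal reference: ", high_band_fraction(reference))
```

With these assumed parameters, the repeated-sample version leaves roughly 15% of its energy above the original Nyquist frequency, while the directly sampled reference leaves essentially none. Learned transposed convolutions behave differently, so this only illustrates the mechanism, not what the actual model did.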

It would be great if you wrote a blog post or paper, actually!

Cheers, Liam

twobob commented 1 year ago

Most likely it's just the architecture: GAN vs. diffusion.

zqevans commented 1 year ago

Yeah, I'd say that's the main difference. Diffusion models have more "natural" artifacts in the high end, in my opinion, since they tend to match the general noise profile pretty well.