Closed nlml closed 1 year ago
Most likely it's just the architecture of GANs vs. diffusion models.
Yeah, I'd say that's the main difference. Diffusion models have more "natural" artifacts in the high end, in my opinion, since they tend to match the general noise profile pretty well.
Hi there
Cool project! I was just watching your interview with Weights & Biases.
I tried to do a similar thing with StyleGAN back in 2018 or so. Basically changed 2D to 1D everywhere and that was about it. Trained on raw waveforms of around 10,000 kick drum samples at, I believe, 44.1 kHz.
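For what it's worth, the 2D-to-1D swap I mean can be sketched roughly like this (hypothetical helper names, not the actual StyleGAN code; numpy stands in for learned conv layers):

```python
import numpy as np

def upsample_1d(x, factor=2):
    # nearest-neighbour upsampling along time, the 1D analogue
    # of StyleGAN's 2x spatial upsampling
    return np.repeat(x, factor, axis=-1)

def conv_1d(x, kernel):
    # 'same'-padded convolution standing in for a learned Conv1d
    return np.convolve(x, kernel, mode="same")

# one generator stage: upsample the time axis, then filter/mix
x = np.random.randn(256)            # feature map over time
k = np.array([0.25, 0.5, 0.25])     # stand-in for a learned kernel
y = conv_1d(upsample_1d(x), k)
assert y.shape == (512,)
```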
The results sounded pretty good, but I was always getting these high frequency artifacts. Sounded like a very light bitcrusher effect. I always thought it was some by-product of the convolution and upsampling layers. It seems like your results don't have this problem. I wonder if you encountered anything like this (or any other problems) and how you might have overcome them?
It would be great if you wrote a blog post or paper, actually!
Cheers, Liam