Closed nlml closed 1 year ago
Most likely it's just the architecture of GANs vs. diffusion models.
Yeah, I'd say that's the main difference. Diffusion models have more "natural" artifacts in the high end, in my opinion, since they tend to match the general noise profile pretty well.
Hi there
Cool project! I was just watching your interview with Weights & Biases.
I tried to do a similar thing with StyleGAN back in 2018 or so. Basically changed 2D to 1D everywhere and that was about it. Trained on raw waveforms of around 10,000 kick drum samples at, I believe, 44.1 kHz.
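For what it's worth, the 2D-to-1D swap I mean can be sketched roughly like this (hypothetical helper names, not the actual StyleGAN code; numpy stands in for learned conv layers):

```python
import numpy as np

def upsample_1d(x, factor=2):
    # nearest-neighbour upsampling along time, the 1D analogue
    # of StyleGAN's 2x spatial upsampling
    return np.repeat(x, factor, axis=-1)

def conv_1d(x, kernel):
    # 'same'-padded convolution standing in for a learned Conv1d
    return np.convolve(x, kernel, mode="same")

# one generator stage: upsample the time axis, then filter/mix
x = np.random.randn(256)            # feature map over time
k = np.array([0.25, 0.5, 0.25])     # stand-in for a learned kernel
y = conv_1d(upsample_1d(x), k)
assert y.shape == (512,)
```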
The results sounded pretty good, but I was always getting these high frequency artifacts. Sounded like a very light bitcrusher effect. I always thought it was some by-product of the convolution and upsampling layers. It seems like your results don't have this problem. I wonder if you encountered anything like this (or any other problems) and how you might have overcome them?
It would be great if you wrote a blog post or paper, actually!
Cheers, Liam