Stability-AI / stable-audio-tools

Generative models for conditional audio generation
MIT License

Training from Scratch Resulting in Noise #137

Open kenkalang opened 1 month ago

kenkalang commented 1 month ago

I've been trying to train a diffusion model with the Stable Audio 1.0 config. I first trained the autoencoder with the Stable Audio 1.0 VAE for 50k steps (image: autoencoder result), and then used that as the pre-transform ckpt. However, after training the diffusion model for 100k steps, it still produces only noise (image: diffusion result). The dataset contains 2,500 hours of songs, and I used LAION CLAP for CLAP conditioning. Is there anything I might have missed in my training?

cvillela commented 3 weeks ago

Also having the same issue! Any progress @kenkalang ?

kenkalang commented 3 weeks ago

> Also having the same issue! Any progress @kenkalang ?

I still haven't figured this out yet, but one possible cause is that the VAE was not loaded correctly. I'm now trying to retrain using the 2.0 configuration. The logs seem fine so far and show pretty good progress.
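
One way to test the "VAE was not loaded correctly" theory is to compare the parameter names the model expects with the names stored in the checkpoint. The sketch below is plain Python and only illustrates the idea. The key names and the `autoencoder.` prefix are hypothetical, and it assumes a bad load would show up as mismatched key names (for example, a checkpoint saved from a training wrapper that prefixes every key).

```python
def diff_state_dict_keys(model_keys, ckpt_keys, strip_prefix=""):
    """Compare the parameter names a model expects with those in a checkpoint.

    Returns (missing, unexpected):
      missing    - names the model expects but the checkpoint lacks
      unexpected - names in the checkpoint that the model does not have
    """
    model_keys = set(model_keys)
    normalized = set()
    for key in ckpt_keys:
        # Optionally drop a wrapper prefix before comparing.
        if strip_prefix and key.startswith(strip_prefix):
            key = key[len(strip_prefix):]
        normalized.add(key)
    missing = sorted(model_keys - normalized)
    unexpected = sorted(normalized - model_keys)
    return missing, unexpected


# Hypothetical key names, for illustration only.
model_keys = ["encoder.layers.0.weight", "decoder.layers.0.weight"]
ckpt_keys = [
    "autoencoder.encoder.layers.0.weight",
    "autoencoder.decoder.layers.0.weight",
]

# Without prefix handling, nothing matches: every weight would be left at
# its random initialization if the load were non-strict.
missing_raw, unexpected_raw = diff_state_dict_keys(model_keys, ckpt_keys)

# After stripping the (assumed) wrapper prefix, the key sets line up.
missing_fixed, unexpected_fixed = diff_state_dict_keys(
    model_keys, ckpt_keys, strip_prefix="autoencoder."
)

print("missing without prefix handling:", missing_raw)
print("missing after stripping prefix:", missing_fixed)
```

With real checkpoints, the two key lists would come from `model.state_dict().keys()` and from the loaded checkpoint dictionary. In PyTorch, `load_state_dict(..., strict=False)` also returns the `missing_keys` and `unexpected_keys` it found, so printing that return value gives the same information. If either list is non-empty for the pre-transform, the diffusion model may have been trained on latents from a partly untrained autoencoder, which could explain noisy output.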