Stability-AI / stable-audio-tools

Generative models for conditional audio generation
MIT License
2.73k stars, 258 forks

loss nan in VAE training #144

Open shwj114514 opened 2 months ago

shwj114514 commented 2 months ago

Thank you for your excellent work and the well-designed open-source code.

When I use your training code to train from scratch, I frequently encounter a situation where the loss becomes NaN after a certain number of training steps. Is this behavior expected?

This issue occurs when training on both mono and stereo 44100 Hz audio files. I have to restart the training multiple times before I get a run where the loss remains stable.

I am using the stable audio 2.0 config.

apply74 commented 2 months ago

I also encountered this problem. When I increased the model parameters, the training was unstable. Is it going to be solved?

shwj114514 commented 2 months ago

> I also encountered this problem. When I increased the model parameters, the training was unstable. Is it going to be solved?

I solved this problem by reducing the learning rates of both the generator and discriminator to 1/10 of their original values, and the training became stable.
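
For anyone wanting to try this, here is a minimal sketch of the 1/10 scaling. It assumes the learning rates sit under keys named `lr` somewhere in the model config; the nested layout and the values in `example_config` are a hypothetical stand-in, so compare it against your own config before relying on it.

```python
import copy


def scale_learning_rates(config, factor=0.1):
    """Return a deep copy of `config` with every numeric "lr" value multiplied by `factor`."""

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
                if key == "lr" and is_number:
                    node[key] = value * factor
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    scaled = copy.deepcopy(config)  # leave the caller's config untouched
    walk(scaled)
    return scaled


# Hypothetical config layout and values, for illustration only.
example_config = {
    "training": {
        "optimizer_configs": {
            "autoencoder": {"optimizer": {"type": "AdamW", "config": {"lr": 1.5e-4}}},
            "discriminator": {"optimizer": {"type": "AdamW", "config": {"lr": 3e-4}}},
        }
    }
}

scaled_config = scale_learning_rates(example_config, factor=0.1)
```

Walking the whole dictionary instead of hard-coding a path means the generator and discriminator rates are scaled together, which matches the fix described above.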

apply74 commented 2 months ago

I also tried reducing the learning rate. Training becomes stable, but the reconstruction quality is very poor.

fletcherist commented 1 month ago
Screenshot 2024-09-26 at 11 41 49

the same thing

apply74 commented 1 month ago
> the same thing

I solved the problem by increasing the batch_size from 1 to 5.

fletcherist commented 1 month ago
> I have solved the problem by increasing the batch_size from 1 to 5.

@apply74 oh really? Let me try it, but I don't think this batch size will fit on my GPU. I'll post here after I try. Thanks for your help, I really appreciate it.

fletcherist commented 1 month ago

> reducing the learning rates of both the generator and discriminator to 1/10 of their original values

this works

nateraw commented 1 month ago

You have to tune the learning rates. A higher batch size helps keep training stable.

Another tip: if you can't fit a large enough batch size, you can reduce the sample size, which should free up enough memory to bump the batch size back up.

Hope this helps ❤️
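
To put rough numbers on the sample size tip, here is a small sketch. It assumes memory use grows roughly in proportion to batch_size × sample_size (audio samples per training step), which is only an approximation. The function name and the numbers are illustrative, not values taken from the stable audio 2.0 config.

```python
def batch_size_for_same_budget(old_batch_size, old_sample_size, new_sample_size):
    """Largest batch size that keeps samples-per-step at or below the old budget."""
    if min(old_batch_size, old_sample_size, new_sample_size) <= 0:
        raise ValueError("all arguments must be positive")
    samples_per_step = old_batch_size * old_sample_size
    # Integer division rounds down so the new setting never exceeds the old budget.
    return max(1, samples_per_step // new_sample_size)


# Hypothetical example: batch size 1 at 65536 samples per clip,
# then the clip length is cut to 16384 samples.
new_batch_size = batch_size_for_same_budget(1, 65536, 16384)
print(new_batch_size)  # prints 4
```

Treat the result as a starting point and confirm it against actual GPU memory use, since the real relationship depends on the model.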

stg1205 commented 4 weeks ago

I also noticed that running VAD (voice activity detection) to remove the silent parts helps.
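
A rough way to try this without a dedicated VAD model is a plain energy gate, sketched below. This is a swapped-in simplification, not a real voice activity detector and not part of stable-audio-tools. The name `remove_silence`, the frame size, and the -40 dB threshold are all assumptions to tune for your data. It expects mono samples as floats in [-1, 1].

```python
import math


def remove_silence(samples, frame_size=1024, threshold_db=-40.0):
    """Drop frames whose RMS level is below `threshold_db` (dB relative to full scale 1.0)."""
    kept = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        # A frame of exact zeros has no defined dB level; treat it as minus infinity.
        level_db = 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")
        if level_db >= threshold_db:
            kept.extend(frame)
    return kept


# Synthetic check: 2048 samples of digital silence followed by a 440 Hz tone.
silence = [0.0] * 2048
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 44100) for n in range(2048)]
trimmed = remove_silence(silence + tone)
print(len(trimmed))  # prints 2048
```

Frames are kept or dropped whole, so quiet tails shorter than one frame can survive. A smaller `frame_size` trims more precisely at the cost of more abrupt cuts.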