Open shwj114514 opened 2 months ago
I also encountered this problem. When I increased the model parameters, training became unstable. Is this going to be fixed?
> I also encountered this problem. When I increased the model parameters, training became unstable. Is this going to be fixed?
I solved this problem by reducing the learning rates of both the generator and discriminator to 1/10 of their original values, and the training became stable.
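For anyone who wants to try the same thing, here is a minimal generic PyTorch sketch of that change, assuming separate optimizers for the generator (autoencoder) and discriminator. The module definitions, base learning rate, and betas below are placeholders for illustration, not values from this repo's configs.

```python
import torch
import torch.nn as nn

# Placeholder modules standing in for the real autoencoder and discriminator.
autoencoder = nn.Sequential(nn.Conv1d(2, 16, 7, padding=3), nn.Conv1d(16, 2, 7, padding=3))
discriminator = nn.Sequential(nn.Conv1d(2, 16, 15, stride=4), nn.LeakyReLU(0.2), nn.Conv1d(16, 1, 3))

base_lr = 1.5e-4           # whatever learning rate was diverging before (placeholder)
stable_lr = base_lr / 10   # 1/10 of the original, as described above

# One optimizer per network, both with the reduced learning rate.
gen_opt = torch.optim.AdamW(autoencoder.parameters(), lr=stable_lr, betas=(0.8, 0.99))
disc_opt = torch.optim.AdamW(discriminator.parameters(), lr=stable_lr, betas=(0.8, 0.99))
```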
I also tried reducing the learning rate. Training becomes stable, but the reconstruction results are very poor.
the same thing
> the same thing

I have solved the problem by increasing the batch_size from 1 to 5.
> I have solved the problem by increasing the batch_size from 1 to 5.

@apply74 Oh really? Let me try it, but I think this batch size won't fit on my GPU. I'll post here after trying. Thanks for your help, much appreciated!
> reducing the learning rates of both the generator and discriminator to 1/10 of their original values

This works.
You have to tune the learning rates. A higher batch size helps keep things stable.
Another tip: if you can't fit a large enough batch size, you can reduce the sample size (the length of each training chunk), which should free up enough memory to bump the batch size back up; see the sketch below.
Hope this helps ❤️
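To make the trade-off concrete, here is a tiny sketch with made-up numbers (the sample_size values are placeholders, not values from any shipped config): per-step memory grows roughly with batch_size × sample_size, so halving the chunk length leaves room to raise the batch size.

```python
# Back-of-the-envelope view of the batch_size / sample_size trade-off.
sample_rate = 44100

unstable = {"batch_size": 1, "sample_size": 131072}  # long chunks, batch of 1
stable   = {"batch_size": 4, "sample_size": 32768}   # shorter chunks, bigger batch

for name, cfg in (("unstable", unstable), ("stable", stable)):
    seconds = cfg["sample_size"] / sample_rate
    samples_per_step = cfg["batch_size"] * cfg["sample_size"]
    print(f"{name}: batch {cfg['batch_size']} x {seconds:.2f}s chunks "
          f"-> ~{samples_per_step} audio samples per step")
```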
Also, I noticed that running VAD over the dataset to remove the silent parts helps; a rough sketch is below.
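In case it helps, here is how one might strip silence offline before training. I'm using librosa's energy-based `librosa.effects.split` as a simple stand-in for a real VAD model; the file paths and `top_db` threshold are placeholders.

```python
import numpy as np
import librosa
import soundfile as sf

in_path, out_path = "clip.wav", "clip_trimmed.wav"   # placeholder paths

# Load at the original sample rate and channel count.
audio, sr = librosa.load(in_path, sr=None, mono=False)
mono = librosa.to_mono(audio) if audio.ndim > 1 else audio

# Find regions whose level is within `top_db` dB of the peak (i.e. non-silent).
intervals = librosa.effects.split(mono, top_db=40)

# Keep only the non-silent regions, preserving the original channel layout.
if audio.ndim > 1:
    trimmed = np.concatenate([audio[:, s:e] for s, e in intervals], axis=-1)
else:
    trimmed = np.concatenate([audio[s:e] for s, e in intervals])

sf.write(out_path, trimmed.T if trimmed.ndim > 1 else trimmed, sr)
```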
Thank you for your excellent work and the well-designed open-source code.
When I use your training code to train from scratch, I frequently encounter a situation where the loss becomes NaN after a certain number of training steps. Is this behavior expected?
This issue occurs when training on both 44100 Hz mono and stereo audio files. I have to restart training multiple times before the loss stays stable.
I am using the Stable Audio 2.0 config.
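Not a fix, but while the underlying instability is being tracked down, a small guard like the one below (generic PyTorch, hypothetical, not part of this repo's trainer) stops a run as soon as the loss goes non-finite, so a diverged run doesn't keep burning GPU time before you restart it.

```python
import torch

def check_finite(loss: torch.Tensor, step: int) -> None:
    # Abort early if the loss has become NaN or inf.
    if not torch.isfinite(loss):
        raise RuntimeError(f"Loss became non-finite ({loss.item()}) at step {step}")

# Inside the training loop (sketch):
# loss = compute_loss(batch)      # compute_loss is a placeholder
# check_finite(loss, step)
# loss.backward()
```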