CompVis / latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License

Text-to-image LDM training loss stuck at around 1.00 #135

Open xiankgx opened 2 years ago

xiankgx commented 2 years ago

Hi, I took a first-stage KL-regularized autoencoder from one of the pretrained models (models/first_stage_models/kl-f8/model.ckpt) and tried to train an LDM on top of it. Training proceeds, but the loss (train/loss_simple_step) hardly budges from around 1.00. In the logged images I see good-quality reconstructions (thanks to the pretrained autoencoder), but the samples are pure noise, similar to what you would get by randomizing the latent code in a VQGAN.
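
For reference: with the repo's default settings (eps parameterization and L2 loss in ddpm.py), loss_simple compares the U-Net's noise prediction against the true Gaussian noise, which has unit variance. So a prediction that carries no information about the noise (for example, an output near zero) gives an expected loss of roughly 1.0, which is exactly the plateau described above. A minimal sketch of that arithmetic (not from the thread, just to illustrate why 1.00 means "not learning" rather than "converging slowly"):

```python
import torch
import torch.nn.functional as F

# With the default "eps" parameterization and L2 loss, loss_simple compares the
# U-Net's noise prediction against the true noise eps ~ N(0, I).
eps = torch.randn(8, 4, 32, 32)             # target noise, unit variance
uninformative_pred = torch.zeros_like(eps)  # stand-in for a model that has learned nothing

loss_simple = F.mse_loss(uninformative_pred, eps)
print(loss_simple.item())  # ~1.0, matching the plateau in train/loss_simple_step
```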

Allencheng97 commented 2 years ago

+1 same problem here.

Allencheng97 commented 2 years ago

Try decreasing your batch size. A batch size that is too large can cause this problem, and if it is much too large the loss becomes NaN. Reducing the batch size to 1, 4, or 8 worked for me.
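
One plausible mechanism (an assumption based on the repo's training script, not something confirmed in this thread): with the default --scale_lr behaviour in main.py, the base_learning_rate from the config is multiplied by the batch size, the number of GPUs, and accumulate_grad_batches. A large batch therefore silently inflates the effective learning rate, which can stall training or drive the loss to NaN. A sketch of that scaling with a hypothetical base_lr:

```python
# Sketch of the learning-rate scaling applied in main.py when --scale_lr is left
# at its default (assumed behaviour of the training script, not taken from this thread).
accumulate_grad_batches = 1
ngpu = 1
base_lr = 5.0e-5  # hypothetical base_learning_rate from the model config

for batch_size in (64, 8, 1):
    effective_lr = accumulate_grad_batches * ngpu * batch_size * base_lr
    print(f"batch_size={batch_size:3d} -> effective lr = {effective_lr:.2e}")

# batch_size= 64 -> effective lr = 3.20e-03   (large; can destabilize training)
# batch_size=  8 -> effective lr = 4.00e-04
# batch_size=  1 -> effective lr = 5.00e-05
```

If this scaling is in effect, shrinking the batch size and lowering the learning rate directly (as suggested later in the thread) are essentially the same fix.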

choucaicai commented 1 year ago

+1, same problem here. Did you solve it?

Neesky commented 2 months ago

Reducing the learning rate worked for me.