CompVis / taming-transformers

Taming Transformers for High-Resolution Image Synthesis
https://arxiv.org/abs/2012.09841
MIT License

About the loss when training the stage 2 transformer #212

Open · CrossLee1 opened this issue 1 year ago

CrossLee1 commented 1 year ago

Dear authors and everyone,

I'm trying to reproduce the stage 2 transformer training, but I'm getting a very high loss, around 3-4.

Could anyone share their stage 2 training curves for reference?

Thanks a lot ~
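For scale (not an official number from the authors): stage 2 trains a GPT-style transformer with a cross-entropy loss over the discrete VQGAN codebook indices, so the raw value only means something relative to the codebook size; a uniformly random predictor scores ln(n_embed). A quick sanity check, assuming the 1024- and 16384-entry codebooks found in the repo's configs:

```python
import math

# Cross-entropy of a uniform random predictor over the codebook:
# trained losses should be judged against this ceiling, not against 0.
for n_embed in (1024, 16384):  # codebook sizes used in the repo's configs
    print(f"n_embed={n_embed}: random-guess loss = ln({n_embed}) ~= {math.log(n_embed):.2f}")
# n_embed=1024:  random-guess loss ~= 6.93
# n_embed=16384: random-guess loss ~= 9.70
```

So a loss of 3-4 with a 1024-entry codebook is already well below chance; it will not approach 0 unless the code sequences are nearly deterministic.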

nicolasfischoeder commented 1 year ago

I'm also interested in this, as I'm seeing the same phenomenon. Is that normal?

nicolasfischoeder commented 1 year ago

@CrossLee1 have you figured it out by now maybe?

@pesser @rromb Can you maybe comment on that? :)

order-a-lemonade commented 1 year ago

My loss is even higher, about 6 (training on ffhq256), and it won't go down. Have you figured it out?

[screenshot: training-loss curve]

robertchen245 commented 11 months ago

A loss lower than 5 is good enough to generate reasonable images; mine is 4.6 on CelebAHQ. Have you tried sampling some images?
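For a quick qualitative check, a minimal top-k autoregressive sampling loop looks roughly like the sketch below. This is not the repo's exact API: it assumes `model` is a GPT-like net mapping a sequence of code indices to logits of shape (B, T, n_embed), and that the returned indices are decoded to pixels with the stage 1 VQGAN decoder.

```python
import torch

@torch.no_grad()
def sample_codes(model, c_indices, steps, temperature=1.0, top_k=100):
    # Autoregressively sample `steps` codebook indices after the
    # conditioning prefix `c_indices` (shape (B, T0), long dtype).
    x = c_indices
    for _ in range(steps):
        logits = model(x)[:, -1, :] / temperature      # next-code logits
        if top_k is not None:                          # keep only top-k codes
            v, _ = torch.topk(logits, top_k)
            logits[logits < v[:, [-1]]] = -float("inf")
        probs = torch.softmax(logits, dim=-1)
        ix = torch.multinomial(probs, num_samples=1)   # sample one code
        x = torch.cat((x, ix), dim=1)
    return x[:, c_indices.shape[1]:]                   # drop the prefix
```

Decoding a few batches this way tells you much more about a 4.6 loss than the raw curve does.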

order-a-lemonade commented 11 months ago

[screenshot: training-loss curve]

I have forgotten what the problem was here (it seems I edited the source code and introduced some bugs), but my training loss is normal now.

robertchen245 commented 11 months ago

> [screenshot: training-loss curve] I have forgotten what the problem was here (it seems I edited the source code and introduced some bugs), but my training loss is normal now.

You mentioned that you modified the lr to 4.5e-6. I wonder how many epochs and what batch_size you used. Though my sampled images look okay, I want the loss to be much smaller 😂

order-a-lemonade commented 11 months ago

> You mentioned that you modified the lr to 4.5e-6. I wonder how many epochs and what batch_size you used. Though my sampled images look okay, I want the loss to be much smaller 😂

I'm not sure where I mentioned modifying the lr 🤣. But my batch_size is 1. By the way, I don't think training longer helps sample quality: the images from my checkpoint at 400k steps are not obviously better than those from the checkpoint provided by the authors, which is at 13750 steps.
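One caveat when comparing lr and batch_size numbers (worth verifying against your checkout of main.py, since this is a sketch of its behavior rather than a quote): the effective learning rate is scaled by GPU count and batch size, so a base_learning_rate of 4.5e-6 in the config is only what the optimizer actually sees when training with batch_size 1 on a single GPU.

```python
# Sketch of the scaling rule applied in main.py (verify in your version):
#   model.learning_rate = ngpu * batch_size * base_learning_rate
base_learning_rate = 4.5e-6  # value discussed above
for ngpu, bs in [(1, 1), (2, 8)]:
    print(f"ngpu={ngpu}, bs={bs}: effective lr = {ngpu * bs * base_learning_rate:.2e}")
# ngpu=1, bs=1: effective lr = 4.50e-06
# ngpu=2, bs=8: effective lr = 7.20e-05
```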

lxxie298 commented 6 months ago

> [screenshot: training-loss curve] I have forgotten what the problem was here (it seems I edited the source code and introduced some bugs), but my training loss is normal now.

I came across your comment about resolving the issue with the loss not decreasing. It's impressive that you managed to figure it out! I'm currently facing a similar problem and would be incredibly grateful for any guidance you could provide. If it's not too much trouble, could you please share the changes you made, or perhaps your training configuration file? 😂 @order-a-lemonade