CompVis / taming-transformers

Taming Transformers for High-Resolution Image Synthesis
https://arxiv.org/abs/2012.09841
MIT License

Overfitting problem when training transformer #48

Closed fnzhan closed 3 years ago

fnzhan commented 3 years ago

I train the transformer but find that it overfits after 30-40 epochs: the validation loss goes up while the training loss stays very small. Have you met this problem in training? For now I am trying pkeep=0.9 in cond_transformer.py to avoid overfitting.
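For reference, pkeep works roughly like this in cond_transformer.py (a minimal sketch; the helper name is mine): during training, each ground-truth code index is kept with probability pkeep and otherwise replaced by a random codebook index, so the transformer never sees a perfectly teacher-forced context.

```python
import torch

def mask_input_indices(z_indices, pkeep, vocab_size):
    # Keep each ground-truth code index with probability pkeep,
    # otherwise swap it for a random codebook index. This corrupts
    # the teacher-forced context and acts as a regularizer.
    mask = torch.bernoulli(pkeep * torch.ones_like(z_indices, dtype=torch.float))
    mask = mask.to(dtype=torch.int64)
    random_indices = torch.randint_like(z_indices, vocab_size)
    return mask * z_indices + (1 - mask) * random_indices
```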

ink1 commented 3 years ago

Does it help? And how big is your dataset? Mine is 70k images, and I see the validation loss lagging further and further behind the training loss from fewer than 10 epochs in.

ink1 commented 3 years ago

I initially thought the 256x256 training images were randomly cropped from the dataset you provide, but that turned out not to be the case, and I had to prepare a set of 256x256 crops myself. I see no reason why random cropping should not work (other than slightly higher overhead), and it may help with overfitting, but I have not gotten to the bottom of this yet.

fnzhan commented 3 years ago

In my case, the overfitting was caused by the removal of the random crop. After adding random cropping back to the training pipeline, the validation loss behaves well.
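Something like the following is enough (a minimal sketch mirroring the albumentations-based ImageNet pipeline in this repo; the size constant is illustrative):

```python
import albumentations

size = 256
# Rescale the shorter side to `size`, then take a random size x size crop,
# as the ImageNet data code does when random_crop is enabled.
rescaler = albumentations.SmallestMaxSize(max_size=size)
cropper = albumentations.RandomCrop(height=size, width=size)
preprocessor = albumentations.Compose([rescaler, cropper])

def preprocess(image):
    # image: HxWxC uint8 numpy array
    return preprocessor(image=image)["image"]
```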

adeptflax commented 3 years ago

Could adding random crop to the training data improve my image2image model? https://github.com/CompVis/taming-transformers/issues/51 I based my code on the ImageNet code.

adeptflax commented 3 years ago

I have around 11k training examples.

adeptflax commented 3 years ago

Would there be a way to do pre-training?

LeeDoYup commented 2 years ago

I also think the stage 2 training of VQ-GAN would suffer from overfitting on FFHQ, because this repository does not apply data augmentation to the FFHQ training dataset.