dome272 / MaskGIT-pytorch

Pytorch implementation of MaskGIT: Masked Generative Image Transformer (https://arxiv.org/pdf/2202.04200.pdf)
MIT License
398 stars · 34 forks

Training on Colab - CUDA out of memory #17

Open Mancio98 opened 9 months ago

Mancio98 commented 9 months ago

Hi, I would like to ask whether anyone has tried to train the model on Colab. Yesterday I tried to launch training on the GPU, but it runs out of memory: it instantly fills almost all 15 GB. I tried smaller batch sizes (6 and 8), but the problem persists. I also replaced the model inside the VQGAN training with the same one used for inference with the transformer (vq_f16).

Additionally, if @dome272 could upload pretrained weights for both models, I would be grateful (I need them for my university exam project, ahaha).

Many Thanks

Kami-code commented 9 months ago

Hi, I am also facing the same problem. Have you solved it yet? I am going to try the pretrained models provided at https://github.com/CompVis/taming-transformers/tree/master.

Mancio98 commented 9 months ago

Hi, unfortunately not. One last thing I would like to try is gradient accumulation, though I doubt it will solve the problem. By the way, I was planning to do the same as you; if I succeed, I will post my solution here. You can also find a pretrained VQGAN and a pretrained MaskGIT here: https://huggingface.co/llvictorll/Maskgit-pytorch/tree/main (their GitHub: https://github.com/valeoai/Maskgit-pytorch). Note that they modified some transformer parameters and a few other details, so I suggest using only their VQGAN if you want to follow the original MaskGIT implementation.