dome272 / Wuerstchen

Official implementation of Würstchen: Efficient Pretraining of Text-to-Image Models
https://arxiv.org/abs/2306.00637
MIT License

More than 12Gb VRAM for training (expected) !! #10

Open axel578 opened 1 year ago

axel578 commented 1 year ago

I managed to run training on a single GPU even though the training code was not originally written for that. Full fine-tuning requires more than 12 GB of VRAM, which is perhaps expected, but it is a significant drawback for most users with consumer GPUs: if 12 GB is not enough, few cards can benefit from the marketed fast training and inference speed.
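A rough back-of-the-envelope calculation shows why full fine-tuning in fp32 with Adam can exceed a 12 GB card even before activations are counted. The sketch below is illustrative only: the 700M parameter count is an assumption for the example, not the exact size of Stage B.

```python
def training_vram_gib(n_params, bytes_per_param=4, optimizer_states=2):
    """Estimate VRAM (GiB) for weights + gradients + optimizer states.

    fp32 training with Adam holds the weights (1x), the gradients (1x),
    and two fp32 moment buffers per parameter (2x), i.e. 16 bytes/param.
    Activations, the batch, and CUDA overhead come on top of this.
    """
    total_bytes = n_params * bytes_per_param * (2 + optimizer_states)
    return total_bytes / 1024**3

# Hypothetical 700M-parameter model, fp32, Adam:
print(round(training_vram_gib(700e6), 1))  # → 10.4 GiB before activations
```

With activations and the CUDA context added, that estimate lands right around the 10.84 GiB "already allocated" figure in the traceback, so the OOM on a 12 GB card is unsurprising.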

```
  File "Wuerstchen\train_stage_B.py", line 379, in <module>
    train(0, 1, 1)
  File "Wuerstchen\train_stage_B.py", line 252, in train
    loss = criterion(pred, latents)
  File "v2\train\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "v2\train\lib\site-packages\torch\nn\modules\loss.py", line 1174, in forward
    return F.cross_entropy(input, target, weight=self.weight,
  File "v2\train\lib\site-packages\torch\nn\functional.py", line 3029, in cross_entropy
    return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 11.99 GiB total capacity; 10.84 GiB already allocated; 0 bytes free; 10.98 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
wandb: Waiting for W&B process to finish... (failed 1). Press Ctrl-C to abort syncing.
```
dome272 commented 1 year ago

I think if you build on top of diffusers this will need much less memory. The demos here are not particularly memory-efficient either.