CompVis / taming-transformers

Taming Transformers for High-Resolution Image Synthesis
https://arxiv.org/abs/2012.09841
MIT License

Starting point for training sflckr #16

Closed ink1 closed 3 years ago

ink1 commented 3 years ago

Hey guys, great work! I'm trying to run training on a dataset similar to your sflckr. However, I'm hitting this error immediately after validation or training starts, right after "Summoning checkpoint.": `assert t <= self.block_size, "Cannot forward, model block size is exhausted."` → `AssertionError: Cannot forward, model block size is exhausted.` Assuming this was GPU-memory related, I reduced the model size, but the error persisted. So I started to think that perhaps this has something to do with the configuration. My starting point is your sflckr.yaml: `python main.py --base configs/sflckr.yaml -t True --gpus 0,` Any hints are highly appreciated. Thanks!

rromb commented 3 years ago

Hi, thanks for your interest in our work!

It is probably a problem with the size of the latent representations in your model. For the S-FLCKR experiments with a downsampling factor of 16, we used representations of length 16x16=256 for both the images and the semantic masks (i.e. the inputs are 256x256 in size). This means that the entire sequence has length 256+256=512, and therefore we need block_size=512, which can be set via the transformer's configuration. In case you use the same downsampling factor: did you also make sure that your inputs have size 256x256?
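The arithmetic above can be sketched in a few lines. This is a hedged illustration of the relationship rromb describes, not code from the repo; the function name and signature are made up for this example:

```python
# Sketch of how the required transformer block_size follows from the input
# resolution and the VQGAN downsampling factor, per the explanation above.
# For conditional training (mask + image), two token sequences are concatenated.

def required_block_size(image_size: int, downsample_factor: int,
                        num_modalities: int = 2) -> int:
    """Tokens per modality = (image_size // downsample_factor) ** 2;
    the full sequence length is the sum over modalities."""
    tokens_per_modality = (image_size // downsample_factor) ** 2
    return num_modalities * tokens_per_modality

# S-FLCKR setup: 256x256 inputs, f=16 downsampling -> 16*16 = 256 tokens
# each for the semantic mask and the image, so block_size must be >= 512.
print(required_block_size(256, 16))  # 512
```

If your inputs are larger than 256x256 (say, uncropped images), the token sequence exceeds block_size and the assertion above fires.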

ink1 commented 3 years ago

@rromb hi Robin, the images in my dataset are larger than 256. But I was under the impression that the training data generator crops 256x256 patches from the input. Is that not the case? Thanks

ink1 commented 3 years ago

yup, resizing to 256x256 helped! and then i hit the memory limit : )
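For anyone else hitting this, resizing the dataset offline is straightforward. A minimal sketch, assuming Pillow is installed; the directory layout, glob pattern, and function name are placeholders, not part of the repo:

```python
# Resize all dataset images to 256x256 before training so the resulting
# latent sequence fits the transformer's block_size (see discussion above).
from pathlib import Path
from PIL import Image

def resize_dataset(src_dir: str, dst_dir: str, size: int = 256) -> None:
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.png")):
        img = Image.open(path).convert("RGB")
        # Bicubic keeps photographic detail; use NEAREST for label masks
        # so class indices are not interpolated.
        img.resize((size, size), Image.BICUBIC).save(dst / path.name)
```

Note that plain resizing changes the aspect ratio; center-cropping to a square first is a common alternative.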

tommyMessi commented 3 years ago

> Hey guys, great work! I'm trying to run training on a dataset similar to your sflckr. However I'm hitting this error immediately after validation or training starts, right after "Summoning checkpoint.": `assert t <= self.block_size, "Cannot forward, model block size is exhausted."` `AssertionError: Cannot forward, model block size is exhausted.` Assuming this was GPU memory related I reduced the model size but the error persisted. So I started to think that perhaps this has something to do with the configuration. My starting point is your sflckr.yaml `python main.py --base configs/sflckr.yaml -t True --gpus 0,` Any hints are highly appreciated. Thanks!

hi, I want to train a task like sflckr too. How should I change the config file? And how do I create my training data? thanks