CompVis / taming-transformers

Taming Transformers for High-Resolution Image Synthesis
https://arxiv.org/abs/2012.09841
MIT License

How to do high resolution unconditional sampling? #130

Open xiankgx opened 2 years ago

xiankgx commented 2 years ago

The high-resolution images from this repo seem to be sampled from conditional transformer models. How do I sample a high-resolution image from an unconditional transformer? Do I have to train the transformer on high-resolution images? Any help is much appreciated.

xiankgx commented 2 years ago

It would be helpful if anyone knows how to reproduce Figure 39 in the paper, where samples are drawn from a model trained on LSUN Churches & Tower using a sliding attention window.

rromb commented 2 years ago

You can also use the sliding window approach for unconditional sampling, but the quality of the rendered sample typically depends on the spatial structure of your training data. For the LSUN Churches&Tower model, this works well for horizontally extending images. Note that you can use the same logic as in this sampling script, but you have to remove the dependency on the spatial conditioning (and use a constant start-of-sequence token instead).
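The approach rromb describes can be sketched roughly as follows: sample the latent code grid token by token in raster order, but restrict the transformer's context to a window of already-sampled codes that slides with the current position, and feed a constant start-of-sequence token where the conditional script would feed spatial conditioning. The sketch below is illustrative only; `ToyTransformer`, `sample_unconditional_sliding`, and all sizes are assumptions, not the repo's actual API (in the real setup the model would be the trained GPT and the returned indices would go through the VQGAN decoder).

```python
# Hypothetical sketch of unconditional sliding-window sampling over a latent
# grid. All names and sizes here are illustrative, not taming-transformers' API.
import torch
import torch.nn.functional as F

VOCAB = 16          # codebook size of the (assumed) VQGAN
WINDOW = 4          # sliding attention window covers WINDOW x WINDOW codes
H, W = 4, 12        # target latent grid, wider than the training resolution

class ToyTransformer(torch.nn.Module):
    """Stand-in for the autoregressive transformer: maps a flat token
    sequence to next-token logits. A real model would be the trained GPT."""
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(VOCAB + 1, 32)  # +1 for the SOS token
        self.head = torch.nn.Linear(32, VOCAB)
    def forward(self, seq):
        return self.head(self.emb(seq).mean(dim=1, keepdim=True))

@torch.no_grad()
def sample_unconditional_sliding(model, h=H, w=W, sos=VOCAB, temperature=1.0):
    grid = torch.zeros(1, h, w, dtype=torch.long)
    for i in range(h):
        for j in range(w):
            # Crop a patch of at most WINDOW x WINDOW codes whose bottom-right
            # corner is (i, j); this is the "sliding attention window".
            i0, j0 = max(0, i - WINDOW + 1), max(0, j - WINDOW + 1)
            patch = grid[:, i0:i + 1, j0:j + 1].reshape(1, -1)
            # A constant start-of-sequence token replaces the spatial
            # conditioning used by the conditional sampling script.
            seq = torch.cat([torch.full((1, 1), sos), patch[:, :-1]], dim=1)
            logits = model(seq)[:, -1] / temperature
            probs = F.softmax(logits, dim=-1)
            grid[0, i, j] = torch.multinomial(probs, 1)
    return grid  # code indices; decode them with the VQGAN to get pixels

torch.manual_seed(0)
sample = sample_unconditional_sliding(ToyTransformer())
print(sample.shape)  # torch.Size([1, 4, 12])
```

Because each token only ever attends to a fixed-size window, the grid can be made arbitrarily wide at sampling time; as rromb notes, coherence then depends on whether the training data is spatially stationary in that direction.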

RichardXue123 commented 6 months ago

> You can also use the sliding window approach for unconditional sampling, but the quality of the rendered sample typically depends on the spatial structure of your training data. For the LSUN Churches&Tower model, this works well for horizontally extending images. Note that you can use the same logic as in this sampling script, but you have to remove the dependency on the spatial conditioning (and use a constant start-of-sequence token instead).

Could you please explain in more detail how to change the file make_samples.py for unconditional sliding-window sampling?