trolley813 opened this issue 1 year ago
Thank you for sharing this! With your approach I managed to generate a 768x768 image on a GTX 1070 in 1 minute (decoder phase), and average GPU memory usage was about 6.4 GB. Before this, even 512x512 was not possible without spilling into shared memory, which made it extremely slow.
Hello! This is not actually an issue, but rather a "how-to" post on using the model on lower-end GPUs. The approach proposed here is to generate the embeddings on the CPU, convert them from float32 to float16, and run the decoder on the GPU. This way, one can generate images as large as 1024x1024 (the largest "officially supported" size) on an 8 GB GPU. On my PC (Ryzen 9 3950X CPU + RTX 2080 Super GPU) the speed is about 50 seconds per 1024x1024 image. (Note that the embeddings can probably be generated in a single batch, but I haven't yet figured out how to pass them to the decoder separately, since "plain" indexing does not work.) (For comparison, version 2.1 was unable to generate even 768x768 on the same GPU.)
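For reference, the same CPU-prior / GPU-decoder split can be sketched with the Hugging Face `diffusers` port of Kandinsky 2.2; the model IDs, prompt, and sizes below are illustrative assumptions and this is a sketch only, not the script that follows:

```python
import torch
from diffusers import KandinskyV22PriorPipeline, KandinskyV22Pipeline

# Prior (embedding) stage: run on the CPU in float32
prior = KandinskyV22PriorPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float32
).to("cpu")

# Decoder stage: run on the GPU in float16
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")

prompt = "a watercolor painting of mountains at sunset"  # assumed example prompt
negative_prompt = "low quality, blurry"

# Generate the image embeddings on the CPU
image_embeds, negative_image_embeds = prior(
    prompt, negative_prompt=negative_prompt
).to_tuple()

# Convert the embeddings to float16, move them to the GPU, and run the decoder
image = decoder(
    image_embeds=image_embeds.half().to("cuda"),
    negative_image_embeds=negative_image_embeds.half().to("cuda"),
    height=1024,
    width=1024,
    num_inference_steps=50,
).images[0]

image.save("output.png")
```

The point of the split is that only the fp16 decoder weights and activations have to fit in VRAM; the prior stays entirely in system RAM.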
The code (a plain old `.py` script, but it can easily be converted to an `.ipynb` notebook):