CompVis / latent-diffusion

High-Resolution Image Synthesis with Latent Diffusion Models
MIT License

Difficulty with Inference for High-Resolution Semantic Image Synthesis #302

Open vladherasymenko opened 11 months ago

vladherasymenko commented 11 months ago

Hello!

Firstly, I want to express my gratitude for sharing this outstanding work! I am currently working on a Semantic Image Synthesis task and have been using your LDM implementation to generate high-resolution images, as described in section 4.3.2 of the paper. I have successfully trained a 256x256 model on my dataset, but I am facing challenges in generalizing it to higher resolutions. Specifically, I am unsure how to perform inference for custom resolutions, particularly non-square formats like 512x1024.

Could you kindly provide guidance on how to set up an inference configuration for achieving this? Your assistance would be immensely appreciated!

Thank you in advance!

Best regards,

Vlad

vladherasymenko commented 11 months ago

I'd also be grateful if you could share some insights on how to train an LDM directly on custom resolutions, for example 256x512.
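
For both cases raised above (inference at 512x1024 and training at 256x512), the quantity that changes is the spatial size of the latent, which follows from the image size and the autoencoder's downsampling factor. Below is a minimal sketch of that arithmetic. The helper name `latent_shape` is hypothetical and not part of this repository, and the defaults (`factor=4`, `channels=3`) are assumptions for illustration; the real values should be read from the first-stage autoencoder config of the trained model.

```python
def latent_shape(height, width, factor=4, channels=3):
    """Return the (channels, h, w) latent shape for a height x width image.

    `factor` is the assumed spatial downsampling factor of the autoencoder and
    `channels` the assumed number of latent channels; both are illustrative
    defaults, not values taken from a specific config.
    """
    if height % factor != 0 or width % factor != 0:
        raise ValueError(
            f"image size {height}x{width} is not divisible by factor {factor}"
        )
    return (channels, height // factor, width // factor)


# Resolution the model was trained on.
print(latent_shape(256, 256))   # (3, 64, 64)
# Non-square inference resolution from the question.
print(latent_shape(512, 1024))  # (3, 128, 256)
# Non-square training resolution from the follow-up comment.
print(latent_shape(256, 512))   # (3, 64, 128)
```

Under these assumptions, a 512x1024 image corresponds to a 128x256 latent, so the conditioning segmentation map and the sampling shape would both need to match that size. Whether a given denoising network accepts the larger latent also depends on its own internal downsampling, which this sketch does not check.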

elhamAm commented 11 months ago

same