YangLing0818 / RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
https://proceedings.mlr.press/v235/yang24ai.html
MIT License
1.7k stars, 99 forks

generation object outside of the region #54

Closed (X-fxx closed this issue 3 months ago)

X-fxx commented 5 months ago

[image attachment]

Why does the person's image appear in the moon area? Which part of the method described in the paper does this correspond to?

YangLing0818 commented 3 months ago

Thanks for your comment. Our region planning is designed only to provide a suitable initial position and size for each object. During the denoising process, we therefore allow the model to adaptively modify the size of objects for better generation quality.
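To make the reply above concrete, here is a toy sketch of why a planned region acts as a soft starting point rather than a hard boundary. It assumes the final latent is a weighted sum of a full-prompt latent and a latent stitched together from per-region latents. The functions `stitch_regions` and `blend_latents`, the 1D "latent", and the numeric values are all hypothetical illustrations, not code from this repository.

```python
def stitch_regions(width, regions):
    """Build a 1D toy latent by filling each planned span [start, end) with its value."""
    latent = [0.0] * width
    for start, end, value in regions:
        for i in range(start, end):
            latent[i] = value
    return latent


def blend_latents(base, regional, base_ratio):
    """Weighted sum of a full-prompt latent and a region-stitched latent (assumed blend)."""
    if len(base) != len(regional):
        raise ValueError("latents must have the same length")
    if not 0.0 <= base_ratio <= 1.0:
        raise ValueError("base_ratio must be in [0, 1]")
    return [base_ratio * b + (1.0 - base_ratio) * r for b, r in zip(base, regional)]


# Hypothetical plan: "person" signal (1.0) on the left half, none (0.0) in the "moon" half.
regional = stitch_regions(8, [(0, 4, 1.0), (4, 8, 0.0)])
# Hypothetical full-prompt latent: the person signal is spread over the whole canvas.
base = [0.5] * 8
mixed = blend_latents(base, regional, base_ratio=0.25)
print(mixed)
```

In this toy setup the right half of `mixed` still carries a nonzero "person" signal, because the full-prompt latent is not confined to any region. Under this assumption, later denoising steps are free to grow or shift an object beyond its planned box.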