YangLing0818 / RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
https://proceedings.mlr.press/v235/yang24ai.html
MIT License
1.7k stars 99 forks

Why does this support larger-scale images than the base model was trained on? #49

Open andupotorac opened 6 months ago

andupotorac commented 6 months ago

I read through the paper, and it doesn't explain why this library can also generate images larger than the usual size without fear of double heads or other artifacts.

Is it because of the regional prompting the algorithm performs? And what happens if one of those regions is itself larger than 512x512, when using SD 1.5?
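To make the question concrete, here is a minimal sketch of the premise behind it: if a large canvas is carved into regional-prompt boxes, each region stays artifact-free only as long as it fits within the base model's trained resolution. The helper name and region representation below are hypothetical for illustration, not the RPG library's actual API.

```python
# Hypothetical helper: given regional-prompt boxes carved from a large canvas,
# report which regions exceed the base model's trained size (512x512 for SD 1.5),
# i.e. where double-head artifacts would be expected to reappear.
# The function name and (width, height) representation are illustrative only.

def oversized_regions(regions, base=512):
    """regions: list of (width, height) boxes from the full canvas.
    Returns the boxes larger than the base resolution on either axis."""
    return [(w, h) for (w, h) in regions if w > base or h > base]

# A 1024x1024 canvas split into four 512x512 regions: every region fits,
# so each regional denoising pass stays at the resolution SD 1.5 saw in training.
print(oversized_regions([(512, 512)] * 4))  # -> []

# One region spans the full 1024-wide top half: it exceeds 512 on one axis,
# which is exactly the case the question asks about.
print(oversized_regions([(1024, 512), (512, 512), (512, 512)]))  # -> [(1024, 512)]
```

This is only a bookkeeping sketch of the question's scenario, not a claim about how RPG actually handles oversized regions.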