YangLing0818 / RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
https://proceedings.mlr.press/v235/yang24ai.html
MIT License
1.7k stars 99 forks

On the issue of effectiveness #30

Open zhangsdly opened 9 months ago

zhangsdly commented 9 months ago

This is a novel method for text-to-image generation. Thanks for sharing your work. One question: experiments show that the results are unstable, and the LLM may not output correct results. Will there be targeted optimization in this area, such as fine-tuning a domain-specific LLM on the logic of image generation?