This is a novel method for text-to-image generation. Thanks for sharing your work. One question: in my experiments I found that the results are unstable, and the LLM sometimes does not output correct results. Will there be targeted optimization in this area, such as fine-tuning vertical (domain-specific) LLMs based on the logic of image generation?