YangLing0818 / RPG-DiffusionMaster

[ICML 2024] Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs (RPG)
https://proceedings.mlr.press/v235/yang24ai.html
MIT License
1.7k stars, 99 forks

Multiple animals and details not generated properly #52

Open katarzynasornat opened 5 months ago

katarzynasornat commented 5 months ago

Hi all! @YangLing0818 @BitCodingWalkin Thank you for your great work! I wanted to test it, hoping it would solve my problem with SDXL, which generates hybrid animals instead of two separate ones. However, either the animals are missing entirely, or the image lacks details that clearly appear in the regional prompt generated by GPT-4, which I am using.

Example of a generated regional prompt:

Region0: A charmingly small tortoise stands on the left side of the path, its shiny shell a mosaic of earthy tones, a picture of steadiness and resolve. BREAK 
Region1: To the right, a tall hare with lustrous orange fur and sparkling, clever eyes stands alert, exuding a sense of swift readiness, yet with a warm gaze. BREAK 
Region2: Behind them unfolds a whimsical watercolor backdrop, where flowers blossom in vibrant hues and a quaint village nestles peacefully, capturing the essence of a children's fairytale.
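
To rule out a malformed prompt, the text above can be split on the `BREAK` separator to confirm that it contains three distinct regions. This is only a minimal sketch: `split_regions` is a hypothetical helper, not part of RPG, and it assumes that regions are delimited by the literal token `BREAK` with an optional `RegionN:` label, as in the prompt above. The region texts are shortened here for readability.

```python
import re


def split_regions(regional_prompt: str) -> list[str]:
    """Split a BREAK-delimited regional prompt into per-region texts.

    Hypothetical helper for inspection only; this is not RPG's own parser.
    """
    parts = [p.strip() for p in regional_prompt.split("BREAK")]
    # Drop an optional "RegionN:" label at the start of each part.
    return [re.sub(r"^Region\d+:\s*", "", p) for p in parts if p]


# Shortened versions of the three regions from the prompt above.
regional_prompt = (
    "Region0: A charmingly small tortoise stands on the left side of the path BREAK \n"
    "Region1: To the right, a tall hare with lustrous orange fur BREAK \n"
    "Region2: Behind them unfolds a whimsical watercolor backdrop"
)

regions = split_regions(regional_prompt)
for index, text in enumerate(regions):
    print(f"Region {index}: {text}")
```

If this reports three regions, the prompt text itself is well-formed, and the missing tortoise and village are more likely related to how the regions are laid out or weighted during generation.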

Image I got:

image

There is no tortoise and no village in the background at all. Is there any reason why?

Below I have added my gist as a Google Colab notebook. Could you please help me find out why I have this issue? I went through the examples in the docs, and their prompts look similarly complex in terms of detail.