ali-vilab / Ranni

https://ranni-t2i.github.io/Ranni/
Apache License 2.0
206 stars 15 forks

About unsatisfactory results #19

Open HuiZhang0812 opened 2 months ago

HuiZhang0812 commented 2 months ago

Hi, I did some testing and found that the layout generated by "Text-to-panel" contains a lot of non-object information (such as some adjectives). Did you encounter similar problems during your testing?

image

Here, boxes are assigned to descriptive phrases such as "depth of field" and "high quality", which makes the layout unsuitable for practical use.

thss15fyt commented 2 months ago

Put these style-related words into the “postfix” box below so that no boxes are assigned to them.

HuiZhang0812 commented 2 months ago

Thanks for your quick reply. The results are much better with your suggestion. However, is there a way to automatically split the user's prompt into objects and a postfix? The current manual split is not user-friendly. In addition, the generated results sometimes do not strictly follow the boxes; for example, the bed in the figure below does not stay within its box. Do you have any insight on this?

image

thss15fyt commented 2 months ago

  1. For the first problem, we rely on the zero-shot ability of LLaMA for element extraction, so it sometimes fails to distinguish these non-object items. You could modify the system prompt to better instruct the LLM, or try a different LLM.
  2. For the control problem, you can set a smaller control_step or a higher control_scale.
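
As a rough illustration of the automatic split asked about above, a simple rule-based pre-filter could move known style phrases into the postfix before the prompt reaches the LLM. This is only a sketch and not part of Ranni: the function name `split_prompt` and the `STYLE_TERMS` vocabulary are hypothetical, and it assumes the prompt is a comma-separated list of phrases.

```python
# Hypothetical style vocabulary (an assumption, not taken from Ranni);
# extend it with the style phrases that appear in your own prompts.
STYLE_TERMS = {
    "depth of field",
    "high quality",
    "best quality",
    "masterpiece",
    "photorealistic",
}


def split_prompt(prompt, style_terms=STYLE_TERMS):
    """Split a comma-separated prompt into (object_prompt, postfix).

    Phrases that match a known style term (case-insensitive) go to the
    postfix; everything else stays in the object prompt.
    """
    objects, styles = [], []
    for chunk in prompt.split(","):
        phrase = chunk.strip()
        if not phrase:
            continue  # skip empty pieces such as trailing commas
        if phrase.lower() in style_terms:
            styles.append(phrase)
        else:
            objects.append(phrase)
    return ", ".join(objects), ", ".join(styles)


object_prompt, postfix = split_prompt(
    "a bed next to a window, depth of field, high quality"
)
print(object_prompt)  # text for the main prompt box
print(postfix)        # text for the "postfix" box
```

A fixed keyword list will miss style phrases it has not seen, so it complements rather than replaces a better system prompt or a different LLM.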

HuiZhang0812 commented 2 months ago

Thanks for your suggestions. I will try them.