baaaad opened this issue 2 weeks ago
I guess that's because SD1.5 is not good at handling human-related generation
The model also struggles to generate "a dog" and "a cat" together as "a dog and a cat". Using SD2.1 did not resolve the issue either. I therefore have concerns about the method's ability to combine 'close' concepts.
Thanks for the great work. I noticed that the concepts mentioned in your paper, such as 'cat' and 'phone', exhibit clear differences in semantic space. However, the model's performance is notably inadequate when handling 'close' concepts like 'a man' and 'a woman', particularly when generating images for complex sentences such as 'a man and a woman are dancing'. Is this limitation inherent to the method itself? Does it only work well on examples like those in the paper, which involve contextual interaction between concepts with clear semantic differences?
Yes, we think this limitation comes from the base model's poor ability to distinguish semantically similar concepts. In addition, our method should be given input concepts with contextual interaction to generate better results.