Kwai-Kolors / Kolors

Kolors Team
Apache License 2.0

Bad text-image alignment #53

Open chenbinghui1 opened 1 month ago

chenbinghui1 commented 1 month ago

I tested many prompts, mainly focusing on people. Almost all of the generated images are half-body shots. Even when the prompt describes the full body, the model cannot generate a complete full-body image that includes shoes, leg positions, etc. Additionally, control over body orientation, clothing, and other attributes is inaccurate. That said, the image quality is clearly quite high, with an artistic touch. I personally suspect that this model has overfitted on certain data, which makes it score better than other models on specific metrics. Whether those scores demonstrate the model's true capability is questionable, and I feel the metrics might be somewhat misleading.