THUDM / CogView2

Official code repo for the paper "CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers"
Apache License 2.0

Suggestions for roadmap #14

Open · jdagdelen opened this issue 2 years ago

jdagdelen commented 2 years ago

Hi CogView team. First off, great work! The results of this method are very impressive. I wanted to post a few observations that might help inform your future roadmap.

  1. Generated images tend to include watermarks and other artifacts typical of online images. A common one I see is a white bar along the bottom containing black pseudo-text (example images attached: bar_example, watermark_example).

  2. Add support for non-square inpainting/replacement boundaries.

  3. Hands and arms often come out deformed, with extra fingers, etc. I suggest adding more training images of hands to help the model fix these issues.

Congrats on the paper and keep up the good work!

dza6549 commented 2 years ago

CogView2 appears to have been trained on stock photos, and I'm guessing many of the training images had the white bar, so I'm not sure it can be easily avoided. Still, it is a great development, and making the model freely available is much appreciated.
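One possible way to reduce the white-bar artifact is to filter such images out of the training data before training. The following is only a minimal heuristic sketch, not anything from the CogView2 codebase: it assumes images are available as rows of (r, g, b) tuples, and the function names and thresholds (`bar_fraction`, `white_threshold`, `min_white_ratio`) are hypothetical values that would need tuning on real data.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def has_white_bottom_bar(
    pixels: List[List[Pixel]],
    bar_fraction: float = 0.1,
    white_threshold: int = 230,
    min_white_ratio: float = 0.8,
) -> bool:
    """Hypothetical heuristic: True if the bottom band is mostly near-white.

    `pixels` is a list of rows, each a list of (r, g, b) values in 0..255.
    Dark pseudo-text inside the bar lowers the white ratio, so the ratio
    threshold is deliberately below 1.0.
    """
    height = len(pixels)
    if height == 0:
        return False
    bar_rows = max(1, int(height * bar_fraction))
    band = pixels[height - bar_rows:]  # bottom `bar_rows` rows of the image
    total = sum(len(row) for row in band)
    white = sum(
        1 for row in band for (r, g, b) in row if min(r, g, b) >= white_threshold
    )
    return total > 0 and white / total >= min_white_ratio


def make_test_image(width: int, height: int, bar_rows: int) -> List[List[Pixel]]:
    """Synthetic gray image with a white bottom bar holding black 'pseudo-text'."""
    gray, white, black = (100, 100, 100), (255, 255, 255), (0, 0, 0)
    rows = [[gray] * width for _ in range(height - bar_rows)]
    for i in range(bar_rows):
        row = [white] * width
        if i == bar_rows // 2:
            row[10:30] = [black] * 20  # fake text glyphs inside the bar
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(has_white_bottom_bar(make_test_image(100, 100, 10)))  # True
    print(has_white_bottom_bar(make_test_image(100, 100, 0)))   # False
```

A check this simple will also flag images whose bottom edge is naturally white (snow, studio backdrops), so in practice it would probably be combined with other signals, such as a sharp color change at the top edge of the band.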

Sleepychord commented 2 years ago

@jdagdelen @dza6549 Thank you very much for the suggestions.

  1. I have also observed the white bars and other watermarks. We will collect more data and clean it in future work!
  2. Non-square inpainting/replacement boundaries are already supported, but I haven't worked out a good way to input the mask and don't have enough time to write a UI yet... I will work on that later.
  3. The causes may be more complex; I will improve the method in future work.
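Since the missing piece for non-square inpainting is mainly how to input the mask, here is a hedged sketch of one plausible approach: the user supplies an arbitrary pixel-level mask (a circle here), and it is downsampled to a token-level mask by marking every token whose image patch overlaps the masked region. This is not CogView2's actual interface; the function names, the patch size of 8, and the row-major token ordering are all assumptions for illustration, and the real tokenizer's grid may differ.

```python
from typing import List


def circle_mask(size: int, cx: int, cy: int, radius: int) -> List[List[bool]]:
    """Pixel-level mask, True inside a circle (a non-square region to regenerate)."""
    return [
        [(x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 for x in range(size)]
        for y in range(size)
    ]


def pixel_mask_to_token_mask(
    pixel_mask: List[List[bool]], patch_size: int
) -> List[List[bool]]:
    """Mark a token as masked if ANY pixel in its patch is masked (hypothetical)."""
    height, width = len(pixel_mask), len(pixel_mask[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of patch_size")
    rows, cols = height // patch_size, width // patch_size
    token_mask = [[False] * cols for _ in range(rows)]
    for y in range(height):
        for x in range(width):
            if pixel_mask[y][x]:
                token_mask[y // patch_size][x // patch_size] = True
    return token_mask


def masked_token_indices(token_mask: List[List[bool]]) -> List[int]:
    """Flatten the token mask to row-major indices of tokens to regenerate."""
    cols = len(token_mask[0]) if token_mask else 0
    return [
        r * cols + c
        for r, row in enumerate(token_mask)
        for c, masked in enumerate(row)
        if masked
    ]


if __name__ == "__main__":
    # 32x32 image with 8x8 patches gives a 4x4 token grid.
    tokens = pixel_mask_to_token_mask(circle_mask(32, 16, 16, 4), patch_size=8)
    print(masked_token_indices(tokens))  # [5, 6, 9, 10]
```

Marking a token on any overlap is a conservative choice: the whole boundary region gets regenerated, at the cost of replacing slightly more than the user drew. A majority-overlap rule would follow the drawn shape more tightly but could leave partially masked patches untouched.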