THUDM / CogView2

official code repo for paper "CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers"
Apache License 2.0
946 stars 79 forks source link
pretrained-models pytorch text-to-image transformer

Generate vivid Images for Chinese / English text

CogView2 is a hierarchical transformer (6B-9B-9B parameters) for text-to-image generation in general domain. This implementation is based on the SwissArmyTransformer library (v0.2).

Web Demo

Getting Started

Setup

Download

Our code will automatically download or detect the models into the path defined by envrionment variable SAT_HOME. You can download from here and place them (folders named coglm/cogview2-dsr/cogview2-itersr) under SAT_HOME.

Text-to-Image Generation

./text2image.sh --input-source input.txt

Arguments useful in inference are mainly:

You'd better specify a environment variable SAT_HOME to specify the path to store the downloaded model.

Chinese input is usually much better than English input.

Text-guided Completion

./text_guided_completion.sh --input-source input_comp.txt

The format of input is text image_path h0 w0 h1 w1, where all the separation are TAB (NOT space). The image at image_path will be center-cropped to 480*480 pixels and mask the square from (h0,w0)to (h1,w1). These coordinations are range from 0 to 1. The model will fill the square with object described in text. Please use a square much larger than the desired region.

comp_pipeline

Gallery

more_samples