Alpha-VLLM / LLaMA2-Accessory

An Open-source Toolkit for LLM Development
https://llama2-accessory.readthedocs.io/

Large-DiT T2I: which text encoder should be used? #178

Open Miracle2333 opened 3 months ago

Miracle2333 commented 3 months ago

I loaded the pretrained text encoder from the official LLaMA-2 release, and the generated results are random noise. Which text encoder should be used? Could you specify the Hugging Face repo ID?

gaopengpjlab commented 3 months ago

We use a frozen LLaMA-7B as the text encoder. Please download our T2I checkpoint, which contains both the frozen text encoder and the diffusion backbone in a single file.
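
For anyone hitting the same issue, here is a minimal sketch (not the official tooling) for inspecting a downloaded checkpoint to confirm which components it contains. The filename below is a placeholder for whatever .pth file ships in the checkpoint folder:

```python
import torch
from collections import Counter

# Placeholder filename: substitute the actual .pth file from the
# downloaded Large-DiT checkpoint folder.
state = torch.load("240308_3b_1024/model.pth", map_location="cpu")
if isinstance(state, dict) and "state_dict" in state:
    state = state["state_dict"]

# Group parameter names by their top-level prefix so it is easy to see
# whether both the text encoder and the diffusion backbone are present.
prefixes = Counter(name.split(".")[0] for name in state)
for prefix, count in prefixes.most_common():
    print(f"{prefix}: {count} tensors")
```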

gaopengpjlab commented 3 months ago

https://huggingface.co/Alpha-VLLM/Large-DiT/tree/main/240308_3b_1024
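
If it helps, that folder can also be fetched programmatically. A minimal sketch using huggingface_hub; the repo id and folder name are taken from the link above, the pattern filter is just one way to limit the download:

```python
from huggingface_hub import snapshot_download

# Fetch only the 240308_3b_1024 folder from the Large-DiT repo.
local_dir = snapshot_download(
    repo_id="Alpha-VLLM/Large-DiT",
    allow_patterns=["240308_3b_1024/*"],
)
print("Checkpoint downloaded to:", local_dir)
```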

gaopengpjlab commented 3 months ago

Please note that our pretrained checkpoints only support high-resolution image generation.

Miracle2333 commented 3 months ago

> We use a frozen LLaMA-7B as the text encoder. Please download our T2I checkpoint, which contains both the frozen text encoder and the diffusion backbone in a single file.

Hi,

I pulled the checkpoint from the Hugging Face repo and found that it does not contain the text encoder. In addition, the code in demo.py shows that the text-encoder checkpoint needs to be loaded from a separate Hugging Face repo. Could you provide the text encoder that should be loaded here?
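
For context, a minimal sketch of how a frozen LLaMA-2-7B could be run as a text encoder with transformers; the repo id `meta-llama/Llama-2-7b-hf` is an assumption, and demo.py may expect a different checkpoint or weight format:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed repo id for the LLaMA-2 7B weights; demo.py may expect a
# different checkpoint or a non-HuggingFace weight format.
name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, torch_dtype=torch.float16)
model.eval().requires_grad_(False)  # keep the text encoder frozen

prompt = "a photo of an astronaut riding a horse on the moon"
tokens = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**tokens)
text_features = out.last_hidden_state  # (1, seq_len, hidden_dim) conditioning
print(text_features.shape)
```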