Alpha-VLLM / LLaMA2-Accessory

An Open-source Toolkit for LLM Development
https://llama2-accessory.readthedocs.io/

Large-DiT T2I: which text encoder should be used? #178

Open Miracle2333 opened 3 months ago

Miracle2333 commented 3 months ago

I loaded the pretrained text encoder from the official LLaMA-2 release, and the generated results are random noise. Which text encoder should be used? Could you specify the Hugging Face repo ID?

gaopengpjlab commented 3 months ago

We use a frozen LLaMA-7B as the text encoder. Please download our T2I checkpoint, which contains both the frozen text encoder and the diffusion backbone in a single file.
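
For anyone hitting the same issue, here is a minimal sketch (not the official tooling) for inspecting a downloaded checkpoint to confirm which components it contains. The filename below is a placeholder for whatever .pth file ships in the checkpoint folder:

```python
import torch
from collections import Counter

# Placeholder filename: substitute the actual .pth file from the
# downloaded Large-DiT checkpoint folder.
state = torch.load("240308_3b_1024/model.pth", map_location="cpu")
if isinstance(state, dict) and "state_dict" in state:
    state = state["state_dict"]

# Group parameter names by their top-level prefix so it is easy to see
# whether both the text encoder and the diffusion backbone are present.
prefixes = Counter(name.split(".")[0] for name in state)
for prefix, count in prefixes.most_common():
    print(f"{prefix}: {count} tensors")
```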

gaopengpjlab commented 3 months ago

https://huggingface.co/Alpha-VLLM/Large-DiT/tree/main/240308_3b_1024
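
If it helps, that folder can also be fetched programmatically. A minimal sketch using huggingface_hub; the repo id and folder name are taken from the link above, the pattern filter is just one way to limit the download:

```python
from huggingface_hub import snapshot_download

# Fetch only the 240308_3b_1024 folder from the Large-DiT repo.
local_dir = snapshot_download(
    repo_id="Alpha-VLLM/Large-DiT",
    allow_patterns=["240308_3b_1024/*"],
)
print("Checkpoint downloaded to:", local_dir)
```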

gaopengpjlab commented 3 months ago

Please note that our pretrained checkpoints only support high-resolution image generation.

Miracle2333 commented 3 months ago

> We use a frozen LLaMA-7B as the text encoder. Please download our T2I checkpoint, which contains both the frozen text encoder and the diffusion backbone in a single file.

Hi,

I pulled the checkpoint from the Hugging Face repo and found that it does not contain the text encoder. In addition, the code in demo.py shows that the text-encoder checkpoint needs to be loaded from a separate Hugging Face repo. Could you provide the text encoder that should be loaded here?
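
For context, a minimal sketch of how a frozen LLaMA-2-7B could be run as a text encoder with transformers; the repo id `meta-llama/Llama-2-7b-hf` is an assumption, and demo.py may expect a different checkpoint or weight format:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed repo id for the LLaMA-2 7B weights; demo.py may expect a
# different checkpoint or a non-HuggingFace weight format.
name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, torch_dtype=torch.float16)
model.eval().requires_grad_(False)  # keep the text encoder frozen

prompt = "a photo of an astronaut riding a horse on the moon"
tokens = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**tokens)
text_features = out.last_hidden_state  # (1, seq_len, hidden_dim) conditioning
print(text_features.shape)
```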