beichenzbc / Long-CLIP

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
Apache License 2.0

Can the results be used for SD models? #1

Closed: dawei03896 closed this issue 5 months ago

dawei03896 commented 6 months ago

What an amazing project! Can the results be used for SD models?

beichenzbc commented 6 months ago

It can be used with the SD1.x series in a plug-and-play manner, since their text encoder is OpenAI CLIP rather than OpenCLIP.

In fact, the 'text-to-image generation' demo figure was generated with SD1.5.

JZZZ1314 commented 6 months ago

Are there any available examples, e.g. in A1111 or ComfyUI?

beichenzbc commented 6 months ago

We implemented it in PyTorch. You can use it with SD by re-implementing the SD pipeline and changing only the text encoder to Long-CLIP.
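For anyone looking for a starting point, below is a minimal sketch (not the authors' rewritten pipeline) that feeds Long-CLIP token embeddings into diffusers' `StableDiffusionPipeline` via its `prompt_embeds` argument. The checkpoint path is an assumption, and the helper mirrors the original CLIP text forward pass, so the module names and positional-embedding handling should be double-checked against `model/longclip.py`:

```python
import torch
from diffusers import StableDiffusionPipeline
from model import longclip  # from the Long-CLIP repo

device = "cuda"

# Checkpoint path and the ViT-L variant are assumptions; see the repo's README
# for the actual download location of the Long-CLIP weights.
clip_model, _ = longclip.load("./checkpoints/longclip-L.pt", device=device)
clip_model = clip_model.float().eval()

@torch.no_grad()
def longclip_prompt_embeds(prompt: str) -> torch.Tensor:
    """Per-token hidden states from Long-CLIP's text tower (not the pooled feature).

    Mirrors the original CLIP text forward pass and assumes Long-CLIP keeps the
    same module names; the positional-embedding details in model/longclip.py may
    differ slightly.
    """
    tokens = longclip.tokenize([prompt]).to(device)           # 248-token context
    x = clip_model.token_embedding(tokens) + clip_model.positional_embedding
    x = x.permute(1, 0, 2)                                    # NLD -> LND
    x = clip_model.transformer(x)
    x = x.permute(1, 0, 2)                                    # LND -> NLD
    return clip_model.ln_final(x)                             # [1, 248, 768] for ViT-L/14

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)

# SD1.x cross-attention takes 768-dim token features, which matches the
# ViT-L/14 text transformer width, so the hidden states can be passed directly.
image = pipe(
    prompt_embeds=longclip_prompt_embeds("a long, highly detailed prompt"),
    negative_prompt_embeds=longclip_prompt_embeds(""),
).images[0]
image.save("longclip_sd15.png")
```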

haofanwang commented 6 months ago

@beichenzbc Is there a more elegant way to integrate it into transformers, so that we can load it directly via CLIPTextModel and CLIPTokenizer without modifying encode_prompt() in the pipeline?

beichenzbc commented 6 months ago

Perhaps you could try the conversion scripts for CLIP to transfer the PyTorch model into the Hugging Face format.
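As a rough pointer (not something the Long-CLIP authors provide): transformers ships a converter for original OpenAI CLIP checkpoints, `convert_clip_original_pytorch_to_hf.py`, and the main change a Long-CLIP text tower needs on top of the stock ViT-L/14 config is the longer context. A sketch of such a config, with the names and values as assumptions to verify:

```python
from transformers import CLIPTextConfig, CLIPTextModel

# Assumed text-tower config for Long-CLIP ViT-L/14 in Hugging Face format.
# The dimensions mirror openai/clip-vit-large-patch14; only max_position_embeddings
# changes, since Long-CLIP stretches the context from 77 to 248 tokens.
text_config = CLIPTextConfig(
    hidden_size=768,
    intermediate_size=3072,
    num_hidden_layers=12,
    num_attention_heads=12,
    projection_dim=768,
    max_position_embeddings=248,
)

# This is only a randomly initialized skeleton: the Long-CLIP weights still need to
# be remapped into its state dict, following the key mapping used by transformers'
# convert_clip_original_pytorch_to_hf.py for original OpenAI CLIP checkpoints.
text_encoder = CLIPTextModel(text_config)
text_encoder.save_pretrained("longclip-L-text-hf")  # writes config.json + weights
```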

xyxxmb commented 5 months ago

Regarding converting the PyTorch model into the Hugging Face format:

Would you provide the config.json? I want to convert the model into the diffusers format, with a config.json file and a pytorch_model.bin file.

beichenzbc commented 5 months ago

We rewrote the SD pipeline ourselves and didn't try to convert the model into a Hugging Face format. Maybe you could try the link below for easy use.

https://github.com/beichenzbc/Long-CLIP/issues/7#issuecomment-2053783281