What an amazing project! Can the results be used for SD models?
It can be used with the SD1.x series in a plug-and-play manner, since their text encoder is OpenAI CLIP rather than OpenCLIP.
In fact, the 'text-to-image generation' demo figure was generated with SD1.5.
Are there any available examples, e.g. in A1111 or ComfyUI?
We implemented it in PyTorch. You can use it with SD by reimplementing the SD pipeline and changing only the text encoder to Long-CLIP.
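A rough sketch of that swap, driving a stock diffusers pipeline with externally computed conditioning. The `longclip.load`/`longclip.tokenize` interface follows the repo's README; `encode_text_full` is a hypothetical helper name for getting per-token hidden states, and the checkpoint path is a placeholder:

```python
import torch
from diffusers import StableDiffusionPipeline
from model import longclip  # Long-CLIP repo's CLIP-style interface

device = "cuda"
# A Long-CLIP ViT-L/14 checkpoint, so its 768-dim hidden states match the
# OpenAI CLIP ViT-L/14 encoder that SD1.5 conditions on.
clip_model, _ = longclip.load("./checkpoints/longclip-L.pt", device=device)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5"
).to(device)

# Long-CLIP tokenizes up to 248 tokens instead of CLIP's usual 77.
tokens = longclip.tokenize(["a long, highly detailed prompt ..."]).to(device)
neg_tokens = longclip.tokenize([""]).to(device)

with torch.no_grad():
    # encode_text_full is a hypothetical helper returning per-token hidden
    # states; SD conditions on these, not on the pooled text embedding.
    prompt_embeds = clip_model.encode_text_full(tokens)
    negative_embeds = clip_model.encode_text_full(neg_tokens)

image = pipe(prompt_embeds=prompt_embeds,
             negative_prompt_embeds=negative_embeds).images[0]
image.save("long_prompt.png")
```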
@beichenzbc Is there a more elegant way to integrate it into transformers, so that we can load it directly via CLIPTextModel and CLIPTokenizer without modifying encode_prompt() in the pipeline?
Perhaps you could try the conversion scripts for CLIP to convert a PyTorch checkpoint into the Hugging Face format.
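If the conversion goes through, loading should then look like a stock CLIP text encoder. A minimal sketch, assuming the path below is a placeholder for a converted checkpoint whose config carries the 248-token context instead of CLIP's usual 77:

```python
from transformers import CLIPTextModel, CLIPTokenizer

# Placeholder path to a Long-CLIP checkpoint already converted to the
# Hugging Face format (config carries max_position_embeddings=248).
text_encoder = CLIPTextModel.from_pretrained("path/to/longclip-hf")
tokenizer = CLIPTokenizer.from_pretrained("path/to/longclip-hf")

inputs = tokenizer(
    ["a very long prompt ..."],
    padding="max_length",
    max_length=248,
    truncation=True,
    return_tensors="pt",
)
# Per-token hidden states, shape (batch, 248, hidden_size); this is what
# a diffusers pipeline consumes as prompt_embeds.
hidden_states = text_encoder(**inputs).last_hidden_state
```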
Would you be willing to provide the config.json? I want to convert the model to the diffusers format, with a config.json file and a pytorch_model.bin file.
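Something like the sketch below is what I have in mind; the sizes are my guesses mirroring SD1.5's CLIP ViT-L/14 text encoder, apart from max_position_embeddings, which Long-CLIP extends from 77 to 248:

```python
from transformers import CLIPTextConfig

# Guessed values mirroring SD1.5's CLIP ViT-L/14 text encoder; only
# max_position_embeddings differs, for Long-CLIP's 248-token context.
config = CLIPTextConfig(
    vocab_size=49408,
    hidden_size=768,
    intermediate_size=3072,
    num_hidden_layers=12,
    num_attention_heads=12,
    max_position_embeddings=248,
)
config.save_pretrained("text_encoder")  # writes config.json next to pytorch_model.bin
```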
We rewrote the SD pipeline ourselves and didn't try converting the model to the Hugging Face format. Maybe you could try this link for easy use:
https://github.com/beichenzbc/Long-CLIP/issues/7#issuecomment-2053783281