beichenzbc / Long-CLIP

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
Apache License 2.0

Failed to try with SDXL #7

Closed ziyye closed 2 months ago

ziyye commented 6 months ago

SDXL has two CLIP text encoders: CLIP-L and CLIP-G. I replaced the original CLIP-L text encoder with Long-CLIP-L, then padded the embeddings from the original CLIP-G to length 248 (the length of the Long-CLIP-L embeddings) and concatenated them with the embeddings from Long-CLIP-L. But the generated images are not good. Has anyone tried Long-CLIP with SDXL? Should it work?
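The padding-and-concatenation step described above can be sketched with plain NumPy arrays. This is only a shape-level illustration, not the poster's actual code: it assumes zero padding along the token axis (the post does not say how the padding was done), hidden sizes of 768 for CLIP-L and 1280 for CLIP-G, and concatenation along the feature axis as in the standard SDXL pipeline. The helper names `pad_sequence` and `combine_embeddings` are hypothetical, and random arrays stand in for real encoder outputs.

```python
import numpy as np

LONG_CLIP_TOKENS = 248  # Long-CLIP-L sequence length, as stated in the post
CLIP_TOKENS = 77        # standard CLIP context length
CLIP_L_DIM = 768        # assumed CLIP-L text hidden size
CLIP_G_DIM = 1280       # assumed CLIP-G text hidden size


def pad_sequence(embeds: np.ndarray, target_len: int) -> np.ndarray:
    """Zero-pad a (batch, tokens, dim) array along the token axis (assumed padding scheme)."""
    batch, tokens, dim = embeds.shape
    if tokens > target_len:
        raise ValueError(f"sequence length {tokens} exceeds target {target_len}")
    padding = np.zeros((batch, target_len - tokens, dim), dtype=embeds.dtype)
    return np.concatenate([embeds, padding], axis=1)


def combine_embeddings(long_clip_l: np.ndarray, clip_g: np.ndarray) -> np.ndarray:
    """Pad CLIP-G to the Long-CLIP-L length, then join the two along the feature axis."""
    clip_g_padded = pad_sequence(clip_g, long_clip_l.shape[1])
    return np.concatenate([long_clip_l, clip_g_padded], axis=-1)


# Random stand-ins for the two encoders' per-token outputs.
rng = np.random.default_rng(0)
long_clip_l = rng.standard_normal((1, LONG_CLIP_TOKENS, CLIP_L_DIM)).astype(np.float32)
clip_g = rng.standard_normal((1, CLIP_TOKENS, CLIP_G_DIM)).astype(np.float32)

prompt_embeds = combine_embeddings(long_clip_l, clip_g)
print(prompt_embeds.shape)  # (1, 248, 2048)
```

Under this scheme, the CLIP-G half of every token position past 77 holds zeros rather than encoder outputs, which may differ from anything the model saw during training.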

GongXinyuu commented 6 months ago

Hi @ziyye would you mind sharing some images generated by your SDXL + long-CLIP-L implementation?

ziyye commented 6 months ago

Thanks for your reply! Here are some images generated:

[four attached images]

ppeterpp commented 6 months ago

SDXL seems to use the penultimate text encoder outputs. Have you also used the same output from Long-CLIP?
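To make "penultimate output" concrete, here is a toy sketch of an encoder that records its state after every layer, so the second-to-last entry can be picked instead of the final one. The layers are hypothetical stand-ins (a residual linear map with `tanh`), not CLIP or Long-CLIP blocks, and `run_encoder` and `make_toy_layer` are names invented for illustration. With Hugging Face `transformers` models, the equivalent is usually to pass `output_hidden_states=True` and read `hidden_states[-2]`; whether the Long-CLIP code exposes intermediate states the same way is not stated in this thread.

```python
import numpy as np


def run_encoder(token_embeds, layers):
    """Apply each layer in turn, recording the state after every step.

    The returned list starts with the input embeddings and ends with
    the final layer's output, so it has len(layers) + 1 entries.
    """
    hidden_states = [token_embeds]
    for layer in layers:
        hidden_states.append(layer(hidden_states[-1]))
    return hidden_states


def make_toy_layer(weight):
    """Hypothetical stand-in for a transformer block: residual linear map + tanh."""
    return lambda x: x + np.tanh(x @ weight)


rng = np.random.default_rng(0)
dim, num_layers = 8, 4
layers = [make_toy_layer(rng.standard_normal((dim, dim)) * 0.1) for _ in range(num_layers)]
token_embeds = rng.standard_normal((1, 5, dim))  # (batch, tokens, dim)

hidden_states = run_encoder(token_embeds, layers)
final_output = hidden_states[-1]        # output of the last layer
penultimate_output = hidden_states[-2]  # output of the second-to-last layer
print(len(hidden_states), penultimate_output.shape)  # 5 (1, 5, 8)
```

If the replacement encoder returns only its final output while the rest of the pipeline expects the penultimate one, the two halves of the concatenated embedding would come from different depths of their respective encoders.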

plastic0313 commented 5 months ago

I have the same problem. Have you solved it?

zer0int commented 5 months ago

Same. I went as far as stripping down the prompt to just "cat", and I got this:

[attached image: something2ddd]

Just why?! 😂

I'll update when and if I get it working better. 🙃

zer0int commented 5 months ago

This actually works really well: https://github.com/SeaArtLab/ComfyUI-Long-CLIP