Closed ziyye closed 2 months ago
SDXL has two CLIP text encoders: CLIP-L and CLIP-G. I replaced the original CLIP-L text encoder with Long-CLIP-L, then padded the embeddings from the original CLIP-G to length 248 (the sequence length of the Long-CLIP-L embeddings) and concatenated them with the Long-CLIP-L embeddings. But the generated images are not good. Has anyone tried Long-CLIP with SDXL? Should it work?
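For anyone following along, here is a minimal shape-level sketch of the setup described above, using NumPy and random arrays in place of real encoder outputs. The hidden sizes (768 for CLIP-L, 1280 for CLIP-G) match SDXL; the helper name `pad_and_concat` and the choice of zero padding are assumptions for illustration, not necessarily what the original code does.

```python
import numpy as np

L_DIM, G_DIM = 768, 1280        # hidden sizes of CLIP-L and CLIP-G in SDXL
SHORT_LEN, LONG_LEN = 77, 248   # standard CLIP context vs. Long-CLIP context


def pad_and_concat(long_l, clip_g):
    """Zero-pad clip_g along the sequence axis, then concatenate on the feature axis.

    long_l: (batch, 248, 768) embeddings from Long-CLIP-L
    clip_g: (batch, 77, 1280) embeddings from the original CLIP-G
    """
    pad = long_l.shape[1] - clip_g.shape[1]
    if pad < 0:
        raise ValueError("clip_g is longer than long_l")
    # Pad only the sequence axis (axis 1); np.pad fills with zeros by default.
    clip_g_padded = np.pad(clip_g, ((0, 0), (0, pad), (0, 0)))
    return np.concatenate([long_l, clip_g_padded], axis=-1)


# Random stand-ins for the real encoder outputs (hypothetical data).
rng = np.random.default_rng(0)
long_l = rng.standard_normal((1, LONG_LEN, L_DIM)).astype(np.float32)
clip_g = rng.standard_normal((1, SHORT_LEN, G_DIM)).astype(np.float32)

prompt_embeds = pad_and_concat(long_l, clip_g)
print(prompt_embeds.shape)  # (1, 248, 2048)
```

Note that with this kind of padding, positions 77 to 247 carry all-zero CLIP-G features, which the SDXL UNet never saw during training. That may be one reason the images degrade, though this is a guess rather than a confirmed diagnosis.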
Hi @ziyye would you mind sharing some images generated by your SDXL + long-CLIP-L implementation?
Thanks for your reply! Here are some images generated:
SDXL seems to use the 'penultimate text encoder outputs'. Did you also take the same output from Long-CLIP?
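To make the "penultimate" point concrete, here is a toy sketch in NumPy. It is not Long-CLIP itself: `toy_encoder`, the layer count, and the weights are all made up for illustration. It only shows the indexing convention, where an encoder returns the input embeddings plus one state per layer and the penultimate output is the second-to-last entry. As far as I know, the SDXL pipelines in diffusers pick `hidden_states[-2]` from a text encoder called with `output_hidden_states=True`; whether a given Long-CLIP checkpoint exposes its intermediate states the same way is an assumption you would need to check.

```python
import numpy as np

NUM_LAYERS, SEQ_LEN, HIDDEN = 4, 248, 768  # toy depth; real encoders are deeper


def toy_encoder(x, weights):
    """Return all intermediate states: the input embeddings plus one entry per layer."""
    hidden_states = [x]
    for w in weights:
        x = np.tanh(x @ w)  # stand-in for a transformer block
        hidden_states.append(x)
    return hidden_states


rng = np.random.default_rng(0)
weights = [
    rng.standard_normal((HIDDEN, HIDDEN)) / np.sqrt(HIDDEN)
    for _ in range(NUM_LAYERS)
]
token_embeds = rng.standard_normal((1, SEQ_LEN, HIDDEN))

hidden_states = toy_encoder(token_embeds, weights)
last = hidden_states[-1]         # output of the final layer
penultimate = hidden_states[-2]  # output of the second-to-last layer

print(len(hidden_states), penultimate.shape)  # 5 (1, 248, 768)
```

If the Long-CLIP-L embeddings were taken from the final layer while CLIP-G used the penultimate one, the two halves of the concatenated embedding would come from different depths than SDXL was trained on.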
I have the same problem. Have you solved it?
This actually works really well: https://github.com/SeaArtLab/ComfyUI-Long-CLIP