beichenzbc / Long-CLIP

[ECCV 2024] official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
Apache License 2.0
649 stars 33 forks

CLIP-G #76

Open vikas784 opened 3 weeks ago

vikas784 commented 3 weeks ago

Hi, I hope you are doing fine. Actually, I'm confused about one thing regarding the CLIP-G models: your training data looks more like G prompts (natural language) than L prompts (which are comma-separated), so why did you train the model with G-style prompts and not L-style prompts? Can anyone please explain this to me?

beichenzbc commented 3 weeks ago

Sorry, I don't quite understand your question. Our training data is from ShareGPT4V and our training loss is CLIP's contrastive loss. So what do you mean by 'G prompt' or 'L prompt'?

vikas784 commented 3 weeks ago

Okay, let me clear it up again.

First, I have one doubt: what is the difference between Long-CLIP-B and Long-CLIP-L?

Next question: you have provided an SDXL pipeline using Long-CLIP, and as we know, SDXL uses two text encoders, correct?

In the SDXL pipeline, we use two prompts, one for G and one for L, where G is natural language and L is comma-separated, correct?

Let me give you an example: G = "a girl standing in the balcony on a sunny day", L = "Hyperrealistic, Soft lighting".

Correct?

And so, regarding the training data: you used ShareGPT4V, whose prompts are natural language, which is logical for G, as it's written everywhere that G is for natural language.

But the weights you have published are Long-CLIP-L, so what does the L mean? Does it just mean "large"? I'm still not sure about it, and if it is actually a G-style model, why is it named L?

Confused like hell

beichenzbc commented 3 weeks ago

'what is the difference between Long-CLIP-B and Long-CLIP-L': the base models are OpenAI-CLIP-ViT-B/16 and OpenAI-CLIP-ViT-L/14, respectively.

beichenzbc commented 3 weeks ago

We don't provide any Long-CLIP model based on OpenCLIP-G. In SDXL, we replace CLIP-L with Long-CLIP-L, and we apply our interpolation strategy to OpenCLIP-bigG, without any fine-tuning, to support 248 tokens.
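For context, the stretching idea in the Long-CLIP paper keeps the first learned positional embeddings intact and linearly interpolates the remaining ones to reach 248 positions. A minimal sketch of that idea (the `keep=20` split, function name, and dimensions are illustrative, not the repo's exact code):

```python
import torch
import torch.nn.functional as F

def stretch_pos_embed(pos_embed: torch.Tensor, keep: int = 20, target: int = 248) -> torch.Tensor:
    """Stretch CLIP's 77 positional embeddings to `target` positions.

    The first `keep` positions are copied unchanged (assumption: they carry
    the most-trained positional signal); the rest are linearly interpolated
    along the sequence dimension to fill the remaining slots.
    """
    head = pos_embed[:keep]            # (keep, dim), kept as-is
    tail = pos_embed[keep:]            # (77 - keep, dim), to be stretched
    tail = tail.t().unsqueeze(0)       # (1, dim, 77 - keep) for F.interpolate
    tail = F.interpolate(tail, size=target - keep, mode="linear", align_corners=True)
    tail = tail.squeeze(0).t()         # (target - keep, dim)
    return torch.cat([head, tail], dim=0)  # (target, dim)

pos = torch.randn(77, 768)             # dummy CLIP-L-sized positional table
stretched = stretch_pos_embed(pos)
print(stretched.shape)                 # torch.Size([248, 768])
```

Because the first 20 rows are copied verbatim, short prompts see exactly the embeddings the pretrained model learned, which is why no fine-tuning of bigG is needed.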

beichenzbc commented 3 weeks ago

CLIP-L is not used only in SDXL to process comma-separated words, so I don't think fine-tuning CLIP-L with natural language is strange or confusing.

vikas784 commented 3 weeks ago

Okay, I'm clear so far. Now, I have one fine-tuned SDXL model and just want to get predictions from it using the Long-CLIP text encoders, without the refiner. I've tried many different ways but still can't solve it: how should I connect my fine-tuned SDXL model to your SDXL pipeline and remove the refiner?

beichenzbc commented 2 weeks ago

If you can load SDXL by

import torch
from diffusers import DiffusionPipeline

base = DiffusionPipeline.from_pretrained(
    "your_path_here", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
base.to("cuda")

refiner = DiffusionPipeline.from_pretrained(
    "your_path_here",
    text_encoder_2=base.text_encoder_2,
    vae=base.vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)
refiner.to("cuda")

then you can use our Long-CLIP text encoder via our code SDXL/sdxl.py, modifying only lines 21 and 25 to point to your model path.

vikas784 commented 2 weeks ago

Okay, so see, I don't have a refiner; I just have one fine-tuned model on SD1.5. How can I adjust it now? Can you please help me with that?

beichenzbc commented 2 weeks ago

Do you mean your model is SD1.5, or is it SDXL base?

vikas784 commented 2 weeks ago

Oh sorry, I mean my model is SDXL base only. It's just that I have a fine-tuned model.

vikas784 commented 2 weeks ago

So actually, what I did was fine-tune the SDXL 1.0 base model with the Kohya scripts, and I don't have any refiner. What if I just want to remove the refiner and use your Long-CLIP SDXL pipeline for predictions from the fine-tuned model?

beichenzbc commented 2 weeks ago

Well, you can either use the original SDXL refiner together with your fine-tuned base model, or use only your fine-tuned base model.

For the former, you can modify line 21 to load your base model and keep the rest unchanged, so the refiner will be the original SDXL refiner.

For the latter, you can delete lines 50-56 and set high_noise_frac = 1.0.
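To see why high_noise_frac = 1.0 disables the refiner: in the usual diffusers base+refiner setup, that fraction partitions the denoising schedule (via denoising_end on the base and denoising_start on the refiner). A hypothetical helper, not part of the repo's code, illustrating the arithmetic:

```python
def split_steps(num_inference_steps: int, high_noise_frac: float) -> tuple[int, int]:
    """Split denoising steps between SDXL base and refiner.

    Illustrative sketch: the base runs the first `high_noise_frac` fraction
    of the schedule and the refiner runs the remainder. With
    high_noise_frac = 1.0, the base performs every step and the refiner
    has nothing left to do.
    """
    base_steps = round(num_inference_steps * high_noise_frac)
    refiner_steps = num_inference_steps - base_steps
    return base_steps, refiner_steps

print(split_steps(40, 0.8))  # (32, 8): base does 32 steps, refiner finishes 8
print(split_steps(40, 1.0))  # (40, 0): refiner is effectively disabled
```

So after deleting the refiner call, setting high_noise_frac = 1.0 simply lets the base model complete the full denoising trajectory on its own.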