damian0815 / compel

A prompting enhancement library for transformers-type text embedding systems
MIT License

Support for SD3 #92

Open adhikjoshi opened 3 weeks ago

adhikjoshi commented 3 weeks ago

Support for SD3

Skquark commented 3 weeks ago

Wondering the same. I'm just getting SD3 integrated into DiffusionDeluxe and wondering if I should try the same compel code I had working for SDXL, only without the refiner. The difference with the SD3 pipeline is that we have 3 tokenizers and text_encoders instead of 2, and I'm not sure if it'll work when passing only 2 or if it's going to need some reworking. I'm going to try it out, but I get the impression it won't work immediately with the new architecture. Fingers crossed.

damian0815 commented 3 weeks ago

i unfortunately do not have the resources to update compel for SD3. i'd be happy to accept a pull request if someone wanted to figure out how to do it. keeping up with AI dev is exhausting and i'm not getting paid to do this.

AbhinavGopal commented 2 weeks ago

I'm thinking of working on this! If anyone is interested in working on this together, shoot me an email at abhinav@rubbrband.com :)

MohamedAliRashad commented 1 week ago

@damian0815 How can we help you?

damian0815 commented 1 week ago

figure out what needs to be done to support SD3 and do it :D

damian0815 commented 1 week ago

i might see if i can spare a few hours this weekend to take a look.

MohamedAliRashad commented 1 week ago

@damian0815 You don't need to. A library named sd_embed has achieved what we want.

Daniel-SicSo-Edinburgh commented 1 week ago

Yes, sd_embed did SD3 support for this. I would still like to see support within compel just because it is structured a bit better IMO.

Daniel-SicSo-Edinburgh commented 1 week ago

Also, if you are using SD3 without T5, you can use the already existing functionality with some adjustments:

import torch
from compel import Compel, ReturnedEmbeddingsType
from diffusers import StableDiffusion3Pipeline

path_to_file = '.../sd3_medium_incl_clips.safetensors'  # path to sd3_medium_incl_clips.safetensors

# Load SD3 without the T5 encoder, leaving only the two CLIP text encoders.
model = StableDiffusion3Pipeline.from_single_file(path_to_file,
                                                  torch_dtype=torch.float16,
                                                  use_safetensors=True,
                                                  text_encoder_3=None)
# Assumption: the pipeline has to live on the same device that Compel is given below.
model.to("cuda")

prompt = 'Some prompt'
neg_prompt = 'Some negative prompt'

# Same setup as for SDXL: pass both CLIP tokenizers and text encoders as lists.
compeler = Compel(tokenizer=[model.tokenizer, model.tokenizer_2],
                  text_encoder=[model.text_encoder, model.text_encoder_2],
                  returned_embeddings_type=ReturnedEmbeddingsType.PENULTIMATE_HIDDEN_STATES_NON_NORMALIZED,
                  truncate_long_prompts=False,
                  requires_pooled=[True, True],
                  device="cuda")

# Encode the positive and negative prompt together as one batch.
embeds, pooled_embeds = compeler([prompt, neg_prompt])

# Split the batch back into single-item tensors.
prompt_embed = embeds[0].unsqueeze(0)
neg_prompt_embed = embeds[1].unsqueeze(0)

# Zero-pad the last dimension by 2048 to make up for the missing T5 embeddings.
prompt_embed = torch.nn.functional.pad(prompt_embed, (0, 2048))
neg_prompt_embed = torch.nn.functional.pad(neg_prompt_embed, (0, 2048))

pooled_prompt_embed = pooled_embeds[0].unsqueeze(0)
pooled_neg_prompt_embed = pooled_embeds[1].unsqueeze(0)

images = model(prompt_embeds=prompt_embed,
               pooled_prompt_embeds=pooled_prompt_embed,
               negative_prompt_embeds=neg_prompt_embed,
               negative_pooled_prompt_embeds=pooled_neg_prompt_embed,
               guidance_scale=3,
               num_inference_steps=40).images

images[0].save('output_image.png')

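
For anyone wondering where the 2048 in the pad call comes from, here is a rough sketch of the shape arithmetic in plain Python (no torch needed). The hidden sizes are assumptions taken from the public SD3 model configs (768 for CLIP-L, 1280 for CLIP-G, 4096 for T5-XXL), the sequence length of 77 is only illustrative, and `pad_last_dim` is a hypothetical helper that mirrors what `torch.nn.functional.pad(x, (left, right))` does to a tensor's shape:

```python
# Hidden sizes below are assumptions based on the public SD3 model configs.
CLIP_L_DIM = 768    # first CLIP text encoder
CLIP_G_DIM = 1280   # second CLIP text encoder
T5_XXL_DIM = 4096   # T5 encoder, which is dropped in the snippet above


def pad_last_dim(shape, left, right):
    """Hypothetical helper: shape after zero-padding only the last dimension,
    which is what torch.nn.functional.pad(x, (left, right)) does."""
    *leading, last = shape
    return (*leading, left + last + right)


# The two CLIP embeddings are concatenated along the feature dimension.
clip_dim = CLIP_L_DIM + CLIP_G_DIM

# The SD3 transformer expects T5-sized features, so the gap is filled with zeros.
extra = T5_XXL_DIM - clip_dim

embed_shape = (1, 77, clip_dim)  # (batch, tokens, features); 77 is illustrative
padded_shape = pad_last_dim(embed_shape, 0, extra)

print(extra, padded_shape)  # 2048 (1, 77, 4096)
```

If the hidden sizes differ for another checkpoint, the pad amount would need to change accordingly rather than staying hard-coded at 2048.
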
damian0815 commented 1 week ago

i have compel SD3 90% of the way there..