ModelsLab / diffusers_plus_plus

Diffusers++: State-of-the-art diffusion models for image and audio generation in PyTorch
https://huggingface.co/docs/diffusers
Apache License 2.0

ELLA - Equip Diffusion Models with LLM for Enhanced Semantic Alignment and Experiments [distillation methods] #4

Closed shauray8 closed 1 month ago

shauray8 commented 2 months ago

What does this PR do?

This PR introduces ELLA (Efficient Large Language Model Adapter), a lightweight approach that equips existing CLIP-based diffusion models with powerful Large Language Models (LLMs). ELLA improves prompt following and helps the model handle dense, intricate text prompts without training either the U-Net or the LLM. Only SD15 is supported for now.

Comparison images (left to right):
ELLA, non-fixed embedding length | ELLA, fixed embedding length | SD15
[Example Image] | [Example Image] | [Example Image]

Prompt - 'dog sniffing a rock'
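
The PR does not spell out what separates the two embedding-length modes compared above. Here is a minimal sketch, assuming "fixed" means the LLM token sequence is padded or truncated to a constant length while "non-fixed" keeps the prompt's natural length. The helper name `prepare_token_ids`, the default `max_length=128`, and the pad id `0` are illustrative assumptions, not taken from this PR:

```python
from typing import List, Tuple


def prepare_token_ids(
    token_ids: List[int],
    fixed_length: bool,
    max_length: int = 128,  # assumed cap, not taken from the PR
    pad_id: int = 0,        # assumed padding token id
) -> Tuple[List[int], List[int]]:
    """Return (ids, attention_mask) for a fixed- or flexible-length prompt."""
    # Both modes truncate prompts longer than the cap.
    ids = token_ids[:max_length]
    mask = [1] * len(ids)
    if fixed_length:
        # Fixed mode: pad to max_length so every prompt has the same shape.
        padding = max_length - len(ids)
        ids = ids + [pad_id] * padding
        mask = mask + [0] * padding
    return ids, mask


# Hypothetical token ids standing in for a tokenized short prompt.
prompt_ids = [101, 102, 103, 104]
flexible_ids, flexible_mask = prepare_token_ids(prompt_ids, fixed_length=False)
fixed_ids, fixed_mask = prepare_token_ids(prompt_ids, fixed_length=True, max_length=8)
```

Under this assumption, a short prompt such as 'dog sniffing a rock' would reach the adapter mostly as padding in fixed mode, which may be one reason the two modes produce different images.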

Changes:

Paper: https://arxiv.org/pdf/2403.05135

Minimal Docs

from diffusers import EllaFixedDiffusionPipeline, ELLA, DPMSolverMultistepScheduler

# Load the ELLA adapter weights; the lowercase name avoids shadowing the ELLA class.
ella = ELLA.from_pretrained("shauray/ELLA_SD15")

# Build the fixed-embedding-length pipeline on top of an SD15 checkpoint.
ella_pipeline = EllaFixedDiffusionPipeline.from_pretrained(
    "Justin-Choo/epiCRealism-Natural_Sin_RC1_VAE", ELLA=ella, requires_safety_checker=False
)
ella_pipeline.scheduler = DPMSolverMultistepScheduler.from_config(ella_pipeline.scheduler.config)
ella_pipeline = ella_pipeline.to("cuda")

prompt = "a beautiful portrait of an empress in her garden"
negative_prompt = ""
image = ella_pipeline(
    prompt, negative_prompt=negative_prompt, guidance=7,
    num_inference_steps=30, height=768, width=512,
).images[0]

shauray8 commented 2 months ago

Inference on Diff++

from diffusers import EllaDiffusionPipeline, ELLA, DPMSolverMultistepScheduler

# Load the ELLA adapter weights; the lowercase name avoids shadowing the ELLA class.
ella = ELLA.from_pretrained("shauray/ELLA_SD15")

# Build the non-fixed-embedding-length pipeline on top of an SD15 checkpoint.
ella_pipeline = EllaDiffusionPipeline.from_pretrained(
    "Justin-Choo/epiCRealism-Natural_Sin_RC1_VAE", ELLA=ella, requires_safety_checker=False
)
ella_pipeline.scheduler = DPMSolverMultistepScheduler.from_config(ella_pipeline.scheduler.config)
ella_pipeline = ella_pipeline.to("cuda")

prompt = "a beautiful portrait of an empress in her garden"
negative_prompt = ""
image = ella_pipeline(
    prompt, negative_prompt=negative_prompt, guidance=7,
    num_inference_steps=30, height=768, width=512,
).images[0]

shauray8 commented 1 month ago

All tests seem to work :)