shauray8 commented 2 months ago

What does this PR do?

This PR introduces ELLA (Efficient Large Language Model Adapter), a lightweight approach that equips existing CLIP-based diffusion models with powerful large language models (LLMs). ELLA improves prompt following and helps the model handle dense, intricate text prompts without training the U-Net or the LLM. Only SD15 is supported as of now.

[Image comparison, left to right: ELLA with non-fixed embedding length | ELLA with fixed embedding length | SD15]

Prompt - 'dog sniffing a rock'

Changes:

Addition of the ELLA module for enhanced semantic alignment in text-to-image models.
Implementation of the Timestep-Aware Semantic Connector (TSC) for dynamic adaptation of semantic features over the sampling timesteps.
Making it work with Llama or Phi-3.
[Not sure] Write a training script for SDXL (the training is pretty neat and minimal).
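
To make the TSC item above more concrete, here is a rough, dependency-free sketch of the general idea of conditioning text features on the sampling timestep. It assumes the timestep is injected through an adaptive layer norm driven by a sinusoidal timestep embedding. That is an assumption made for illustration and not necessarily how the module in this PR is written. The function names (`timestep_embedding`, `ada_layer_norm`) and the toy weights are hypothetical, and the attention-based resampling a real connector would perform is left out.

```python
import math

def timestep_embedding(t, dim):
    """Sinusoidal embedding of a scalar timestep into `dim` values (dim must be even)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]

def layer_norm(x, eps=1e-5):
    """Normalize a feature vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(x, weight, bias):
    """Dense layer; `weight` is a list of rows with shape (out, in)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def ada_layer_norm(x, t_emb, w_scale, b_scale, w_shift, b_shift):
    """Hypothetical adaptive layer norm: scale and shift are predicted from the timestep embedding."""
    scale = linear(t_emb, w_scale, b_scale)
    shift = linear(t_emb, w_shift, b_shift)
    return [n * (1.0 + s) + sh for n, s, sh in zip(layer_norm(x), scale, shift)]

# Toy usage: the same text feature is modulated differently at two timesteps.
feature = [0.5, -1.0, 2.0, 0.25]   # stand-in for one LLM token feature
dim = len(feature)
w = [[0.1] * dim for _ in range(dim)]  # toy weights, not trained values
b = [0.0] * dim
early = ada_layer_norm(feature, timestep_embedding(1, dim), w, b, w, b)
late = ada_layer_norm(feature, timestep_embedding(500, dim), w, b, w, b)
print(early)
print(late)
```

With all-zero weights the modulation reduces to a plain layer norm, so the timestep only matters once the scale and shift projections are learned.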

Paper: https://arxiv.org/pdf/2403.05135

Minimal Docs

from diffusers import EllaFixedDiffusionPipeline, ELLA, DPMSolverMultistepScheduler

# Load the ELLA adapter weights (lowercase name so the ELLA class is not shadowed)
ella = ELLA.from_pretrained("shauray/ELLA_SD15")

# Build the pipeline around a base checkpoint and attach the adapter
ella_pipeline = EllaFixedDiffusionPipeline.from_pretrained("Justin-Choo/epiCRealism-Natural_Sin_RC1_VAE", ELLA=ella, requires_safety_checker=False)
ella_pipeline.scheduler = DPMSolverMultistepScheduler.from_config(ella_pipeline.scheduler.config)
ella_pipeline = ella_pipeline.to("cuda")

prompt = "a beautiful portrait of an empress in her garden"
negative_prompt = ""
image = ella_pipeline(prompt, negative_prompt=negative_prompt, guidance=7, num_inference_steps=30, height=768, width=512).images[0]

shauray8 commented 2 months ago

Inference on Diff++

from diffusers import EllaDiffusionPipeline, ELLA, DPMSolverMultistepScheduler

# Load the ELLA adapter weights (lowercase name so the ELLA class is not shadowed)
ella = ELLA.from_pretrained("shauray/ELLA_SD15")

# Same setup as above, using the non-fixed embedding length pipeline
ella_pipeline = EllaDiffusionPipeline.from_pretrained("Justin-Choo/epiCRealism-Natural_Sin_RC1_VAE", ELLA=ella, requires_safety_checker=False)
ella_pipeline.scheduler = DPMSolverMultistepScheduler.from_config(ella_pipeline.scheduler.config)
ella_pipeline = ella_pipeline.to("cuda")

prompt = "a beautiful portrait of an empress in her garden"
negative_prompt = ""
image = ella_pipeline(prompt, negative_prompt=negative_prompt, guidance=7, num_inference_steps=30, height=768, width=512).images[0]

shauray8 commented 1 month ago

All tests seem to work :)

ModelsLab / diffusers_plus_plus

ELLA - Equip Diffusion Models with LLM for Enhanced Semantic Alignment and Experiments [distillation methods] #4
