Official implementation of ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
Recent text-to-image customization works have proven successful in generating images of given concepts by fine-tuning diffusion models on a few examples. However, these methods tend to overfit the concepts, resulting in failure to create the concept under multiple conditions (e.g., the headphone is missing when generating “a <sks> dog wearing a headphone”). Interestingly, we notice that the base model before fine-tuning exhibits the capability to compose the base concept with other elements (e.g., “a dog wearing a headphone”), implying that the compositional ability only disappears after personalization tuning. Inspired by this observation, we present ClassDiffusion, a simple technique that leverages a semantic preservation loss to explicitly regulate the concept space when learning the new concept. Despite its simplicity, this helps avoid semantic drift when fine-tuning on the target concepts. Extensive qualitative and quantitative experiments demonstrate that the semantic preservation loss effectively improves the compositional abilities of the fine-tuned models. In response to the ineffective evaluation of the CLIP-T metric, we introduce the BLIP2-T metric, a more equitable and effective evaluation metric for this domain. We also provide an in-depth empirical study and theoretical analysis to better understand the role of the proposed loss. Lastly, we extend ClassDiffusion to personalized video generation, demonstrating its flexibility.
ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance
Jiannan Huang, Jun Hao Liew, Hanshu Yan, Yuyang Yin, Yao Zhao, Yunchao Wei
Our method can generate more aligned personalized images with explicit class guidance
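For intuition, here is a minimal sketch of the idea behind the semantic preservation loss: keep the text-encoder features of the personalized phrase (e.g., "photo of a <new1> dog") close to those of the plain class phrase ("photo of a dog") while tuning. The model name, the use of pooled features, and the loss weighting below are illustrative assumptions rather than the repository's exact implementation; see the paper and the training scripts for the precise formulation.

import torch
import torch.nn.functional as F
from transformers import CLIPTextModel, CLIPTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device)

def encode(prompt):
    # Pooled text feature for a prompt (pooled features are an illustrative choice here)
    tokens = tokenizer(prompt, padding="max_length", max_length=tokenizer.model_max_length,
                       truncation=True, return_tensors="pt").to(device)
    return text_encoder(**tokens).pooler_output  # shape (1, hidden_dim)

def semantic_preservation_loss(personalized_prompt, class_prompt):
    # Penalize drift of the personalized phrase away from its class phrase
    feat_personal = encode(personalized_prompt)  # e.g. "photo of a <new1> dog"
    feat_class = encode(class_prompt)            # e.g. "photo of a dog"
    return 1.0 - F.cosine_similarity(feat_personal, feat_class, dim=-1).mean()

# In the real setup, <new1> has been added to the tokenizer and its embedding is being
# learned; during tuning this term is added to the usual denoising objective, e.g.
# loss = diffusion_loss + lambda_spl * semantic_preservation_loss(...)  (lambda_spl is hypothetical)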
Setup
git clone https://github.com/Rbrq03/ClassDiffusion.git
cd ClassDiffusion
pip install -r requirements.txt
Warning: Currently, ClassDiffusion doesn't support PEFT. Please ensure PEFT is uninstalled in your environment, or check the PR. We will merge this PR soon.
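If you are unsure whether PEFT is present in your environment, `pip show peft` will report it and `pip uninstall peft` removes it.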
Single Concept
bash scripts/train_single.sh
Multiple Concepts
bash scripts/train_multi.sh
Single Concept
import torch
from diffusers import DiffusionPipeline

# Load the base Stable Diffusion model
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16,
).to("cuda")

# Load the fine-tuned cross-attention weights and the learned <new1> token embedding
pipeline.unet.load_attn_procs("path-to-save-model", weight_name="pytorch_custom_diffusion_weights.bin")
pipeline.load_textual_inversion("path-to-save-model", weight_name="<new1>.bin")

image = pipeline(
    "<new1> dog swimming in the pool",
    num_inference_steps=100,
    guidance_scale=6.0,
    eta=1.0,
).images[0]
image.save("dog.png")
Multiple Concepts
import torch
from diffusers import DiffusionPipeline

# Load the base Stable Diffusion model
pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16,
).to("cuda")

# Load the fine-tuned cross-attention weights and both learned token embeddings
pipeline.unet.load_attn_procs("path-to-save-model", weight_name="pytorch_custom_diffusion_weights.bin")
pipeline.load_textual_inversion("path-to-save-model", weight_name="<new1>.bin")
pipeline.load_textual_inversion("path-to-save-model", weight_name="<new2>.bin")

image = pipeline(
    "a <new1> teddy bear sitting in front of a <new2> barn",
    num_inference_steps=100,
    guidance_scale=6.0,
    eta=1.0,
).images[0]
image.save("multi-subject.png")
BLIP2-T
You can use the following code:
from PIL import Image
from utils.blip2t import BLIP2T

# Load the image-text matching model used for BLIP2-T scoring
blip2t = BLIP2T("Salesforce/blip-itm-large-coco", "cuda")

prompt = "photo of a dog"
image = Image.open("data/dog/00.jpg")

# Text-image alignment score between the prompt and the image
score = blip2t.text_similarity(prompt, image)[0]
print(score)
or run:
python blip2t.py
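To score a whole folder of generated images against a prompt, a short loop over BLIP2T.text_similarity (shown above) is enough; the folder path, prompt, and averaging below are illustrative:

import glob
from PIL import Image
from utils.blip2t import BLIP2T

blip2t = BLIP2T("Salesforce/blip-itm-large-coco", "cuda")
prompt = "a dog swimming in the pool"

# Average the BLIP2-T score over every generated image in a folder (path is illustrative)
scores = []
for path in sorted(glob.glob("outputs/*.png")):
    image = Image.open(path).convert("RGB")
    scores.append(float(blip2t.text_similarity(prompt, image)[0]))

print(f"mean BLIP2-T score over {len(scores)} images: {sum(scores) / len(scores):.4f}")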
Video Generation
python videogen.py
Checkpoints for quick test
Concept(s) | Weight |
---|---|
dog | weight |
bear+barn | weight |
Single Concept Results
Multiple Concepts Results
Video Generation Results
If you make use of our work, please cite our paper.
@article{huang2024classdiffusion,
title={ClassDiffusion: More Aligned Personalization Tuning with Explicit Class Guidance},
author={Huang, Jiannan and Liew, Jun Hao and Yan, Hanshu and Yin, Yuyang and Zhao, Yao and Wei, Yunchao},
journal={arXiv preprint arXiv:2405.17532},
year={2024}
}
We thank the following repos for their excellent and well-documented codebases: