eclipse-t2i / lambda-eclipse-inference

Official PyTorch implementation of "λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space"
https://eclipse-t2i.github.io/Lambda-ECLIPSE/
MIT License

λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space

Open In Colab

Version 2 of the paper is out!

🚀 Latest Updates (April 2024)

News: Check out our previous work, ECLIPSE, on resource-efficient T2I, accepted at CVPR 2024.

Overview

This repository contains the inference code for our paper, λ-ECLIPSE.

Please follow the steps below to run inference locally.


Examples

Setup

Installation

git clone git@github.com:eclipse-t2i/lambda-eclipse-inference.git
cd lambda-eclipse-inference

conda create -p ./venv python=3.9
conda activate ./venv
pip install -r requirements.txt

Run Inference

Open In Colab

Note: the λ-ECLIPSE prior is not a diffusion model, while the image decoders are.

We recommend referring to either the Colab notebook or the test.py script to understand the inner workings of λ-ECLIPSE.
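
For intuition, the two-stage flow roughly looks like the minimal Python sketch below. The prior call is illustrative pseudocode (the actual class and checkpoint names live in this repo's test.py); the decoder shown is the standard Kandinsky v2.2 decoder from diffusers.

# Minimal sketch of the two-stage λ-ECLIPSE flow (illustration only; see test.py
# for the real implementation -- the prior call below is hypothetical).
import torch
from diffusers import KandinskyV22Pipeline

# Stage 1 (λ-ECLIPSE prior, NOT a diffusion model): a single forward pass maps
# the prompt plus subject image(s) to a CLIP image embedding.
# image_embeds = lambda_eclipse_prior(prompt, subject_images)  # hypothetical API

# Stage 2 (diffusion decoder): a standard Kandinsky v2.2 decoder turns the
# predicted CLIP image embedding into pixels.
decoder = KandinskyV22Pipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
).to("cuda")
# image = decoder(image_embeds=image_embeds,
#                 negative_image_embeds=torch.zeros_like(image_embeds),
#                 height=768, width=768).images[0]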

# run the inference:
conda activate ./venv

# single-subject example
python test_quick.py --prompt="a cat on top of the snow mountain" --subject1_path="./assets/cat.png" --subject1_name="cat"

# single-subject canny example
python ./test_quick.py --prompt="a dog is surfing" --subject1_path="./assets/dog2.png" --subject1_name="dog" --canny_image="./assets/dog_surf_ref.jpg"

# multi-subject example
python test_quick.py --prompt="a cat wearing glasses at a park" --subject1_path="./assets/cat.png" --subject1_name="cat" --subject2_path="./assets/blue_sunglasses.png" --subject2_name="glasses"

## results will be stored in ./assets/

Run Demo

conda activate ./venv
gradio main.py
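
By default, Gradio serves the demo locally (typically at http://127.0.0.1:7860); launching it through the gradio CLI as above also enables auto-reload while you edit main.py.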

Concept-specific finetuning

🔥🔥🔥 Combined training on all concepts:

export DATASET_PATH="<path-to-parent-folder-containing-concept-specific-folders>"
export OUTPUT_DIR="<output-dir>"
export TRAINING_STEPS=8000 # for 30 concepts --> ~250 iterations per concept

python train_text_to_image_decoder_whole_db.py \
        --instance_data_dir=$DATASET_PATH \
        --subject_data_dir=$DATASET_PATH \
        --output_dir=$OUTPUT_DIR \
        --validation_prompts='A dog' \
        --resolution=768 \
        --train_batch_size=1 \
        --gradient_accumulation_steps=4 \
        --gradient_checkpointing \
        --max_train_steps=$TRAINING_STEPS \
        --learning_rate=1e-05 \
        --max_grad_norm=1 \
        --checkpoints_total_limit=3 \
        --lr_scheduler=constant \
        --lr_warmup_steps=0 \
        --report_to=wandb \
        --validation_epochs=1000 \
        --checkpointing_steps=1000 \
        --push_to_hub
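
Here, $DATASET_PATH is expected to be the parent folder containing one sub-folder of images per concept; the ~250-iterations-per-concept estimate above assumes roughly 30 such concept folders.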

Individual concept training:

export DATASET_PATH="<path-to-folder-containing-images>"
export OUTPUT_DIR="<output-dir>"
export CONCEPT="<high-level-concept-name-like-dog>" # !!! Note: This is only used to check for concept overfitting; it is not meant to generate your specific concept images.
export TRAINING_STEPS=400

python train_text_to_image_decoder.py \
        --instance_data_dir=$DATASET_PATH \
        --subject_data_dir=$DATASET_PATH \
        --output_dir=$OUTPUT_DIR \
        --validation_prompts="A $CONCEPT" \
        --resolution=768 \
        --train_batch_size=1 \
        --gradient_accumulation_steps=4 \
        --gradient_checkpointing \
        --max_train_steps=$TRAINING_STEPS \
        --learning_rate=1e-05 \
        --max_grad_norm=1 \
        --checkpoints_total_limit=4 \
        --lr_scheduler=constant \
        --lr_warmup_steps=0 \
        --report_to=wandb \
        --validation_epochs=100 \
        --checkpointing_steps=100 \
        --push_to_hub
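
With --push_to_hub enabled, the finetuned decoder UNet is uploaded to the Hugging Face Hub and can be plugged into the combined inference step below via --unet_checkpoint.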

Combined Inference (Prior + Finetuned UNet)

To run inference combining the λ-ECLIPSE prior with a UNet finetuned in the previous step:

# run the inference:
conda activate ./venv

# single/multi subject example
python test_quick.py --unet_checkpoint="mpatel57/backpack_dog" --prompt="a backpack at the beach" --subject1_path="./assets/backpack_dog.png" --subject1_name="backpack"

## results will be stored in ./assets/
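
Replace the example --unet_checkpoint (mpatel57/backpack_dog) with your own checkpoint from the finetuning step.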

🚀 Multiconcept Interpolation

Please refer to the following script to perform interpolations on your own concepts:

python ./interpolation.py
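
The script above contains the repo's actual interpolation logic. As a rough illustration of the underlying idea only, spherical interpolation between two CLIP image embeddings looks like the sketch below (the function name and the commented usage are assumptions, not taken from interpolation.py):

import torch

def slerp(z1: torch.Tensor, z2: torch.Tensor, t: float) -> torch.Tensor:
    # Spherical interpolation between two embedding vectors.
    z1_n = z1 / z1.norm()
    z2_n = z2 / z2.norm()
    omega = torch.arccos((z1_n * z2_n).sum().clamp(-1.0, 1.0))
    so = torch.sin(omega)
    if so.abs() < 1e-6:  # nearly parallel vectors -> plain linear interpolation
        return (1.0 - t) * z1 + t * z2
    return (torch.sin((1.0 - t) * omega) / so) * z1 + (torch.sin(t * omega) / so) * z2

# Sweep t from 0 to 1 and decode each interpolated embedding with the
# Kandinsky decoder to visualise the transition between two concepts, e.g.:
# frames = [decode(slerp(cat_embed, dog_embed, t)) for t in torch.linspace(0, 1, 8)]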

Acknowledgement

We would like to acknowledge the excellent open-source text-to-image models (Karlo and Kandinsky), without which this work would not have been possible. We also thank Hugging Face for streamlining the T2I models.