Stable Diffusion

This is a reimplementation of Stable Diffusion training in LiBai.

Environment

Before running the scripts, make sure to install the library's training dependencies.
To make sure you can successfully run the latest versions of the example scripts, we highly recommend installing from source and keeping the install up to date, since we update the example scripts frequently and they install some example-specific requirements.
Install libai

For LiBai installation, refer to the Installation instructions. Option 1 is a fresh clone and dev install (see the sketch below). Option 2: if you were using the oneflow-fork branch before, switch your local checkout back to main and reinstall:

git branch -D main            # drop the stale local main branch
git fetch                     # fetch the latest refs from the remote
git checkout main             # check out an up-to-date main
python3 -m pip install -e .   # editable (dev) install

Install onediff

To make sure you can train stable diffusion in LiBai, please install onediff.
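A minimal sketch of Option 1 and the onediff install; the repository URL and the onediff package name are assumptions, not taken from this README:

# Option 1 (sketch): fresh clone and dev install of LiBai
git clone https://github.com/Oneflow-Inc/libai.git  # repo URL assumed
cd libai
python3 -m pip install -e .

# install onediff (assumed to be available from PyPI)
python3 -m pip install onediff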
Notes
You need to register a Hugging Face account, create an access token, and log in with huggingface-cli login. First install the CLI:
python3 -m pip install huggingface_hub
If the command is not available on your PATH, it might be in $HOME/.local/bin:
~/.local/bin/huggingface-cli login
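Alternatively, add that directory to your PATH so the CLI resolves directly; this is a generic shell snippet rather than anything project-specific:

export PATH="$HOME/.local/bin:$PATH"  # make user-installed scripts visible
huggingface-cli login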
Start training
Training
Downloading the demo dataset
mkdir mscoco && cd mscoco
# download a demo shard of the MS-COCO data
wget https://oneflow-static.oss-cn-beijing.aliyuncs.com/libai/Stable_diffusion/00000.tar
mkdir 00000
tar -xvf 00000.tar -C 00000/  # extract into mscoco/00000
Running the command
Set your data path and features in projects/Stable_Diffusion/configs/config.py:
dataloader.train = LazyCall(build_nlp_train_loader)(
    dataset=[
        # set the data path
        LazyCall(TXTDataset)(
            foloder_name="/path/to/mscoco/00000",
            ...,
        )
    ]
)
train.update(
    dict(
        ...,
        # enable activation checkpointing or not
        activation_checkpoint=dict(enabled=True),  # or False
        # set the ZeRO stage
        zero_optimization=dict(
            enabled=True,  # or False
            stage=2,  # stage=2 is highly recommended; stage=1 or 3 is also supported
        ),
        # enable AMP (mixed-precision) training or not
        amp=dict(enabled=True),  # or False
    )
)
# set learning rate
optim.lr = 1e-3
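Then launch training. The command below is a sketch that assumes LiBai's usual distributed launcher (tools/train.sh wrapping tools/train_net.py); check your checkout for the exact entry point:

# run with 4 GPUs (launcher paths are an assumption from LiBai's convention)
bash tools/train.sh tools/train_net.py projects/Stable_Diffusion/configs/config.py 4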
DreamBooth

DreamBooth is a method to personalize text-to-image models like Stable Diffusion given just a few (3-5) images of a subject.
Downloading the dataset
Download images from here and save them in a directory (such as /path/to/demo_dog/). This will be our training data.
DreamBooth Training
Set your data path and features in projects/Stable_Diffusion/configs/dreambooth_config.py:
dataloader.train = LazyCall(build_nlp_train_loader)(
    dataset=[
        # set the data path
        LazyCall(DreamBoothDataset)(
            instance_data_root="/path/to/demo_dog/",
            instance_prompt="a photo of sks dog",
            ...,
        )
    ]
)
train.update(
    dict(
        ...,
        # enable activation checkpointing or not
        activation_checkpoint=dict(enabled=True),  # or False
        # set the ZeRO stage
        zero_optimization=dict(
            enabled=True,  # or False
            stage=2,  # stage=2 is highly recommended; stage=1 or 3 is also supported
        ),
        # enable AMP (mixed-precision) training or not
        amp=dict(enabled=True),  # or False
    )
)
# set learning rate
optim.lr = 1e-3
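Then run DreamBooth training with 4 GPUs; as above, the launcher paths are an assumption from LiBai's convention:

bash tools/train.sh tools/train_net.py projects/Stable_Diffusion/configs/dreambooth_config.py 4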
Training DreamBooth with prior-preservation loss

Prior preservation is used to avoid overfitting and language drift. Refer to the paper to learn more about it. For prior preservation, we first generate images using the model with a class prompt and then use those images during training along with our data. According to the paper, it's recommended to generate num_epochs * num_samples images for prior preservation; 200-300 works well for most cases.
First, we need to generate prior images using the model with a class prompt. Here is an example that generates 200 prior images:
bash projects/Stable_Diffusion/generate.sh
# generate.sh
export MODEL_NAME="CompVis/stable-diffusion-v1-4"  # choose the model type
export CLASS_DIR="/path/to/prior_dog/"             # set the save path for generated images
export CLASS_PROMPT="a photo of dog"               # set the class prompt
python3 projects/Stable_Diffusion/generate_prior_image.py \
    --pretrained_model_name_or_path=$MODEL_NAME \
    --class_data_dir=$CLASS_DIR \
    --class_prompt="$CLASS_PROMPT" \
    --num_class_images=200  # set the number of prior images
Second, set your data path and features in projects/Stable_Diffusion/configs/prior_preservation_config.py:
dataloader.train = LazyCall(build_nlp_train_loader)(
    dataset=[
        LazyCall(DreamBoothDataset)(
            instance_data_root="/path/to/demo_dog/",
            instance_prompt="a photo of sks dog",
            class_data_root="/path/to/prior_dog/",
            class_prompt="a photo of dog",
            ...,
        )
    ]
)
optim.lr = 2e-6  # set the learning rate
model.train_text_encoder = True  # whether to also train the text encoder; can be False
train.train_iter = 2000  # set the number of training iterations
train.log_period = 10
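Prior-preservation training is then launched the same way (launcher paths again assumed from LiBai's convention):

bash tools/train.sh tools/train_net.py projects/Stable_Diffusion/configs/prior_preservation_config.py 4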
Training DreamBooth with LoRA
Low-Rank Adaptation of Large Language Models was first introduced by Microsoft in LoRA: Low-Rank Adaptation of Large Language Models by Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen.
In a nutshell, LoRA adapts pretrained models by adding pairs of rank-decomposition matrices to existing weights and training only those newly added weights (see the sketch after this list). This has a couple of advantages:
The pretrained weights are kept frozen, so the model is not prone to catastrophic forgetting.
Rank-decomposition matrices have significantly fewer parameters than the original model, which means that trained LoRA weights are easily portable.
LoRA attention layers allow controlling to what extent the model is adapted toward new training images via a scale parameter.
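To make the idea concrete, here is a minimal sketch of a LoRA-augmented linear layer in OneFlow; the class and all names are illustrative, not LiBai's actual implementation:

import oneflow as flow

class LoRALinear(flow.nn.Module):
    """Illustrative LoRA layer: frozen weight W plus a trainable low-rank pair (B, A)."""

    def __init__(self, in_features, out_features, rank=4, scale=1.0):
        super().__init__()
        # pretrained weight stays frozen, so no catastrophic forgetting
        self.weight = flow.nn.Parameter(
            flow.randn(out_features, in_features), requires_grad=False
        )
        # rank-decomposition pair: far fewer parameters than the full weight
        self.lora_a = flow.nn.Parameter(flow.randn(rank, in_features) * 0.01)
        self.lora_b = flow.nn.Parameter(flow.zeros(out_features, rank))  # zero-init: adaptation starts as a no-op
        self.scale = scale  # controls how strongly the adaptation is applied

    def forward(self, x):
        # y = x @ (W + scale * B A)^T; only lora_a / lora_b receive gradients
        adapted = self.weight + self.scale * flow.matmul(self.lora_b, self.lora_a)
        return flow.nn.functional.linear(x, adapted)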
Set your data path and features in projects/Stable_Diffusion/configs/lora_config.py:
dataloader.train = LazyCall(build_nlp_train_loader)(
    dataset=[
        # set the data path
        LazyCall(DreamBoothDataset)(
            instance_data_root="/path/to/demo_dog/",
            instance_prompt="a photo of sks dog",
            ...,
        )
    ]
)
train.update(
    dict(
        ...,
        # enable activation checkpointing or not
        activation_checkpoint=dict(enabled=True),  # or False
        # set the ZeRO stage
        zero_optimization=dict(
            enabled=True,  # or False
            stage=2,  # stage=2 is highly recommended; stage=1 or 3 is also supported
        ),
        # enable AMP (mixed-precision) training or not
        amp=dict(enabled=True),  # or False
    )
)
# set learning rate
optim.lr = 5e-4
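Then launch LoRA training with 4 GPUs (launcher paths assumed from LiBai's convention, as before):

bash tools/train.sh tools/train_net.py projects/Stable_Diffusion/configs/lora_config.py 4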
Inference with trained model

The trained model will be saved in train.output_dir as set in config.py.

With LoRA, we can use onediff to run inference with our trained LoRA model in LiBai:
import oneflow as flow

flow.mock_torch.enable()  # let oneflow stand in for torch in downstream imports

from typing import get_args

from diffusers.models.attention_processor import AttentionProcessor
from onediff import OneFlowStableDiffusionPipeline

# AttentionProcessor is a Union of processor classes; alias forward to __call__
# so the LoRA attention processors behave like regular modules
for processor_type in get_args(AttentionProcessor):
    processor_type.forward = processor_type.__call__

model_path = "CompVis/stable-diffusion-v1-4"
pipe = OneFlowStableDiffusionPipeline.from_pretrained(
    model_path,
    use_auth_token=True,
    revision="fp16",
    torch_dtype=flow.float16,
)
# load the trained LoRA attention weights on top of the base model
pipe.unet.load_attn_procs("output/stable_diffusion/model_sd_for_inference/")
pipe = pipe.to("cuda")

for i in range(100):
    prompt = "a photo of sks dog"
    with flow.autocast("cuda"):
        images = pipe(prompt).images
    for j, image in enumerate(images):
        image.save(f"{i}-{j}.png")  # one file per generated image
Without LoRA, the trained model is saved in a directory like output/stable_diffusion/model_sd_for_inference/, and we can again use onediff to run inference with it in LiBai:
import oneflow as flow

flow.mock_torch.enable()  # let oneflow stand in for torch in downstream imports

from onediff import OneFlowStableDiffusionPipeline

# load the fine-tuned model directly from the training output directory
model_path = "output/stable_diffusion/model_sd_for_inference/"
pipe = OneFlowStableDiffusionPipeline.from_pretrained(
    model_path,
    use_auth_token=True,
    revision="fp16",
    torch_dtype=flow.float16,
)
pipe = pipe.to("cuda")

for i in range(100):
    prompt = "a photo of sks dog"
    with flow.autocast("cuda"):
        images = pipe(prompt).images
    for j, image in enumerate(images):
        image.save(f"{i}-{j}.png")  # one file per generated image