JoePenna / Dreambooth-Stable-Diffusion

Implementation of Dreambooth (https://arxiv.org/abs/2208.12242) by way of Textual Inversion (https://arxiv.org/abs/2208.01618) for Stable Diffusion (https://arxiv.org/abs/2112.10752). Tweaks focused on training faces, objects, and styles.
MIT License

Extended Dreambooth How-To Guides by Yushan

For Running On Vast.ai
For Running On Google Colab
For Running On a Local PC (Windows)
For Running On a Local PC (Ubuntu)
Adapting Corridor Digital's Dreambooth Tutorial To JoePenna's Repo
Using Captions in JoePenna's Dreambooth

Index

The Repo Formerly Known As "Dreambooth"


Notes by Joe Penna

INTRODUCTIONS!

Hi! My name is Joe Penna.

You might have seen a few YouTube videos of mine under MysteryGuitarMan. I'm now a feature film director. You might have seen ARCTIC or STOWAWAY.

For my movies, I need to be able to train specific actors, props, locations, etc. So, I made a bunch of changes to @XavierXiao's repo in order to train people's faces.

I can't release all the tests for the movie I'm working on, but when I test with my own face, I release those on my Twitter page - @MysteryGuitarM.

Lots of these tests were done with a buddy of mine -- Niko from CorridorDigital. It might be how you found this repo!

I'm not really a coder. I'm just stubborn, and I'm not afraid of googling. So, eventually, some really smart folks joined in and have been contributing. In this repo, specifically: @djbielejeski, @gammagec, @MrSaad -- but so many others in our Discord!

This is no longer my repo. This is the people-who-wanna-see-Dreambooth-on-SD-working-well's repo!

Now, if you wanna try to do this... please read the warnings below first:

WARNING!

Setup

Easy RunPod Instructions

Note: RunPod periodically upgrades its base Docker image, which can break this repo. None of the YouTube videos are up to date, but you can still use them as a guide. Follow the typical RunPod YouTube videos/tutorials, with the following changes:

From within the My Pods page,

Carry on with the rest of the guide:

VIDEO INSTRUCTIONS

Vast.AI Instructions

Running Locally Instructions

Setup - Virtual Environment

Pre-Requisites

  1. Git
  2. Python 3.10
  3. Open cmd
  4. Clone the repository
    1. C:\>git clone https://github.com/JoePenna/Dreambooth-Stable-Diffusion
  5. Navigate into the repository
    1. C:\>cd Dreambooth-Stable-Diffusion

Install Dependencies and Activate Environment

cmd> python -m venv dreambooth_joepenna
cmd> dreambooth_joepenna\Scripts\activate.bat
cmd> pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 --extra-index-url https://download.pytorch.org/whl/cu117
cmd> pip install -r requirements.txt
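
Before starting a long training run, it can help to confirm that the virtual environment actually sees the CUDA build of PyTorch installed above. Here is a minimal sketch that relies only on the standard torch attributes torch.__version__ and torch.cuda.is_available(). The function name describe_torch_install is made up for illustration and is not part of this repo.

```python
import importlib.util


def describe_torch_install() -> str:
    """Return a one-line summary of the PyTorch install visible to this interpreter."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch  # imported lazily so the check above can fail gracefully

    cuda = "CUDA available" if torch.cuda.is_available() else "CUDA NOT available"
    return f"torch {torch.__version__}, {cuda}"


if __name__ == "__main__":
    print(describe_torch_install())
```

If the summary reports that CUDA is not available, the CPU-only wheel was probably installed, and re-running the pip install line with the cu117 index URL is a reasonable first thing to try.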

Run

cmd> python "main.py" --project_name "ProjectName" --training_model "C:\v1-5-pruned-emaonly-pruned.ckpt" --regularization_images "C:\regularization_images" --training_images "C:\training_images" --max_training_steps 2000 --class_word "person" --token "zwx" --flip_p 0 --learning_rate 1.0e-06 --save_every_x_steps 250

Cleanup

cmd> deactivate 

Setup - Conda

Pre-Requisites

  1. Git
  2. Python 3.10
  3. miniconda3
  4. Open Anaconda Prompt (miniconda3)
  5. Clone the repository
    1. (base) C:\>git clone https://github.com/JoePenna/Dreambooth-Stable-Diffusion
  6. Navigate into the repository
    1. (base) C:\>cd Dreambooth-Stable-Diffusion

Install Dependencies and Activate Environment

(base) C:\Dreambooth-Stable-Diffusion> conda env create -f environment.yaml
(base) C:\Dreambooth-Stable-Diffusion> conda activate dreambooth_joepenna

Run

cmd> python "main.py" --project_name "ProjectName" --training_model "C:\v1-5-pruned-emaonly-pruned.ckpt" --regularization_images "C:\regularization_images" --training_images "C:\training_images" --max_training_steps 2000 --class_word "person" --token "zwx" --flip_p 0 --learning_rate 1.0e-06 --save_every_x_steps 250

Cleanup

cmd> conda deactivate

Configuration File and Command Line Reference

Example Configuration File

{
    "class_word": "woman",
    "config_date_time": "2023-04-08T16-54-00",
    "debug": false,
    "flip_percent": 0.0,
    "gpu": 0,
    "learning_rate": 1e-06,
    "max_training_steps": 3500,
    "model_path": "D:\\stable-diffusion\\models\\v1-5-pruned-emaonly-pruned.ckpt",
    "model_repo_id": "",
    "project_config_filename": "my-config.json",
    "project_name": "<token> project",
    "regularization_images_folder_path": "D:\\stable-diffusion\\regularization_images\\Stable-Diffusion-Regularization-Images-person_ddim\\person_ddim",
    "save_every_x_steps": 250,
    "schema": 1,
    "seed": 23,
    "token": "<token>",
    "token_only": false,
    "training_images": [
        "001@a photo of <token> looking down.png",
        "002-DUPLICATE@a close photo of <token> smiling wearing a black sweatshirt.png",
        "002@a photo of <token> wearing a black sweatshirt sitting on a blue couch.png",
        "003@a photo of <token> smiling wearing a red flannel shirt with a door in the background.png",
        "004@a photo of <token> wearing a purple sweater dress standing with her arms crossed in front of a piano.png",
        "005@a close photo of <token> with her hand on her chin.png",
        "005@a photo of <token> with her hand on her chin wearing a dark green coat and a red turtleneck.png",
        "006@a close photo of <token>.png",
        "007@a close photo of <token>.png",
        "008@a photo of <token> wearing a purple turtleneck and earings.png",
        "009@a close photo of <token> wearing a red flannel shirt with her hand on her head.png",
        "011@a close photo of <token> wearing a black shirt.png",
        "012@a close photo of <token> smirking wearing a gray hooded sweatshirt.png",
        "013@a photo of <token> standing in front of a desk.png",
        "014@a close photo of <token> standing in a kitchen.png",
        "015@a photo of <token> wearing a pink sweater with her hand on her forehead sitting on a couch with leaves in the background.png",
        "016@a photo of <token> wearing a black shirt standing in front of a door.png",
        "017@a photo of <token> smiling wearing a black v-neck sweater sitting on a couch in front of a lamp.png",
        "019@a photo of <token> wearing a blue v-neck shirt in front of a door.png",
        "020@a photo of <token> looking down with her hand on her face wearing a black sweater.png",
        "021@a close photo of <token> pursing her lips wearing a pink hooded sweatshirt.png",
        "022@a photo of <token> looking off into the distance wearing a striped shirt.png",
        "023@a photo of <token> smiling wearing a blue beanie holding a wine glass with a kitchen table in the background.png",
        "024@a close photo of <token> looking at the camera.png"
    ],
    "training_images_count": 24,
    "training_images_folder_path": "D:\\stable-diffusion\\training_images\\24 Images - captioned"
}

Using your configuration for training

python "main.py" --config_file_path "path/to/the/my-config.json"
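
A hand-edited configuration file is easy to get subtly wrong, for example by adding an image to training_images without updating training_images_count. The sketch below shows one way to sanity-check a config before training. The choice of required keys and the check_config helper are assumptions made for illustration. They are not the repo's own validation logic.

```python
import json

# Assumption: these are the keys we choose to treat as mandatory in this sketch.
REQUIRED_KEYS = {
    "project_name", "token", "class_word",
    "max_training_steps", "training_images", "training_images_count",
}


def check_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - config.keys())]
    images = config.get("training_images", [])
    if config.get("training_images_count") != len(images):
        problems.append(
            f"training_images_count is {config.get('training_images_count')} "
            f"but {len(images)} images are listed"
        )
    if not config.get("token_only", False) and not config.get("class_word"):
        problems.append("class_word is empty but token_only is false")
    return problems


# A deliberately inconsistent, shortened config: the count says 24 but one image is listed.
sample = json.loads("""
{
    "class_word": "woman",
    "max_training_steps": 3500,
    "project_name": "<token> project",
    "token": "<token>",
    "token_only": false,
    "training_images": ["001@a photo of <token> looking down.png"],
    "training_images_count": 24
}
""")
print(check_config(sample))
```

To check a real file, load it with json.load(open(path, encoding="utf-8")) and pass the resulting dict to check_config.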

Command Line Parameters

dreambooth_helpers\arguments.py

| Command | Type | Example | Description |
| --- | --- | --- | --- |
| --config_file_path | string | "C:\\Users\\David\\Dreambooth Configs\\my-config.json" | The path to the configuration file to use |
| --project_name | string | "My Project Name" | Name of the project |
| --debug | bool | False | Optional. Defaults to False. Enable debug logging |
| --seed | int | 23 | Optional. Defaults to 23. Seed for seed_everything |
| --max_training_steps | int | 3000 | Number of training steps to run |
| --token | string | "owhx" | Unique token you want to represent your trained model |
| --token_only | bool | False | Optional. Defaults to False. Train only using the token and no class |
| --training_model | string | "D:\\stable-diffusion\\models\\v1-5-pruned-emaonly-pruned.ckpt" | Path to model to train (model.ckpt) |
| --training_images | string | "D:\\stable-diffusion\\training_images\\24 Images - captioned" | Path to training images directory |
| --regularization_images | string | "D:\\stable-diffusion\\regularization_images\\Stable-Diffusion-Regularization-Images-person_ddim\\person_ddim" | Path to directory with regularization images |
| --class_word | string | "woman" | Match class_word to the category of images you want to train. Example: man, woman, dog, or artstyle |
| --flip_p | float | 0.0 | Optional. Defaults to 0.5. Flip percentage. Example: if set to 0.5, will flip (mirror) your training images 50% of the time. This helps expand your dataset without needing to include more training images. This can lead to worse results for face training, since most people's faces are not perfectly symmetrical |
| --learning_rate | float | 1.0e-06 | Optional. Defaults to 1.0e-06 (0.000001). Set the learning rate. Accepts scientific notation |
| --save_every_x_steps | int | 250 | Optional. Defaults to 0. Saves a checkpoint every x steps. At 0, only saves at the end of training when max_training_steps is reached |
| --gpu | int | 0 | Optional. Defaults to 0. Specify a GPU other than 0 to use for training. Multi-GPU support is not currently implemented |
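
To make the defaults listed above concrete, here is a small argparse sketch covering a subset of the parameters. It is not the contents of dreambooth_helpers\arguments.py. The str_to_bool helper and the way booleans are passed as "True"/"False" strings are assumptions made for illustration, so check the real file before relying on any of this.

```python
import argparse


def str_to_bool(value: str) -> bool:
    """Parse 'True'/'False' strings; argparse's type=bool would treat any non-empty string as True."""
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Illustrative subset of the training options")
    # Options the reference lists without a default.
    parser.add_argument("--project_name", type=str)
    parser.add_argument("--max_training_steps", type=int)
    parser.add_argument("--token", type=str)
    parser.add_argument("--class_word", type=str)
    # Optional settings, using the defaults stated in the reference.
    parser.add_argument("--token_only", type=str_to_bool, default=False)
    parser.add_argument("--debug", type=str_to_bool, default=False)
    parser.add_argument("--seed", type=int, default=23)
    parser.add_argument("--flip_p", type=float, default=0.5)
    parser.add_argument("--learning_rate", type=float, default=1.0e-06)
    parser.add_argument("--save_every_x_steps", type=int, default=0)
    parser.add_argument("--gpu", type=int, default=0)
    return parser


args = build_parser().parse_args([
    "--project_name", "My Project Name",
    "--max_training_steps", "3000",
    "--token", "owhx",
    "--class_word", "woman",
    "--flip_p", "0.0",
])
print(args.flip_p, args.learning_rate, args.save_every_x_steps)
```

Note that flip_p defaults to 0.5, so for face training you have to pass --flip_p 0 explicitly, as the Run examples earlier do.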

Using command line parameters for training

python "main.py" --project_name "My Project Name" --max_training_steps 3000 --token "owhx" --training_model "D:\\stable-diffusion\\models\\v1-5-pruned-emaonly-pruned.ckpt" --training_images "D:\\stable-diffusion\\training_images\\24 Images - captioned" --regularization_images "D:\\stable-diffusion\\regularization_images\\Stable-Diffusion-Regularization-Images-person_ddim\\person_ddim" --class_word "woman" --flip_p 0.0 --save_every_x_steps 500
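
Long command lines with spaces and backslashes in the paths are easy to mis-quote. One way around that is to build the argument list in Python and hand it to subprocess, which passes each value through without shell quoting. This is only a sketch: build_training_command is a hypothetical helper, and the flag names are simply the ones from the command above.

```python
import subprocess
import sys


def build_training_command(options: dict) -> list[str]:
    """Turn {"flag_name": value} pairs into an argv list for main.py."""
    command = [sys.executable, "main.py"]
    for name, value in options.items():
        command += [f"--{name}", str(value)]
    return command


command = build_training_command({
    "project_name": "My Project Name",
    "max_training_steps": 3000,
    "token": "owhx",
    "training_model": r"D:\stable-diffusion\models\v1-5-pruned-emaonly-pruned.ckpt",
    "training_images": r"D:\stable-diffusion\training_images\24 Images - captioned",
    "regularization_images": r"D:\stable-diffusion\regularization_images\Stable-Diffusion-Regularization-Images-person_ddim\person_ddim",
    "class_word": "woman",
    "flip_p": 0.0,
    "save_every_x_steps": 500,
})
print(command)

# To actually start training, run this from the repository root and uncomment:
# subprocess.run(command, check=True)
```

Using sys.executable means the script runs with the same interpreter, and therefore the same activated environment, as the one that built the command.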

Captions and Multiple Subject/Concept Support

Captions are supported. Here is the guide on how we implemented them.

Let's say that your token is effy, your class is person, and your data root is /training_images. Then:

/training_images/img-001.jpg is captioned with effy person

You can customize the captioning by adding it after a @ symbol in the filename.

/training_images/img-001@a photo of effy.jpg => a photo of effy

You can use two placeholders in your captions, S (uppercase S) and C (uppercase C), to stand for the subject and the class.

/training_images/img-001@S being a good C.jpg => effy being a good person

To create a new subject you just need to create a folder for it. So:

/training_images/bingo/img-001.jpg => bingo person

The class stays the same, but now the subject has changed.

Again - the token S is now bingo:

/training_images/bingo/img-001@S is being silly.jpg => bingo is being silly

One folder deeper and you can change the class: /training_images/bingo/dog/img-001@S being a good C.jpg => bingo being a good dog

Now comes the kicker: one level deeper and you can caption a group of images: /training_images/effy/person/a picture of/img-001.jpg => a picture of effy person
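
The folder and filename rules above can be summarized in a few lines of Python. This is a sketch written from the examples in this section, not the repo's actual caption code. The function caption_for is hypothetical, /training_images is taken as the data root as in the example paths, and the rule that an @ caption in the filename takes precedence over a group-caption folder is an assumption.

```python
import re
from pathlib import PurePosixPath


def caption_for(image_path: str, data_root: str, token: str, class_word: str) -> str:
    """Derive a caption from an image path, following the examples in this section."""
    rel = PurePosixPath(image_path).relative_to(data_root)
    folders = rel.parts[:-1]  # sub-folders between the data root and the file

    subject = folders[0] if len(folders) >= 1 else token       # 1st level: subject
    cls = folders[1] if len(folders) >= 2 else class_word      # 2nd level: class
    group_prefix = folders[2] if len(folders) >= 3 else ""     # 3rd level: group caption

    stem = rel.stem  # filename without its extension
    if "@" in stem:
        # Everything after the first @ is the caption template.
        template = stem.split("@", 1)[1]
    else:
        template = f"{group_prefix} S C".strip()

    # Replace the standalone placeholders S and C with the subject and class.
    return re.sub(r"\b[SC]\b", lambda m: subject if m.group() == "S" else cls, template)


root = "/training_images"
for path in [
    "/training_images/img-001.jpg",
    "/training_images/img-001@S being a good C.jpg",
    "/training_images/bingo/dog/img-001@S being a good C.jpg",
    "/training_images/effy/person/a picture of/img-001.jpg",
]:
    print(path, "=>", caption_for(path, root, token="effy", class_word="person"))
```

A dry run like this over your own folder is a cheap way to spot captions that do not say what you expected before you spend GPU time on them.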

Textual Inversion vs. Dreambooth

The majority of the code in this repo was written by Rinon Gal et al., the authors of the Textual Inversion research paper. Though a few ideas about regularization images and prior preservation loss (ideas from "Dreambooth") were added in, out of respect to both the Textual Inversion authors and the Google researchers, I'm renaming this fork to: "The Repo Formerly Known As "Dreambooth"".

For an alternate implementation, please see "Alternate Option" below.

Using the generated model

The ground truth (real picture, caution: very beautiful woman)

Same prompt for all of these images below:

sks person, woman person, Natalie Portman person, Kate Mara person

Debugging your results

❗❗ THE NUMBER ONE MISTAKE PEOPLE MAKE ❗❗

Prompting with just your token, i.e. "joepenna" instead of "joepenna person".

If you trained with joepenna under the class person, the model should only know your face as:

joepenna person

Example Prompts:

🚫 Incorrect (missing person following joepenna)

portrait photograph of joepenna 35mm film vintage glass

✅ This is right (person is included after joepenna)

portrait photograph of joepenna person 35mm film vintage glass

You might sometimes get someone who kinda looks like you with just joepenna (especially if you trained for too many steps), but that's only because this current iteration of Dreambooth overtrains joepenna person so much that the likeness bleeds into the bare token.
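
If you generate prompts from a script, a quick check like the one below can catch this mistake before you render anything. It is only a sketch: uses_token_with_class is a hypothetical helper, and matching case-insensitively is a convenience assumption here, not a statement about how the model reads prompts.

```python
import re


def uses_token_with_class(prompt: str, token: str, class_word: str) -> bool:
    """True when the prompt contains '<token> <class_word>' as adjacent whole words."""
    pattern = rf"\b{re.escape(token)}\s+{re.escape(class_word)}\b"
    return re.search(pattern, prompt, flags=re.IGNORECASE) is not None


wrong = "portrait photograph of joepenna 35mm film vintage glass"
right = "portrait photograph of joepenna person 35mm film vintage glass"
print(uses_token_with_class(wrong, "joepenna", "person"))
print(uses_token_with_class(right, "joepenna", "person"))
```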


☢ Be careful with the types of images you train

While training, Stable doesn't know that you're a person. It's just going to mimic what it sees.

So, if your training images look like this:

You're only going to get generations of you outside next to a spiky tree, wearing a white-and-gray shirt, in the style of... well, selfie photograph.

Instead, this training set is much better:

The only thing that is consistent between images is the subject. So, Stable will look through the images and learn only your face, which will make "editing" it into other styles possible.

Oh no! You're not getting good generations!

OPTION 1: They're not looking like you at all! (Train longer, or get better training images)

Are you sure you're prompting it right?

It should be <token> <class>, not just <token>. For example:

JoePenna person, portrait photograph, 85mm medium format photo

If it still doesn't look like you, you didn't train long enough.


OPTION 2: They're looking like you, but are all looking like your training images. (Train for fewer steps, get better training images, fix with prompting)

Okay, a few reasons why: you might have trained too long... or your images were too similar... or you didn't train with enough images.

No problem. We can fix that with the prompt. Stable Diffusion puts a LOT of weight on whatever you type first. So save your token for later in the prompt:

an exquisite portrait photograph, 85mm medium format photo of JoePenna person with a classic haircut


OPTION 3: They're looking like you, but not when you try different styles. (Train longer, get better training images)

You didn't train long enough...

No problem. We can fix that with the prompt:

JoePenna person in a portrait photograph, JoePenna person in a 85mm medium format photo of JoePenna person

More tips and help here: Stable Diffusion Dreambooth Discord

Hugging Face Diffusers - Alternate Option

Dreambooth is now supported in Hugging Face Diffusers for training with Stable Diffusion.

Try it out here:

Open In Colab