TheLastBen / fast-stable-diffusion

fast-stable-diffusion + DreamBooth

Dreambooth training does not create checkpoints? #2032

Open athenawisdoms opened 1 year ago

athenawisdoms commented 1 year ago

Hello there!

I'm using TheLastBen's version of DreamBooth to train on a set of 10 images, on an RTX 3090 24GB running Ubuntu with NVIDIA driver 525.105.17 and CUDA 12.0.

export MODEL_NAME="runwayml/stable-diffusion-v1-5"
export INSTANCE_DIR="./images/foo"
export CLASS_DIR="./class/foo"
export OUTPUT_DIR="./output/foo"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --train_text_encoder \
  --mixed_precision="fp16" \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --seed=1337 \
  --instance_prompt="photo of ohwx person" \
  --class_prompt="person" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 --gradient_checkpointing \
  --use_8bit_adam \
  --learning_rate=1e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=1100

Unfortunately, the final checkpoints cannot be found in the output directory, which contains only TensorBoard log files:

output/
└── foo
    └── logs
        └── dreambooth
            ├── 1682291762.1303725
            │   └── events.out.tfevents.1682291762.z-pc.4176312.1
            ├── 1682291762.1317098
            │   └── hparams.yml
            └── events.out.tfevents.1682291762.z-pc.4176312.0
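
For reference, here's the quick sanity check I can run for the files a diffusers-format pipeline save would normally contain (model_index.json, unet/, text_encoder/, vae/, etc.). This is just a minimal sketch that assumes the script is supposed to write a full pipeline into --output_dir; in my case everything comes back missing:

import pathlib

# Same path as $OUTPUT_DIR above
output_dir = pathlib.Path("./output/foo")

# Folders/files a diffusers-format pipeline save would normally contain
expected = ["model_index.json", "unet", "text_encoder", "vae", "tokenizer", "scheduler"]
for name in expected:
    print(f"{name}: {'found' if (output_dir / name).exists() else 'missing'}")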

Here's the output from the start to the end of training:

/home/athenawisdoms/test/thelastben/venv/lib/python3.10/site-packages/accelerate/accelerator.py:249: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
  warnings.warn(
==============================WARNING: DEPRECATED!==============================
WARNING! This version of bitsandbytes is deprecated. Please switch to `pip install bitsandbytes` and the new repo: https://github.com/TimDettmers/bitsandbytes
==============================WARNING: DEPRECATED!==============================
/home/athenawisdoms/test/thelastben/diffusers/src/diffusers/configuration_utils.py:214: FutureWarning: It is deprecated to pass a pretrained model name or path to `from_config`.If you were trying to load a scheduler, please use <class 'diffusers.schedulers.scheduling_ddpm.DDPMScheduler'>.from_pretrained(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
  deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
  0%|                                                                                            | 0/1100 [00:00<?, ?it/s]/home/athenawisdoms/test/thelastben/venv/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:338: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  and inp.query.storage().data_ptr() == inp.key.storage().data_ptr()
/home/athenawisdoms/test/thelastben/venv/lib/python3.10/site-packages/bitsandbytes/functional.py:106: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  else: return ct.c_void_p(A.data.storage().data_ptr())
Progress:|█████████████████████████|: 100%|██████████████████████| 1100/1100 [08:24<00:00,  2.18it/s, loss=0.256, lr=1e-6]

How do I find the final checkpoint files to do inference on? Thanks!
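
For context, this is how I was planning to run inference once training finishes. It's a minimal sketch that assumes the trained pipeline gets saved in diffusers format under ./output/foo, which doesn't seem to be happening here:

import torch
from diffusers import StableDiffusionPipeline

# Assumption: train_dreambooth.py saves a diffusers-format pipeline to the output directory
pipe = StableDiffusionPipeline.from_pretrained("./output/foo", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# Generate with the instance prompt used during training
image = pipe("photo of ohwx person").images[0]
image.save("ohwx.png")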

TheLastBen commented 1 year ago

That is not my notebook; mine is https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb

athenawisdoms commented 1 year ago

Oh, I'm so sorry! I pasted the wrong link; I've updated it to https://github.com/TheLastBen/diffusers/blob/main/examples/dreambooth/train_dreambooth.py

Is there an example of performing the training without using the ipynb notebook? I'm wondering why my command above doesn't produce a checkpoint file after training is done.