d8ahazard / sd_dreambooth_extension


[Bug]: SDXL KeyError: 'images' #1337

Closed Saduff closed 11 months ago

Saduff commented 1 year ago

What happened?

Attempting to train on the SDXL branch results in KeyError: 'images':

Traceback (most recent call last):
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\ui_functions.py", line 730, in start_training
    result = main(class_gen_method=class_gen_method)
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1791, in main
    return inner_loop()
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 126, in decorator
    return function(batch_size, grad_size, prof, *args, **kwargs)
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1456, in inner_loop
    for step, batch in enumerate(train_dataloader):
  File "D:\stable-diffusion-webui\venv\lib\site-packages\accelerate\data_loader.py", line 394, in __iter__
    next_batch = next(dataloader_iter)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\data\dataloader.py", line 633, in __next__
    data = self._next_data()
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\data\dataloader.py", line 677, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 720, in collate_fn_sdxl
    pixel_values += [example["images"] for example in examples if example["is_class"]]
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 720, in <listcomp>
    pixel_values += [example["images"] for example in examples if example["is_class"]]
KeyError: 'images'

Steps to reproduce the problem

  1. Checkout the SDXL branch
  2. Press the Train button

Commit and libraries

Initializing Dreambooth
Dreambooth revision: d924532046781f81bab304d70827ab7ceb999811
Successfully installed accelerate-0.22.0 fastapi-0.94.1 gitpython-3.1.32 google-auth-oauthlib-1.0.0 transformers-4.30.2

[!] xformers version 0.0.20 installed.
[+] torch version 2.0.1+cu118 installed.
[+] torchvision version 0.15.2+cu118 installed.
[+] accelerate version 0.22.0 installed.
[+] diffusers version 0.20.1 installed.
[+] transformers version 4.30.2 installed.
[+] bitsandbytes version 0.35.4 installed.

Command Line Arguments

--no-half --xformers

Console logs

Launching Web UI with arguments: --no-half --xformers
Loading weights [e6bb9ea85b] from D:\stable-diffusion-webui\models\Stable-diffusion\sd_xl_base_1.0_0.9vae.safetensors
Creating model from config: D:\stable-diffusion-webui\repositories\generative-models\configs\inference\sd_xl_base.yaml
[2023-08-28 17:58:42,752][INFO][root] - Loaded ViT-bigG-14 model config.
Model loaded in 11.0s (load weights from disk: 1.2s, create model: 0.8s, apply weights to model: 3.2s, move model to device: 4.3s, load textual inversion embeddings: 0.8s, calculate empty prompt: 0.6s).
[2023-08-28 17:58:53,187][DEBUG][api.py] - SD-Webui API layer loaded
Applying attention optimization: xformers... done.
CUDA SETUP: Loading binary D:\stable-diffusion-webui\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cudaall.dll...
Running on local URL:  http://127.0.0.1:7860
Initializing dreambooth training...
[2023-08-28 17:59:38,108][DEBUG][dreambooth.train_dreambooth] - Adding 'get_velocity' method to DEISMultistepScheduler...
[2023-08-28 17:59:38,109][DEBUG][dreambooth.train_dreambooth] - Adding 'get_velocity' method to UniPCMultistepScheduler...
[2023-08-28 17:59:38,292][DEBUG][dreambooth.train_dreambooth] - Pretrained path: D:\stable-diffusion-webui\models\dreambooth\myModel\working
Pre-processing images: man_4318_imgs_768x1024: : 4643it [00:06, 765.57it/s]
Nothing to generate.
Found 1200 reg images.
Preparing dataset...
Init dataset!
Preparing Dataset (With Caching)
Loading cached latents...
Bucket 0 (832, 1248, 0) - Instance Images:  1 | Class Images:  100 | Max Examples/batch:    2
Bucket 1 (880, 1176, 0) - Instance Images:  9 | Class Images:  900 | Max Examples/batch:   18
Bucket 2 (1176, 880, 0) - Instance Images:  2 | Class Images:  200 | Max Examples/batch:    4
Saving cache!
Total Buckets 3 - Instance Images: 12 | Class Images: 1200 | Max Examples/batch:   24

Total images / batch: 24, total examples: 24
Total dataset length (steps): 24
Initializing bucket counter!
[2023-08-28 18:07:37,920][DEBUG][dreambooth.train_dreambooth] -   ***** Running training *****
[2023-08-28 18:07:37,920][DEBUG][dreambooth.train_dreambooth] -   Num batches each epoch = 24
[2023-08-28 18:07:37,920][DEBUG][dreambooth.train_dreambooth] -   Num Epochs = 200
[2023-08-28 18:07:37,920][DEBUG][dreambooth.train_dreambooth] -   Batch Size Per Device = 1
[2023-08-28 18:07:37,920][DEBUG][dreambooth.train_dreambooth] -   Gradient Accumulation steps = 1
[2023-08-28 18:07:37,920][DEBUG][dreambooth.train_dreambooth] -   Total train batch size (w. parallel, distributed & accumulation) = 1
[2023-08-28 18:07:37,920][DEBUG][dreambooth.train_dreambooth] -   Text Encoder Epochs: 150
[2023-08-28 18:07:37,920][DEBUG][dreambooth.train_dreambooth] -   Total optimization steps = 2400
[2023-08-28 18:07:37,920][DEBUG][dreambooth.train_dreambooth] -   Total training steps = 4800
[2023-08-28 18:07:37,920][DEBUG][dreambooth.train_dreambooth] -   Resuming from checkpoint: False
[2023-08-28 18:07:37,920][DEBUG][dreambooth.train_dreambooth] -   First resume epoch: 0
[2023-08-28 18:07:37,920][DEBUG][dreambooth.train_dreambooth] -   First resume step: 0
[2023-08-28 18:07:37,920][DEBUG][dreambooth.train_dreambooth] -   Lora: False, Optimizer: Lion, Prec: bf16
[2023-08-28 18:07:37,921][DEBUG][dreambooth.train_dreambooth] -   Gradient Checkpointing: False
[2023-08-28 18:07:37,921][DEBUG][dreambooth.train_dreambooth] -   EMA: True
[2023-08-28 18:07:37,921][DEBUG][dreambooth.train_dreambooth] -   UNET: True
[2023-08-28 18:07:37,921][DEBUG][dreambooth.train_dreambooth] -   Freeze CLIP Normalization Layers: False
[2023-08-28 18:07:37,921][DEBUG][dreambooth.train_dreambooth] -   LR: 1e-07
[2023-08-28 18:07:37,921][DEBUG][dreambooth.train_dreambooth] -   Tenc LR: 1e-07
[2023-08-28 18:07:37,921][DEBUG][dreambooth.train_dreambooth] -   LoRA Extended: False
[2023-08-28 18:07:37,921][DEBUG][dreambooth.train_dreambooth] -   V2: False
Steps:   0%|                                                                                  | 0/4800 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\ui_functions.py", line 730, in start_training
    result = main(class_gen_method=class_gen_method)
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1791, in main
    return inner_loop()
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 126, in decorator
    return function(batch_size, grad_size, prof, *args, **kwargs)
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1456, in inner_loop
    for step, batch in enumerate(train_dataloader):
  File "D:\stable-diffusion-webui\venv\lib\site-packages\accelerate\data_loader.py", line 394, in __iter__
    next_batch = next(dataloader_iter)
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\data\dataloader.py", line 633, in __next__
    data = self._next_data()
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\data\dataloader.py", line 677, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "D:\stable-diffusion-webui\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 720, in collate_fn_sdxl
    pixel_values += [example["images"] for example in examples if example["is_class"]]
  File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 720, in <listcomp>
    pixel_values += [example["images"] for example in examples if example["is_class"]]
KeyError: 'images'
Steps:   0%|                                                                                  | 0/4800 [00:00<?, ?it/s]
Duration: 00:08:05
Restored system models.
Duration: 00:08:13


Saduff commented 1 year ago

https://github.com/d8ahazard/sd_dreambooth_extension/blob/c43f76347a5aa82dcc61a0ba03fad3be981bd772/dreambooth/train_dreambooth.py#L720

It seems it should be example["image"] on that line instead of example["images"]. Making that change lets it continue to the next line where another KeyError occurs:

File "D:\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 721, in <listcomp>
    add_text_embeds += [example["class_added_cond_kwargs"]["text_embeds"] for example in examples if
KeyError: 'class_added_cond_kwargs'

Not sure if that should also be changed to instance_added_cond_kwargs like in the lines right above.
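
For anyone hitting the same error, a quick way to see which keys the dataset actually yields is to check a batch before it reaches the collate function. The following is only a diagnostic sketch: `missing_keys` is a hypothetical helper that is not part of the extension, and the sample batch uses placeholder strings instead of real tensors. It assumes each example is a plain dict, as the traceback suggests.

```python
from typing import Any, Dict, Iterable, List


def missing_keys(examples: Iterable[Dict[str, Any]], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent from at least one example."""
    required = list(required)
    absent = set()
    for example in examples:
        absent.update(key for key in required if key not in example)
    # Preserve the order in which the caller listed the keys.
    return [key for key in required if key in absent]


# Hypothetical batch: the key names mirror the ones discussed in this thread,
# but the values are placeholders, not real latents.
examples = [
    {"image": "latent-0", "is_class": False},
    {"image": "latent-1", "is_class": True},
]

# Keys that collate_fn_sdxl asks for on line 720, checked against the batch.
print(missing_keys(examples, ["images", "is_class"]))
# Keys the first example really has.
print(sorted(examples[0].keys()))
```

Calling something like this on the `examples` argument at the top of `collate_fn_sdxl` would show whether `class_added_cond_kwargs` is ever produced by the dataset, or whether only `instance_added_cond_kwargs` exists.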

pipa0979 commented 1 year ago

@Saduff did that fix work?

antoninrichard commented 1 year ago

I have the same issue. @Saduff, did you find a way to fix it?

Saduff commented 1 year ago

I haven't found a way to fix it. I didn't try to change it to instance_added_cond_kwargs as I'm not sure it's correct per the comment:

# Concat class and instance examples for prior preservation.

I'm not sure how it works for @d8ahazard.
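
For context, the `+=` on line 720 suggests the collate function first gathers the instance examples and then appends the class examples, selecting each group by the `is_class` flag. Below is a minimal sketch of that ordering under that assumption. `instance_then_class` is a hypothetical name, the values are placeholder strings, and this is not the extension's actual code.

```python
from typing import Any, Dict, List


def instance_then_class(examples: List[Dict[str, Any]], key: str) -> List[Any]:
    """Collect `key` from instance examples first, then from class examples."""
    values = [example[key] for example in examples if not example["is_class"]]
    # Class (regularization) examples are appended after the instance ones.
    values += [example[key] for example in examples if example["is_class"]]
    return values


# Hypothetical batch mixing instance and class examples under a single key.
batch = [
    {"image": "class-0", "is_class": True},
    {"image": "inst-0", "is_class": False},
    {"image": "inst-1", "is_class": False},
]

print(instance_then_class(batch, "image"))
```

If both kinds of example share the same keys and differ only in `is_class`, then reading `image` in both halves would be consistent. Whether the same holds for the `*_added_cond_kwargs` entries depends on what the dataset class emits, which this thread does not settle.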

github-actions[bot] commented 11 months ago

This issue is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days

Saduff commented 11 months ago

Issue is still there on the latest commit (34690769c2cfb78d5760ba0003fdf55af0faf372).
