[Bug]: No executable batch size found, reached zero.

Is there an existing issue for this?

[X] I have searched the existing issues and checked the recent builds/commits of both this extension and the webui

What happened?

I have been using SD for a while with a 1650 Super. I know it is not really enough for SD, but I have still created lots of images with it. When I try to train with Dreambooth, however, this error occurs. I have tried a lot of things but could not fix it. I don't think this is about the GPU, because I have read that some people manage to train with a 1650 Super.
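For reference, a rough back-of-the-envelope estimate of the memory involved, using the parameter count and the 4.00 GiB capacity printed in the console log. It assumes the UNet weights are held as 32-bit floats and that a plain Adam-style optimizer is used; neither is confirmed by the log, and 8-bit optimizers, LoRA, or mixed precision would change the numbers.

```python
# Rough VRAM estimate for the UNet alone; assumptions are labelled inline.
PARAMS = 859.52e6        # "DiffusionWrapper has 859.52 M params." (console log)
GPU_GIB = 4.00           # "4.00 GiB total capacity" (console log)
BYTES_PER_PARAM = 4      # assumption: fp32 weights
GIB = 1024 ** 3

weights_gib = PARAMS * BYTES_PER_PARAM / GIB

# Assumption: full fine-tuning with an Adam-style optimizer keeps one gradient
# and two moment buffers per weight, i.e. about three extra fp32 copies.
training_gib = weights_gib * 4

print(f"weights only:            {weights_gib:.2f} GiB")
print(f"weights + grads + state: {training_gib:.2f} GiB")
print(f"weights fit on the card: {weights_gib < GPU_GIB}")
print(f"training state fits:     {training_gib < GPU_GIB}")
```

Under these assumptions the weights alone come to about 3.2 GiB, close to the 3.37 GiB that the error message reports as already allocated, which leaves almost no headroom on a 4 GiB card.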

Steps to reproduce the problem

I created a model, then configured the settings and the dataset directory. After all the class images had been generated, this error appeared.

Commit and libraries

[+] xformers version 0.0.17 installed.
[+] torch version 2.0.1+cu118 installed.
[+] torchvision version 0.15.2+cu118 installed.
[+] accelerate version 0.19.0 installed.
[+] diffusers version 0.16.1 installed.
[+] transformers version 4.29.2 installed.
[+] bitsandbytes version 0.35.4 installed.

Command Line Arguments

--medvram --opt-split-attention --precision full --no-half --xformers

Console logs

venv "C:\StableDif\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: v1.3.2
Commit hash: baf6946e06249c5af9851c60171692c44ef633e0
Installing requirements
If submitting an issue on github, please provide the full startup log for debugging purposes.

Initializing Dreambooth
Dreambooth revision: dc413a14379b165355502d9f65856c40a4bb5b6f
Successfully installed accelerate-0.19.0 fastapi-0.94.1 gitpython-3.1.31 transformers-4.29.2

Does your project take forever to startup?
Repetitive dependency installation may be the reason.
Automatic1111's base project sets strict requirements on outdated dependencies.
If an extension is using a newer version, the dependency is uninstalled and reinstalled twice every startup.

[+] xformers version 0.0.17 installed.
[+] torch version 2.0.1+cu118 installed.
[+] torchvision version 0.15.2+cu118 installed.
[+] accelerate version 0.19.0 installed.
[+] diffusers version 0.16.1 installed.
[+] transformers version 4.29.2 installed.
[+] bitsandbytes version 0.35.4 installed.

Launching Web UI with arguments: --medvram --opt-split-attention --precision full --no-half --xformers
Loading weights [c249d7853b] from C:\StableDif\stable-diffusion-webui\models\Stable-diffusion\dreamshaper_6BakedVae.safetensors
Creating model from config: C:\StableDif\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Textual inversion embeddings loaded(0):
Model loaded in 1.4s (load weights from disk: 0.2s, create model: 0.4s, apply weights to model: 0.7s).
Applying optimization: xformers... done.
CUDA SETUP: Loading binary C:\StableDif\stable-diffusion-webui\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cudaall.dll...
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 9.3s (import torch: 2.0s, import gradio: 1.3s, import ldm: 0.6s, other imports: 1.1s, load scripts: 3.5s, create ui: 0.6s, gradio launch: 0.1s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:22<00:00,  1.12s/it]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:21<00:00,  1.10s/it]
Using CPU for extraction.██████████████████████████████████████████████████████████████| 20/20 [00:21<00:00,  1.41it/s]
Loading model from checkpoint.
Loading safetensors...
Pred and size are epsilon and 512, using config: C:\StableDif\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\..\configs\v1-training-default.yaml
v1 model loaded.
Trying to load: C:\StableDif\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\..\configs\v1-training-default.yaml
Converting unet...
Converting vae...
Converting text encoder...
Saving text_encoder
Saving tokenizer
Checkpoint successfully extracted to C:\StableDif\stable-diffusion-webui\models\dreambooth\yigitxs11\working
Duration: 00:01:57
Updating scheduler name to: DDIM
Wizard results:<br>Num Epochs: 150<br>Num instance images per class image: 5
Wizard results:<br>Num Epochs: 150<br>Num instance images per class image: 5
Initializing dreambooth training...
Pre-processing images: classifiers_0: : 16it [00:00, 3199.47it/s]
We need a total of 40 class images.0: : 16it [00:00, 3999.57it/s]                        | 0/8 [00:00<?, ?it/s]
Generating 40 class images for training...
                                                                                                               Using scheduler: DEISMultistep:   0%|                                                    | 0/40 [00:00<?, ?it/s]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:39<00:00,  2.49s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:28<00:00,  2.22s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:28<00:00,  2.22s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:28<00:00,  2.22s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:28<00:00,  2.22s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:27<00:00,  2.18s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:26<00:00,  2.16s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:26<00:00,  2.16s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:26<00:00,  2.16s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:26<00:00,  2.16s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:26<00:00,  2.16s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:26<00:00,  2.16s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:26<00:00,  2.16s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:26<00:00,  2.16s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:26<00:00,  2.16s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:26<00:00,  2.16s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:26<00:00,  2.16s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:26<00:00,  2.16s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:26<00:00,  2.16s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:26<00:00,  2.16s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:27<00:00,  2.20s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:27<00:00,  2.20s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:27<00:00,  2.20s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:27<00:00,  2.20s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:27<00:00,  2.20s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:27<00:00,  2.20s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:27<00:00,  2.20s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:27<00:00,  2.20s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:27<00:00,  2.20s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:27<00:00,  2.20s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:27<00:00,  2.20s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:27<00:00,  2.20s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:25<00:00,  2.15s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:25<00:00,  2.15s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:25<00:00,  2.14s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:25<00:00,  2.14s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:25<00:00,  2.14s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:25<00:00,  2.14s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:27<00:00,  2.19s/it]
100%|██████████████████████████████████████████████████████████████████████████| 40/40 [01:27<00:00,  2.20s/it]
Generating class images 39/40::  98%|████████████████████████████████████████████████▊ | 39/40 [00:00<?, ?it/s]Generated 40 new class images.:: 100%|██████████████████████████████████████████████████| 40/40 [00:00<?, ?it/s]
                                                                                                               Enabling xformers memory efficient attention for unet                                    | 0/40 [00:00<?, ?it/s]
Enabling xformers memory efficient attention for unet
                                                                                                               Found 40 reg images.0%|                                                                  | 0/40 [00:00<?, ?it/s]
Preparing dataset...
Init dataset!
Preparing Dataset (With Caching)
Bucket 0 (512, 512, 0) - Instance Images: 8 | Class Images: 40 | Max Examples/batch: 16
                                                                                                               Saving cache!nts...: 100%|██████████████████████████████████████████████████████| 48/48 [01:09<00:00,  1.34s/it]
Total Buckets 1 - Instance Images: 8 | Class Images: 40 | Max Examples/batch: 16

Total images / batch: 16, total examples: 16███████████████████████████████████| 48/48 [01:09<00:00,  1.34s/it]
                                                                                                               Total dataset length (steps): 16
                  Initializing bucket counter!
OOM Detected, reducing batch/grad size to 0/1.
Traceback (most recent call last):
  File "C:\StableDif\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 119, in decorator
    return function(batch_size, grad_size, prof, *args, **kwargs)
  File "C:\StableDif\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 640, in inner_loop
    unet, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
  File "C:\StableDif\stable-diffusion-webui\venv\lib\site-packages\accelerate\accelerator.py", line 1143, in prepare
    result = tuple(
  File "C:\StableDif\stable-diffusion-webui\venv\lib\site-packages\accelerate\accelerator.py", line 1144, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "C:\StableDif\stable-diffusion-webui\venv\lib\site-packages\accelerate\accelerator.py", line 995, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "C:\StableDif\stable-diffusion-webui\venv\lib\site-packages\accelerate\accelerator.py", line 1218, in prepare_model
    model = model.to(self.device)
  File "C:\StableDif\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1145, in to
    return self._apply(convert)
  File "C:\StableDif\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "C:\StableDif\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "C:\StableDif\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 5 more times]
  File "C:\StableDif\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
    param_applied = fn(param)
  File "C:\StableDif\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 26.00 MiB (GPU 0; 4.00 GiB total capacity; 3.37 GiB already allocated; 0 bytes free; 3.41 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
  File "C:\StableDif\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\ui_functions.py", line 729, in start_training
    result = main(class_gen_method=class_gen_method)
  File "C:\StableDif\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 1548, in main
    return inner_loop()
  File "C:\StableDif\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 117, in decorator
    raise RuntimeError("No executable batch size found, reached zero.")
RuntimeError: No executable batch size found, reached zero.
0it [00:08, ?it/s]
Duration: 01:14:24
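
To make the final error easier to read: the traceback shows a decorator in `memory.py` that retries `inner_loop` and raises once the batch size reaches zero. The sketch below is a simplified, hypothetical version of that retry pattern, not the extension's actual code. `FakeOutOfMemory`, `find_executable_batch_size`, and the two demo functions are made-up names, and the real decorator also passes `grad_size` and `prof`.

```python
import functools


class FakeOutOfMemory(Exception):
    """Stand-in for torch.cuda.OutOfMemoryError so the sketch runs without a GPU."""


def find_executable_batch_size(starting_batch_size):
    """Retry the wrapped function with a smaller batch size after each OOM."""
    def wrap(function):
        @functools.wraps(function)
        def decorator(*args, **kwargs):
            batch_size = starting_batch_size
            while True:
                if batch_size == 0:
                    raise RuntimeError("No executable batch size found, reached zero.")
                try:
                    return function(batch_size, *args, **kwargs)
                except FakeOutOfMemory:
                    batch_size -= 1  # hypothetical back-off step
        return decorator
    return wrap


@find_executable_batch_size(starting_batch_size=4)
def fits_at_two(batch_size):
    # Succeeds once the batch is small enough.
    if batch_size > 2:
        raise FakeOutOfMemory("batch too large")
    return batch_size


@find_executable_batch_size(starting_batch_size=1)
def never_fits(batch_size):
    # Mirrors this log: the failure happens while moving the model to the GPU,
    # before any batch is processed, so a smaller batch cannot help.
    raise FakeOutOfMemory("model does not fit")


chosen = fits_at_two()
try:
    never_fits()
    message = None
except RuntimeError as err:
    message = str(err)

print(chosen)
print(message)
```

In the log above the out-of-memory error is raised inside `accelerator.prepare` at `model.to(self.device)`, so the retry starts at batch size 1 and immediately drops to 0, which matches the line `OOM Detected, reducing batch/grad size to 0/1.`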

Additional information

No response

d8ahazard / sd_dreambooth_extension