d8ahazard / sd_dreambooth_extension


The "No executable batch size found, reached zero." still haunts me #852

Closed Java197 closed 1 year ago

Java197 commented 1 year ago

Kindly read the entire form below and fill it out with the requested information.

Please find the following lines in the console and paste them below. If you do not provide this information, your issue will be automatically closed.

```
Python revision: 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Dreambooth revision: 9f4d931a319056c537d24669cb950d146d1537b0
SD-WebUI revision: 48a15821de768fea76e66f26df83df3fddf18f4b

Checking Dreambooth requirements...
[+] bitsandbytes version 0.35.0 installed.
[+] diffusers version 0.10.2 installed.
[+] transformers version 4.25.1 installed.
[ ] xformers version N/A installed.
[+] torch version 1.13.1+cu117 installed.
[+] torchvision version 0.14.1+cu117 installed.
```

Have you read the Readme? yes
Have you completely restarted the stable-diffusion-webUI, not just reloaded the UI? yes
Have you updated Dreambooth to the latest revision? yes
Have you updated the Stable-Diffusion-WebUI to the latest version? yes
No, really. Please save us both some trouble and update the SD-WebUI and Extension and restart before posting this. Reply 'OK' Below to acknowledge that you did this. ok

Describe the bug

Training is still producing an error for me. I have made sure that both Stable Diffusion and Dreambooth are fully up to date. I'm still getting the same error I've had for the last two weeks, but this time a CUDA out-of-memory error is shown alongside it. No matter what training parameters I set (I went to ridiculous lengths to minimize the resources it should use), it still runs out of memory. I swear this used to work before.

Provide logs /////////////////////////////////////////////////////////////////////////////////////////////////// Total images: 197 Largest prime: 197 Best factors: (1, 197) Total VRAM: 6 Initializing dreambooth training... Pre-processing Sample: 100%|███████████████████████████████████████████████████████████| 197/197 [00:02<00:00, 92.92it/s] Concept requires 0 class images per instance image. Sorting instance images: 100%|███████████████████████████████████████████████████████████████████| 3/3 [00:00<?, ?it/s] Nothing to generate. Replace CrossAttention.forward to use FlashAttention CUDA SETUP: Loading binary E:\AI\stable-diffusion-Filter\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cudaall.dll... Found 0 reg images. Preparing dataset... Init dataset! Preparing Dataset (With Caching) Bucket 0 (36, 100, 0) - Instance Images: 38 | Class Images: 0 | Max Examples/batch: 38 Bucket 1 (100, 36, 0) - Instance Images: 22 | Class Images: 0 | Max Examples/batch: 22 Bucket 2 (100, 100, 0) - Instance Images: 137 | Class Images: 0 | Max Examples/batch: 137 Total Buckets 3 - Instance Images: 197 | Class Images: 0 | Max Examples/batch: 197 Caching latents...: 100%|████████████████████████████████████████████████████████████| 197/197 [00:12<00:00, 36.87it/s] Total images / batch: 197, total examples: 197 Caching latents...: 100%|████████████████████████████████████████████████████████████| 197/197 [00:12<00:00, 16.41it/s] Total dataset length (steps): 197 Initializing bucket counter! Running training Num batches each epoch = 1 Num Epochs = 100 Batch Size Per Device = 197 Gradient Accumulation steps = 1 Total train batch size (w. parallel, distributed & accumulation) = 197 Text Encoder Epochs: 100 Total optimization steps = 19700 Total training steps = 19700 Resuming from checkpoint: False First resume epoch: 0 First resume step: 0 Lora: True, Adam: True, Prec: fp16 Gradient Checkpointing: False EMA: False UNET: False Freeze CLIP Normalization Layers: False LR: 0.0002 LoRA Text Encoder LR: 0.0002 V2: False Steps: 0%| | 0/19700 [00:00<?, ?it/s]Removing log: E:\AI\stable-diffusion-Filter\models\dreambooth\Sample Training\logging\db_log_2023-01-24-22-13-22.log OOM Detected, reducing batch/grad size to 98/1. 
Traceback (most recent call last): File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 121, in decorator return function(batch_size, grad_size, prof, log_file, *args, kwargs) File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 860, in inner_loop encoder_hidden_states = encode_hidden_state(text_encoder, batch["input_ids"], pad_tokens, File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\finetune_utils.py", line 1020, in encode_hidden_state encoder_hidden_states = text_encoder(input_ids)[0] File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\accelerate\utils\operations.py", line 490, in call return convert_to_fp32(self.model_forward(args, kwargs)) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\amp\autocast_mode.py", line 14, in decorate_autocast return func(*args, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 811, in forward return self.text_model( File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 721, in forward encoder_outputs = self.encoder( File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(input, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 650, in forward layer_outputs = encoder_layer( File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 379, in forward hidden_states, attn_weights = self.self_attn( File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(input, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 293, in forward attn_weights = attn_weights.view(bsz, self.num_heads, tgt_len, src_len) + causal_attention_mask torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 54.00 MiB (GPU 0; 6.00 GiB total capacity; 5.11 GiB already allocated; 0 bytes free; 5.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF Steps: 0%| | 0/19700 [00:01<?, ?it/s] Pre-processing Sample: 100%|███████████████████████████████████████████████████████████| 197/197 [00:02<00:00, 91.78it/s] Concept requires 0 class images per instance image. Sorting instance images: 100%|███████████████████████████████████████████████████████████████████| 3/3 [00:00<?, ?it/s] Nothing to generate. Replace CrossAttention.forward to use FlashAttention Found 0 reg images. Preparing dataset... Init dataset! 
Preparing Dataset (With Caching) Bucket 0 (36, 100, 0) - Instance Images: 38 | Class Images: 0 | Max Examples/batch: 38 Bucket 1 (100, 36, 0) - Instance Images: 22 | Class Images: 0 | Max Examples/batch: 22 Bucket 2 (100, 100, 0) - Instance Images: 137 | Class Images: 0 | Max Examples/batch: 137 Total Buckets 3 - Instance Images: 197 | Class Images: 0 | Max Examples/batch: 197 Caching latents...: 100%|████████████████████████████████████████████████████████████| 197/197 [00:10<00:00, 17.87it/s] Total images / batch: 197, total examples: 197 Caching latents...: 100%|████████████████████████████████████████████████████████████| 197/197 [00:10<00:00, 19.50it/s] Total dataset length (steps): 197 Initializing bucket counter! Running training ** Num batches each epoch = 2 Num Epochs = 100 Batch Size Per Device = 98 Gradient Accumulation steps = 1 Total train batch size (w. parallel, distributed & accumulation) = 98 Text Encoder Epochs: 100 Total optimization steps = 19700 Total training steps = 19700 Resuming from checkpoint: False First resume epoch: 0 First resume step: 0 Lora: True, Adam: True, Prec: fp16 Gradient Checkpointing: False EMA: False UNET: False Freeze CLIP Normalization Layers: False LR: 0.0002 LoRA Text Encoder LR: 0.0002 V2: False Steps: 0%| | 0/19700 [00:00<?, ?it/s]Removing log: E:\AI\stable-diffusion-Filter\models\dreambooth\Sample Training\logging\db_log_2023-01-24-22-13-43.log OOM Detected, reducing batch/grad size to 49/1. Traceback (most recent call last): File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 121, in decorator return function(batch_size, grad_size, prof, log_file, args, kwargs) File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 860, in inner_loop encoder_hidden_states = encode_hidden_state(text_encoder, batch["input_ids"], pad_tokens, File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\finetune_utils.py", line 1020, in encode_hidden_state encoder_hidden_states = text_encoder(input_ids)[0] File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\accelerate\utils\operations.py", line 490, in call return convert_to_fp32(self.model_forward(*args, *kwargs)) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\amp\autocast_mode.py", line 14, in decorate_autocast return func(args, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 811, in forward return self.text_model( File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 721, in forward encoder_outputs = self.encoder( File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 650, in forward layer_outputs = encoder_layer( File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(input, kwargs) File 
"E:\AI\stable-diffusion-Filter\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 389, in forward hidden_states = self.mlp(hidden_states) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 345, in forward hidden_states = self.activation_fn(hidden_states) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\transformers\activations.py", line 75, in forward return input torch.sigmoid(1.702 * input) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 46.00 MiB (GPU 0; 6.00 GiB total capacity; 5.14 GiB already allocated; 0 bytes free; 5.26 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF Steps: 0%| | 0/19700 [00:00<?, ?it/s] Pre-processing Sample: 100%|███████████████████████████████████████████████████████████| 197/197 [00:02<00:00, 92.71it/s] Concept requires 0 class images per instance image. Sorting instance images: 100%|███████████████████████████████████████████████████████████████████| 3/3 [00:00<?, ?it/s] Nothing to generate. Replace CrossAttention.forward to use FlashAttention Found 0 reg images. Preparing dataset... Init dataset! Preparing Dataset (With Caching) Bucket 0 (36, 100, 0) - Instance Images: 38 | Class Images: 0 | Max Examples/batch: 38 Bucket 1 (100, 36, 0) - Instance Images: 22 | Class Images: 0 | Max Examples/batch: 22 Bucket 2 (100, 100, 0) - Instance Images: 137 | Class Images: 0 | Max Examples/batch: 137 Total Buckets 3 - Instance Images: 197 | Class Images: 0 | Max Examples/batch: 197 Caching latents...: 100%|████████████████████████████████████████████████████████████| 197/197 [00:09<00:00, 53.78it/s] Total images / batch: 197, total examples: 197 Caching latents...: 100%|████████████████████████████████████████████████████████████| 197/197 [00:09<00:00, 20.12it/s] Total dataset length (steps): 197 Initializing bucket counter! *** Running training ** Num batches each epoch = 4 Num Epochs = 100 Batch Size Per Device = 49 Gradient Accumulation steps = 1 Total train batch size (w. parallel, distributed & accumulation) = 49 Text Encoder Epochs: 100 Total optimization steps = 19700 Total training steps = 19700 Resuming from checkpoint: False First resume epoch: 0 First resume step: 0 Lora: True, Adam: True, Prec: fp16 Gradient Checkpointing: False EMA: False UNET: False Freeze CLIP Normalization Layers: False LR: 0.0002 LoRA Text Encoder LR: 0.0002 V2: False Steps: 0%| | 0/19700 [00:00<?, ?it/s]Removing log: E:\AI\stable-diffusion-Filter\models\dreambooth\Sample Training\logging\db_log_2023-01-24-22-14-02.log OOM Detected, reducing batch/grad size to 24/1. 
Traceback (most recent call last): File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 121, in decorator return function(batch_size, grad_size, prof, log_file, args, kwargs) File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 860, in inner_loop encoder_hidden_states = encode_hidden_state(text_encoder, batch["input_ids"], pad_tokens, File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\finetune_utils.py", line 1020, in encode_hidden_state encoder_hidden_states = text_encoder(input_ids)[0] File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\accelerate\utils\operations.py", line 490, in call return convert_to_fp32(self.model_forward(*args, *kwargs)) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\amp\autocast_mode.py", line 14, in decorate_autocast return func(args, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 811, in forward return self.text_model( File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 721, in forward encoder_outputs = self.encoder( File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 650, in forward layer_outputs = encoder_layer( File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(input, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 389, in forward hidden_states = self.mlp(hidden_states) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 345, in forward hidden_states = self.activation_fn(hidden_states) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\transformers\activations.py", line 75, in forward return input torch.sigmoid(1.702 * input) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 6.00 GiB total capacity; 5.06 GiB already allocated; 0 bytes free; 5.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF Steps: 0%| | 0/19700 [00:00<?, ?it/s] Pre-processing Sample: 100%|███████████████████████████████████████████████████████████| 197/197 [00:02<00:00, 91.08it/s] Concept requires 0 class images per instance image. Sorting instance images: 100%|███████████████████████████████████████████████████████████████████| 3/3 [00:00<?, ?it/s] Nothing to generate. 
Replace CrossAttention.forward to use FlashAttention Found 0 reg images. Preparing dataset... Init dataset! Preparing Dataset (With Caching) Bucket 0 (36, 100, 0) - Instance Images: 38 | Class Images: 0 | Max Examples/batch: 38 Bucket 1 (100, 36, 0) - Instance Images: 22 | Class Images: 0 | Max Examples/batch: 22 Bucket 2 (100, 100, 0) - Instance Images: 137 | Class Images: 0 | Max Examples/batch: 137 Total Buckets 3 - Instance Images: 197 | Class Images: 0 | Max Examples/batch: 197 Caching latents...: 100%|████████████████████████████████████████████████████████████| 197/197 [00:10<00:00, 33.23it/s] Total images / batch: 197, total examples: 197 Caching latents...: 100%|████████████████████████████████████████████████████████████| 197/197 [00:10<00:00, 19.63it/s] Total dataset length (steps): 197 Initializing bucket counter! *** Running training ** Num batches each epoch = 8 Num Epochs = 100 Batch Size Per Device = 24 Gradient Accumulation steps = 1 Total train batch size (w. parallel, distributed & accumulation) = 24 Text Encoder Epochs: 100 Total optimization steps = 19700 Total training steps = 19700 Resuming from checkpoint: False First resume epoch: 0 First resume step: 0 Lora: True, Adam: True, Prec: fp16 Gradient Checkpointing: False EMA: False UNET: False Freeze CLIP Normalization Layers: False LR: 0.0002 LoRA Text Encoder LR: 0.0002 V2: False Steps: 0%| | 0/19700 [00:00<?, ?it/s]Removing log: E:\AI\stable-diffusion-Filter\models\dreambooth\Sample Training\logging\db_log_2023-01-24-22-14-20.log OOM Detected, reducing batch/grad size to 12/1. Traceback (most recent call last): File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 121, in decorator return function(batch_size, grad_size, prof, log_file, args, kwargs) File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 864, in inner_loop noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\accelerate\utils\operations.py", line 490, in call return convert_to_fp32(self.model_forward(*args, *kwargs)) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\amp\autocast_mode.py", line 14, in decorate_autocast return func(args, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\diffusers\models\unet_2d_condition.py", line 381, in forward sample, res_samples = downsample_block( File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 618, in forward hidden_states = downsampler(hidden_states) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\diffusers\models\resnet.py", line 188, in forward hidden_states = self.conv(hidden_states) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(input, kwargs) File "E:\AI\stable-diffusion-Filter\extensions-builtin\Lora\lora.py", line 179, in lora_Conv2d_forward return lora_forward(self, input, 
torch.nn.Conv2d_forward_before_lora(self, input)) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward return self._conv_forward(input, self.weight, self.bias) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward return F.conv2d(input, weight, bias, self.stride, torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 6.00 GiB total capacity; 5.22 GiB already allocated; 0 bytes free; 5.29 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF Steps: 0%| | 0/19700 [00:00<?, ?it/s] Pre-processing Sample: 100%|███████████████████████████████████████████████████████████| 197/197 [00:02<00:00, 92.82it/s] Concept requires 0 class images per instance image. Sorting instance images: 100%|█████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 3034.95it/s] Nothing to generate. Replace CrossAttention.forward to use FlashAttention Found 0 reg images. Preparing dataset... Init dataset! Preparing Dataset (With Caching) Bucket 0 (36, 100, 0) - Instance Images: 38 | Class Images: 0 | Max Examples/batch: 38 Bucket 1 (100, 36, 0) - Instance Images: 22 | Class Images: 0 | Max Examples/batch: 22 Bucket 2 (100, 100, 0) - Instance Images: 137 | Class Images: 0 | Max Examples/batch: 137 Total Buckets 3 - Instance Images: 197 | Class Images: 0 | Max Examples/batch: 197 Caching latents...: 100%|████████████████████████████████████████████████████████████| 197/197 [00:09<00:00, 20.15it/s] Total images / batch: 197, total examples: 197 Caching latents...: 100%|████████████████████████████████████████████████████████████| 197/197 [00:09<00:00, 19.72it/s] Total dataset length (steps): 197 Initializing bucket counter! Running training Num batches each epoch = 16 Num Epochs = 100 Batch Size Per Device = 12 Gradient Accumulation steps = 1 Total train batch size (w. parallel, distributed & accumulation) = 12 Text Encoder Epochs: 100 Total optimization steps = 19700 Total training steps = 19700 Resuming from checkpoint: False First resume epoch: 0 First resume step: 0 Lora: True, Adam: True, Prec: fp16 Gradient Checkpointing: False EMA: False UNET: False Freeze CLIP Normalization Layers: False LR: 0.0002 LoRA Text Encoder LR: 0.0002 V2: False Steps: 0%| | 0/19700 [00:00<?, ?it/s]Removing log: E:\AI\stable-diffusion-Filter\models\dreambooth\Sample Training\logging\db_log_2023-01-24-22-14-40.log OOM Detected, reducing batch/grad size to 6/1. 
Traceback (most recent call last): File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 121, in decorator return function(batch_size, grad_size, prof, log_file, *args, kwargs) File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 864, in inner_loop noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\accelerate\utils\operations.py", line 490, in call return convert_to_fp32(self.model_forward(args, kwargs)) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\amp\autocast_mode.py", line 14, in decorate_autocast return func(*args, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\diffusers\models\unet_2d_condition.py", line 415, in forward sample = upsample_block( File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 1277, in forward hidden_states = resnet(hidden_states, temb) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(input, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\diffusers\models\resnet.py", line 464, in forward hidden_states = self.conv1(hidden_states) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, kwargs) File "E:\AI\stable-diffusion-Filter\extensions-builtin\Lora\lora.py", line 179, in lora_Conv2d_forward return lora_forward(self, input, torch.nn.Conv2d_forward_before_lora(self, input)) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward return self._conv_forward(input, self.weight, self.bias) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward return F.conv2d(input, weight, bias, self.stride, torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 6.00 GiB total capacity; 5.22 GiB already allocated; 0 bytes free; 5.25 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF Steps: 0%| | 0/19700 [00:00<?, ?it/s] Pre-processing Sample: 100%|███████████████████████████████████████████████████████████| 197/197 [00:02<00:00, 93.05it/s] Concept requires 0 class images per instance image. Sorting instance images: 100%|███████████████████████████████████████████████████████████████████| 3/3 [00:00<?, ?it/s] Nothing to generate. Replace CrossAttention.forward to use FlashAttention Found 0 reg images. Preparing dataset... Init dataset! 
Preparing Dataset (With Caching) Bucket 0 (36, 100, 0) - Instance Images: 38 | Class Images: 0 | Max Examples/batch: 38 Bucket 1 (100, 36, 0) - Instance Images: 22 | Class Images: 0 | Max Examples/batch: 22 Bucket 2 (100, 100, 0) - Instance Images: 137 | Class Images: 0 | Max Examples/batch: 137 Total Buckets 3 - Instance Images: 197 | Class Images: 0 | Max Examples/batch: 197 Caching latents...: 100%|████████████████████████████████████████████████████████████| 197/197 [00:09<00:00, 33.62it/s] Total images / batch: 197, total examples: 197 Caching latents...: 100%|████████████████████████████████████████████████████████████| 197/197 [00:09<00:00, 19.86it/s] Total dataset length (steps): 197 Initializing bucket counter! ** Running training Num batches each epoch = 32 Num Epochs = 100 Batch Size Per Device = 6 Gradient Accumulation steps = 1 Total train batch size (w. parallel, distributed & accumulation) = 6 Text Encoder Epochs: 100 Total optimization steps = 19700 Total training steps = 19700 Resuming from checkpoint: False First resume epoch: 0 First resume step: 0 Lora: True, Adam: True, Prec: fp16 Gradient Checkpointing: False EMA: False UNET: False Freeze CLIP Normalization Layers: False LR: 0.0002 LoRA Text Encoder LR: 0.0002 V2: False Steps: 0%| | 0/19700 [00:00<?, ?it/s]Removing log: E:\AI\stable-diffusion-Filter\models\dreambooth\Sample Training\logging\db_log_2023-01-24-22-14-59.log OOM Detected, reducing batch/grad size to 3/1. Traceback (most recent call last): File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 121, in decorator return function(batch_size, grad_size, prof, log_file, *args, kwargs) File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 864, in inner_loop noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\accelerate\utils\operations.py", line 490, in call return convert_to_fp32(self.model_forward(args, kwargs)) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\amp\autocast_mode.py", line 14, in decorate_autocast return func(*args, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\diffusers\models\unet_2d_condition.py", line 407, in forward sample = upsample_block( File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 1202, in forward hidden_states = resnet(hidden_states, temb) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(input, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\diffusers\models\resnet.py", line 474, in forward hidden_states = self.conv2(hidden_states) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, kwargs) File "E:\AI\stable-diffusion-Filter\extensions-builtin\Lora\lora.py", line 179, in lora_Conv2d_forward return lora_forward(self, input, torch.nn.Conv2d_forward_before_lora(self, input)) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward return 
self._conv_forward(input, self.weight, self.bias) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward return F.conv2d(input, weight, bias, self.stride, torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 6.00 GiB total capacity; 5.23 GiB already allocated; 0 bytes free; 5.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF Steps: 0%| | 0/19700 [00:00<?, ?it/s] Pre-processing Sample: 100%|███████████████████████████████████████████████████████████| 197/197 [00:02<00:00, 92.73it/s] Concept requires 0 class images per instance image. Sorting instance images: 100%|███████████████████████████████████████████████████████████████████| 3/3 [00:00<?, ?it/s] Nothing to generate. Replace CrossAttention.forward to use FlashAttention Found 0 reg images. Preparing dataset... Init dataset! Preparing Dataset (With Caching) Bucket 0 (36, 100, 0) - Instance Images: 38 | Class Images: 0 | Max Examples/batch: 38 Bucket 1 (100, 36, 0) - Instance Images: 22 | Class Images: 0 | Max Examples/batch: 22 Bucket 2 (100, 100, 0) - Instance Images: 137 | Class Images: 0 | Max Examples/batch: 137 Total Buckets 3 - Instance Images: 197 | Class Images: 0 | Max Examples/batch: 197 Caching latents...: 100%|████████████████████████████████████████████████████████████| 197/197 [00:09<00:00, 26.92it/s] Total images / batch: 197, total examples: 197 Caching latents...: 100%|████████████████████████████████████████████████████████████| 197/197 [00:09<00:00, 19.89it/s] Total dataset length (steps): 197 Initializing bucket counter! ** Running training Num batches each epoch = 65 Num Epochs = 100 Batch Size Per Device = 3 Gradient Accumulation steps = 1 Total train batch size (w. parallel, distributed & accumulation) = 3 Text Encoder Epochs: 100 Total optimization steps = 19700 Total training steps = 19700 Resuming from checkpoint: False First resume epoch: 0 First resume step: 0 Lora: True, Adam: True, Prec: fp16 Gradient Checkpointing: False EMA: False UNET: False Freeze CLIP Normalization Layers: False LR: 0.0002 LoRA Text Encoder LR: 0.0002 V2: False Steps: 0%| | 0/19700 [00:00<?, ?it/s]Removing log: E:\AI\stable-diffusion-Filter\models\dreambooth\Sample Training\logging\db_log_2023-01-24-22-15-18.log OOM Detected, reducing batch/grad size to 1/1. 
Traceback (most recent call last): File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 121, in decorator return function(batch_size, grad_size, prof, log_file, *args, kwargs) File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 864, in inner_loop noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\accelerate\utils\operations.py", line 490, in call return convert_to_fp32(self.model_forward(args, kwargs)) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\amp\autocast_mode.py", line 14, in decorate_autocast return func(*args, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\diffusers\models\unet_2d_condition.py", line 407, in forward sample = upsample_block( File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 1202, in forward hidden_states = resnet(hidden_states, temb) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(input, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\diffusers\models\resnet.py", line 464, in forward hidden_states = self.conv1(hidden_states) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, kwargs) File "E:\AI\stable-diffusion-Filter\extensions-builtin\Lora\lora.py", line 179, in lora_Conv2d_forward return lora_forward(self, input, torch.nn.Conv2d_forward_before_lora(self, input)) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward return self._conv_forward(input, self.weight, self.bias) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward return F.conv2d(input, weight, bias, self.stride, torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 6.00 GiB total capacity; 5.22 GiB already allocated; 0 bytes free; 5.27 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF Steps: 0%| | 0/19700 [00:00<?, ?it/s] Pre-processing Sample: 100%|███████████████████████████████████████████████████████████| 197/197 [00:02<00:00, 92.62it/s] Concept requires 0 class images per instance image. Sorting instance images: 100%|███████████████████████████████████████████████████████████████████| 3/3 [00:00<?, ?it/s] Nothing to generate. Replace CrossAttention.forward to use FlashAttention Found 0 reg images. Preparing dataset... Init dataset! 
Preparing Dataset (With Caching) Bucket 0 (36, 100, 0) - Instance Images: 38 | Class Images: 0 | Max Examples/batch: 38 Bucket 1 (100, 36, 0) - Instance Images: 22 | Class Images: 0 | Max Examples/batch: 22 Bucket 2 (100, 100, 0) - Instance Images: 137 | Class Images: 0 | Max Examples/batch: 137 Total Buckets 3 - Instance Images: 197 | Class Images: 0 | Max Examples/batch: 197 Caching latents...: 100%|████████████████████████████████████████████████████████████| 197/197 [00:09<00:00, 20.06it/s] Total images / batch: 197, total examples: 197 Caching latents...: 100%|████████████████████████████████████████████████████████████| 197/197 [00:09<00:00, 20.30it/s] Total dataset length (steps): 197 Initializing bucket counter! ** Running training Num batches each epoch = 197 Num Epochs = 100 Batch Size Per Device = 1 Gradient Accumulation steps = 1 Total train batch size (w. parallel, distributed & accumulation) = 1 Text Encoder Epochs: 100 Total optimization steps = 19700 Total training steps = 19700 Resuming from checkpoint: False First resume epoch: 0 First resume step: 0 Lora: True, Adam: True, Prec: fp16 Gradient Checkpointing: False EMA: False UNET: False Freeze CLIP Normalization Layers: False LR: 0.0002 LoRA Text Encoder LR: 0.0002 V2: False Steps: 0%| | 0/19700 [00:00<?, ?it/s]Removing log: E:\AI\stable-diffusion-Filter\models\dreambooth\Sample Training\logging\db_log_2023-01-24-22-15-36.log OOM Detected, reducing batch/grad size to 0/1. Traceback (most recent call last): File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 121, in decorator return function(batch_size, grad_size, prof, log_file, *args, kwargs) File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 864, in inner_loop noise_pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\accelerate\utils\operations.py", line 490, in call return convert_to_fp32(self.model_forward(args, kwargs)) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\amp\autocast_mode.py", line 14, in decorate_autocast return func(*args, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\diffusers\models\unet_2d_condition.py", line 407, in forward sample = upsample_block( File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 1203, in forward hidden_states = attn(hidden_states, encoder_hidden_states=encoder_hidden_states).sample File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(input, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\diffusers\models\attention.py", line 216, in forward hidden_states = block(hidden_states, context=encoder_hidden_states, timestep=timestep) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, kwargs) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\diffusers\models\attention.py", line 491, in forward hidden_states = self.attn2(norm_hidden_states, context=context) + hidden_states File 
"E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(*input, *kwargs) File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\xattention.py", line 273, in forward_flash_attn out = self.to_out0 File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(input, kwargs) File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\lora_diffusion\lora.py", line 31, in forward return self.linear(input) + self.lora_up(self.lora_down(input)) self.scale File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl return forward_call(input, **kwargs) File "E:\AI\stable-diffusion-Filter\extensions-builtin\Lora\lora.py", line 175, in lora_Linear_forward return lora_forward(self, input, torch.nn.Linear_forward_before_lora(self, input)) File "E:\AI\stable-diffusion-Filter\venv\lib\site-packages\torch\nn\modules\linear.py", line 114, in forward return F.linear(input, self.weight, self.bias) torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 5.25 GiB already allocated; 0 bytes free; 5.29 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF Steps: 0%| | 0/19700 [00:01<?, ?it/s] Traceback (most recent call last): File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\scripts\dreambooth.py", line 455, in start_training result = main(config, use_txt2img=use_txt2img) File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 989, in main return inner_loop() File "E:\AI\stable-diffusion-Filter\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 119, in decorator raise RuntimeError("No executable batch size found, reached zero.") RuntimeError: No executable batch size found, reached zero. Restored system models.

//////////////////////////////////////////////////////////////// If a crash has occurred, please provide the entire stack trace from the log, including the last few log messages before the crash occurred.

Environment

What OS? Windows 11
If Windows - WSL or native? Native
What GPU are you using? RTX 2060

Screenshots/Config
If the issue is specific to an error while training, please provide a screenshot of training parameters or the db_config.json file from /models/dreambooth/MODELNAME/db_config.json
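For context, the repeated "OOM Detected, reducing batch/grad size to ..." lines in the log come from a retry wrapper that halves the batch size after every CUDA out-of-memory error and gives up once it reaches zero. The sketch below only illustrates that pattern (the function name and signature are simplified, not the extension's actual memory.py code) and assumes torch >= 1.13 for torch.cuda.OutOfMemoryError:

```python
import functools
import torch


def find_executable_batch_size(starting_batch_size):
    """Illustrative only: retry a training function, halving the batch size on CUDA OOM."""
    def decorator(function):
        @functools.wraps(function)
        def wrapper(*args, **kwargs):
            batch_size = starting_batch_size
            while True:
                if batch_size == 0:
                    raise RuntimeError("No executable batch size found, reached zero.")
                try:
                    return function(batch_size, *args, **kwargs)
                except torch.cuda.OutOfMemoryError:
                    # Free cached blocks, halve the batch size, and try again.
                    torch.cuda.empty_cache()
                    batch_size //= 2
        return wrapper
    return decorator
```

With 197 instance images the log shows exactly that progression (197 -> 98 -> 49 -> 24 -> 12 -> 6 -> 3 -> 1 -> 0); every step still OOMs on the 6 GB card, so the loop bottoms out and raises the error in the issue title.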

troed commented 1 year ago

Can't say much more than that I'm seeing this too. Rolling back to a December commit (I think) works, with matching Dreambooth and Automatic1111 revisions. That version can't do v2.1 though, so I'm using v1.4. After upgrading to the latest versions, v2.1 OOMs immediately, and v1.4 runs along for a bit before it OOMs too.

Here's an example with as much memory minimization as I could think of. Note that it's a 12 GB card, and I just can't see how these numbers mean it's an actual OOM.

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 11.72 GiB total capacity; 5.47 GiB already allocated; 77.31 MiB free; 5.51 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Steps:   5%|███▏                                                           | 25/500 [00:00<00:06, 72.57it/s, loss=0.331, loss_avg=0.16, lr=2e-6, vram_usage=8.7]
Traceback (most recent call last):
  File "/home/troed/stable-diffusion-webui/extensions/sd_dreambooth_extension/scripts/dreambooth.py", line 455, in start_training
    result = main(config, use_txt2img=use_txt2img)
  File "/home/troed/stable-diffusion-webui/extensions/sd_dreambooth_extension/dreambooth/train_dreambooth.py", line 989, in main
    return inner_loop()
  File "/home/troed/stable-diffusion-webui/extensions/sd_dreambooth_extension/dreambooth/memory.py", line 119, in decorator
    raise RuntimeError("No executable batch size found, reached zero.")
RuntimeError: No executable batch size found, reached zero.
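The OOM message itself suggests trying max_split_size_mb. A minimal sketch of one way to experiment with that (the 128 value is only an example; the variable has to be in the environment before the CUDA allocator first initializes, so in practice it is usually set in the shell or launcher script that starts the webui):

```python
import os

# Must be set before the first CUDA allocation, e.g. at the very top of the
# launcher script or exported in the shell before starting the webui.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # the allocator reads PYTORCH_CUDA_ALLOC_CONF when CUDA is initialized
```

This only helps with fragmentation, though; it can't make a model fit that simply needs more VRAM than the card has.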
DarkAlchy commented 1 year ago

I can get it to generate class images, and they look like this:

(image)

`RuntimeError: No executable batch size found, reached zero.` is the error I get after it generates (or rather, after it tries to generate) the class images.

Zuxier commented 1 year ago

Training on 6 GB of VRAM is really pushing it. Some commits might have trained on it, but they probably had other issues that allowed it in the first place. Right now training is working the way it should; if new ways to make it lighter on VRAM come out, they will be implemented.

Java197 commented 1 year ago

So am I simply at the limits of my hardware, rather than hitting a software bug that's preventing me from training anymore?

Zuxier commented 1 year ago

Yeah, unless someone is able to train with 6 GB at the moment; I think 8 GB is the minimum. Speed-wise, a Google Colab is probably faster, to be honest.

DarkAlchy commented 1 year ago

> Yeah, unless someone is able to train with 6 GB at the moment; I think 8 GB is the minimum. Speed-wise, a Google Colab is probably faster, to be honest.

You can't get this to work on Colab either. I tried last night, and as it was finishing, Gradio lost the connection (first time that ever happened) and it had pretty much crashed.

Zuxier commented 1 year ago

For Colab I meant something like the Shivam Colab, for example. The extension is not tested for Colab support at the moment.

Xardous commented 1 year ago

I'm running on 10 GB of VRAM right now and I'm having the same issue. I tried reverting to a previous commit, but that gave me a different problem where none of the buttons I pressed in the Dreambooth extension would do anything (nothing would show up in the terminal). Not sure what's up with that.

DarkAlchy commented 1 year ago

> For Colab I meant something like the Shivam Colab, for example. The extension is not tested for Colab support at the moment.

The only thing I want out of this extension is LoRA training on Colab, and it seems it can't even do that. I just want a damn LoRA trainer that works on Colab and doesn't require you to be a data/rocket scientist to use it (yeah, I'm looking at you, Kohya), but it seems like that is asking for the moon. For Dreambooth I just use Shivam's directly, or TLB (I prefer Shivam's), so I have no need for this extension outside of LoRA training.

troed commented 1 year ago

Just highlighting again that I also see this, with 12 GB of VRAM. I'm able to train on v1.5 with the text encoder off, at a VRAM usage of ~6.6 GB (with an initial peak at 8.5 GB). Adding the text encoder back increases VRAM usage to a bit above 8 GB, but training then cannot complete successfully due to the mentioned error.

Other apps are using ~600 MB of VRAM according to nvidia-smi, so I can't get the numbers to add up.
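For reference, a quick way to compare the nvidia-smi numbers with what PyTorch itself reports (a sketch, assuming torch >= 1.11 for mem_get_info and device 0):

```python
import torch

free, total = torch.cuda.mem_get_info(0)  # bytes free/total as seen by the CUDA driver
print(f"driver free/total: {free / 2**30:.2f} / {total / 2**30:.2f} GiB")
print(f"torch allocated:   {torch.cuda.memory_allocated(0) / 2**30:.2f} GiB")
print(f"torch reserved:    {torch.cuda.memory_reserved(0) / 2**30:.2f} GiB")
```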

@Xardous When you revert Dreambooth, you also need to revert Automatic1111 to get the buttons working.

drumrboy44 commented 1 year ago

I am also seeing this with 10 GB of VRAM (I have a 3080). It worked before; something got screwed up.

SavvaI commented 1 year ago

I am seeing this with 11 GB of VRAM, with xformers, fp16 precision, and 8-bit Adam all turned on. I switched from the 2.0 checkpoint to the 1.5 checkpoint and it suddenly started working, consuming ~6 GB out of 11 GB of VRAM, but it falls over again after running for exactly one epoch, with the same memory issue.

mahxds commented 1 year ago

I get the same error with 24 GB of VRAM, but with 8-bit Adam off, fp16 off, and xformers off... since these really slow down the training :s

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days

Java197 commented 1 year ago

(Refresh)

sunsided commented 1 year ago

I used to have the same issues, but switching to the dev branch (e.g. around commit d4178cdccdf2edb363a852803cb8cffb7f36266b) saved me.

For me, the main reason for switching was a conflict between xformers, GCC, and the differences between CUDA 11.6 and 11.7, although I saw the memory issues too. Switching to dev resolved both right away, although some commits occasionally break it.

d8ahazard commented 1 year ago

This should be resolved with https://github.com/d8ahazard/sd_dreambooth_extension/releases/tag/1.0.11

Feel free to open a new issue if you're still having problems.

dr-formalyst commented 1 year ago

After updating Dreambooth, I now also get this message. It was working well before on a 10 GB 3080, on v1.5...

Tommy9toes commented 1 year ago

I'm on an 8 GB 3060 Ti and I was unable to get around the out-of-memory issue without ticking the Settings -> Use LORA box, as well as setting Concepts -> Number of Samples to Generate to zero on each concept I'm adding. I'm not recommending anything, just noting how I progressed.

Edit: Too many class prompts (8 seems okay) and trying to do more than 6 images per folder/concept also seem to hit the limit on my hardware.

ChrisNonyminus commented 1 year ago

> I'm on an 8 GB 3060 Ti and I was unable to get around the out-of-memory issue without ticking the Settings -> Use LORA box, as well as setting Concepts -> Number of Samples to Generate to zero on each concept I'm adding. I'm not recommending anything, just noting how I progressed.
>
> Edit: Too many class prompts (8 seems okay) and trying to do more than 6 images per folder/concept also seem to hit the limit on my hardware.

I get this problem even with LoRA. 12 GB 3060, by the way.

fgordon-maker commented 1 year ago

Same problem here when trying to train a LoRA. Environment: 24 GB VRAM RTX A5000 on runpod.io. This happened while I had "Gradient Checkpointing" disabled because of issue 9442.

mary4500 commented 1 year ago

```
... in decorator
    raise RuntimeError("No executable batch size found, reached zero.")
RuntimeError: No executable batch size found, reached zero.
Restored system models.
Duration: 00:00:57
```

novitae commented 1 year ago

For me, simply changing mixed precision from bf16 to fp16 (not forgetting to relaunch the webui) worked.
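One thing worth checking before relying on bf16 is whether the GPU supports it at all (pre-Ampere cards don't, and fp16 is the usual fallback there). A quick check, assuming a reasonably recent PyTorch:

```python
import torch

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
    # bf16 generally requires Ampere (compute capability 8.0) or newer;
    # on older cards fp16 is the usual choice for mixed precision.
    print("bf16 supported:", torch.cuda.is_bf16_supported())
```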