Is there an existing issue for this?
[X] I have searched the existing issues and checked the recent builds/commits
What happened?
When training, it gives CUDA errors.
To create a public link, set share=True in launch().
Returning ['xformers', True, False, 1, '', '', 0.0, 60.0, 2, True, True, 50.0, False, False, 8e-06, 1e-06, 0.0002, '', 0.0002, 1, 1, 1, 0.5, 1, 0.5, 'constant_with_warmup', 0, 75, 'fp16', 1100, True, '', 1, 512, 5, '', 420420.0, True, True, True, 10, True, False, False, 5, False, False, False, True, 2, False, 0, True, False, False, False, True, 'C:\Users\meshm\Desktop\Dog', 7.5, 40, '', 'photo of a dog', '', 'C:\Users\meshm\Desktop\CORA_INPUT', 'photo of a CORA dog', '', 1, 0, 0, -1, 7.5, 40, '', '', '', '', 7.5, 60, '', '', '', '', '', '', 1, 0, 0, -1, 7.5, 60, '', '', '', '', 7.5, 60, '', '', '', '', '', '', 1, 0, 0, -1, 7.5, 60, '', '', '', 'Loaded config.']
Saved settings.
Custom model name is
Starting Dreambooth training...
Initializing dreambooth training...
Replace CrossAttention.forward to use xformers
Instance Bucket 0: Resolution (512, 512), Count: 26
Target Bucket 0: Resolution (512, 512), Count: 0
We need a total of 0 images.
Nothing to generate.
CUDA SETUP: Loading binary D:\AAA\stable-diffusion-webui\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cudaall.dll...
Preparing dataset
Preparing dataset
Preparing Dataset (With Caching)
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 26/26 [00:07<00:00, 3.58it/s]
Train Bucket 1: Resolution (512, 512), Count: 26
Total images: 13
Total dataset length (steps): 13
Sched breakpoint is 14300
Running training
Instance Images: 26
Class Images: 0
Total Examples: 26
Num batches each epoch = 13
Num Epochs = 1100
Batch Size Per Device = 2
Gradient Accumulation steps = 2
Total train batch size (w. parallel, distributed & accumulation) = 4
Text Encoder Epochs: 0
Total optimization steps = 28600
Total training steps = 28600
Resuming from checkpoint: False
First resume epoch: 0
First resume step: 0
Lora: False, Adam: True, Prec: fp16
Gradient Checkpointing: True
EMA: False
LR: 8e-06)
Steps: 0%| | 0/28600 [00:00<?, ?it/s]Traceback (most recent call last):
File "D:\AAA\stable-diffusion-webui\extensions\sd_dreambooth_extension\scripts\dreambooth.py", line 561, in start_training
result = main(config, use_txt2img=use_txt2img)
File "D:\AAA\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 973, in main
return inner_loop()
File "D:\AAA\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\memory.py", line 116, in decorator
return function(batch_size, grad_size, prof, *args, **kwargs)
File "D:\AAA\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\train_dreambooth.py", line 823, in inner_loop
enc_out = text_encoder(batch["input_ids"], output_hidden_states=True, return_dict=True)
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\accelerate\utils\operations.py", line 490, in __call__
return convert_to_fp32(self.model_forward(*args, **kwargs))
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\torch\amp\autocast_mode.py", line 12, in decorate_autocast
return func(*args, **kwargs)
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 811, in forward
return self.text_model(
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 721, in forward
encoder_outputs = self.encoder(
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 650, in forward
layer_outputs = encoder_layer(
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 379, in forward
hidden_states, attn_weights = self.self_attn(
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 273, in forward
query_states = self._shape(query_states, tgt_len, bsz).view(proj_shape)
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\transformers\models\clip\modeling_clip.py", line 254, in _shape
return tensor.view(bsz, seq_len, self.num_heads, self.head_dim).transpose(1, 2).contiguous()
RuntimeError: CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Steps: 0%| | 0/28600 [00:00<?, ?it/s]
Training completed, reloading SD Model.
Error completing request
Arguments: ('CORAMODEL', True) {}
Traceback (most recent call last):
File "D:\AAA\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\utils.py", line 179, in f
res = func(*args, **kwargs)
File "D:\AAA\stable-diffusion-webui\extensions\sd_dreambooth_extension\scripts\dreambooth.py", line 584, in start_training
reload_system_models()
File "D:\AAA\stable-diffusion-webui\extensions\sd_dreambooth_extension\dreambooth\utils.py", line 171, in reload_system_models
shared.sd_model.to(shared.device)
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\pytorch_lightning\core\mixins\device_dtype_mixin.py", line 113, in to
return super().to(*args, **kwargs)
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 927, in to
return self._apply(convert)
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
module._apply(fn)
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
module._apply(fn)
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
module._apply(fn)
[Previous line repeated 1 more time]
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 602, in _apply
param_applied = fn(param)
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 925, in convert
return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA error: an illegal instruction was encountered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Traceback (most recent call last):
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 337, in run_predict
output = await app.get_blocks().process_api(
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1018, in process_api
data = self.postprocess_data(fn_index, result["prediction"], state)
File "D:\AAA\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 935, in postprocess_data
if predictions[i] is components._Keywords.FINISHED_ITERATING:
IndexError: list index out of range
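As the traceback itself suggests, setting CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the stack trace points at the operation that actually failed instead of a later API call. A minimal sketch of how the variable must be set (for the webui it is simpler to add `set CUDA_LAUNCH_BLOCKING=1` to webui-user.bat before the launch line; the point is that it has to be in the environment before CUDA is initialized):

```python
import os

# Must be set before the first CUDA call (in practice, before the process
# that imports torch starts touching the GPU) to have any effect.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

print(os.environ["CUDA_LAUNCH_BLOCKING"])  # → 1
```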
Steps to reproduce the problem
1. Start the A1111 WebUI
2. Install/update to the latest Dreambooth extension
3. Try to train
What should have happened?
It should train, shouldn't it? ;-)
Commit where the problem happens
889b851a5260ce869a3286ad15d17d1bbb1da0a7
What platforms do you use to access the UI?
Windows
What browsers do you use to access the UI?
Google Chrome, Microsoft Edge
Command Line Arguments
Additional information, context and logs
Everything is up to date.
Using an RTX 3060 (12 GB VRAM).
Training settings: 1) no LoRA, 2) 8-bit Adam, 3) fp16, 4) caching.
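For context, the step totals in the log are self-consistent with the settings above (a sketch of the arithmetic; reading the factor of 2 between the 14,300 scheduler breakpoint and the 28,600 total as the gradient-accumulation setting is my interpretation, not confirmed from the extension's source):

```python
# Numbers taken from the training log above.
instance_images = 26
batch_size = 2          # Batch Size Per Device
epochs = 1100           # Num Epochs
grad_accum_steps = 2    # Gradient Accumulation steps

batches_per_epoch = instance_images // batch_size            # logged: 13
total_steps = batches_per_epoch * epochs * grad_accum_steps  # logged: 28600
sched_breakpoint = total_steps // 2                          # logged: 14300

print(batches_per_epoch, total_steps, sched_breakpoint)  # → 13 28600 14300
```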