AUTOMATIC1111 / stable-diffusion-webui

Stable Diffusion web UI
GNU Affero General Public License v3.0

[Bug]: Textual-Inversion training fails with "Use cross attention optimizations while training" active #6746

Open radry opened 1 year ago

radry commented 1 year ago

Is there an existing issue for this?

What happened?

When the option "Use cross attention optimizations while training" is enabled, training fails at 0 steps. When the setting is disabled, training starts normally. See the log below.

Steps to reproduce the problem

  1. Enable "Use cross attention optimizations while training" in Train settings
  2. Train a new embedding; the other settings don't matter.

What should have happened?

Training should start.

Commit where the problem happens

50fb20ce

What platforms do you use to access the UI?

Other/Cloud

What browsers do you use to access the UI?

Mozilla Firefox

Command Line Arguments

No response

Additional information, context and logs

2023-01-14T20:16:07.779856818Z Training at rate of 0.002 until step 3000
2023-01-14T20:16:07.779875489Z Preparing dataset...
2023-01-14T20:16:09.521869849Z 
  0%|          | 0/65 [00:00<?, ?it/s]
 35%|███▌      | 23/65 [00:00<00:00, 170.78it/s]
 63%|██████▎   | 41/65 [00:00<00:00, 35.63it/s] 
 78%|███████▊  | 51/65 [00:01<00:00, 43.97it/s]
 94%|█████████▍| 61/65 [00:01<00:00, 34.92it/s]
100%|██████████| 65/65 [00:01<00:00, 37.32it/s]
2023-01-14T20:16:09.754453689Z 
  0%|          | 0/3000 [00:00<?, ?it/s]Traceback (most recent call last):
2023-01-14T20:16:09.754471322Z   File "/workspace/stable-diffusion-webui/modules/textual_inversion/textual_inversion.py", line 454, in train_embedding
2023-01-14T20:16:09.754474475Z     scaler.scale(loss).backward()
2023-01-14T20:16:09.754476800Z   File "/workspace/venv/lib/python3.10/site-packages/torch/_tensor.py", line 396, in backward
2023-01-14T20:16:09.754479170Z     torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
2023-01-14T20:16:09.754481433Z   File "/workspace/venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 173, in backward
2023-01-14T20:16:09.754483686Z     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
2023-01-14T20:16:09.754486033Z   File "/workspace/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 253, in apply
2023-01-14T20:16:09.754488316Z     return user_fn(self, *args)
2023-01-14T20:16:09.754490375Z   File "/workspace/venv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 146, in backward
2023-01-14T20:16:09.754492598Z     torch.autograd.backward(outputs_with_grad, args_with_grad)
2023-01-14T20:16:09.754494724Z   File "/workspace/venv/lib/python3.10/site-packages/torch/autograd/__init__.py", line 173, in backward
2023-01-14T20:16:09.754496946Z     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
2023-01-14T20:16:09.754499105Z   File "/workspace/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 253, in apply
2023-01-14T20:16:09.754501352Z     return user_fn(self, *args)
2023-01-14T20:16:09.754503550Z   File "/workspace/venv/lib/python3.10/site-packages/torch/autograd/function.py", line 399, in wrapper
2023-01-14T20:16:09.754505815Z     outputs = fn(ctx, *args)
2023-01-14T20:16:09.754507889Z   File "/workspace/venv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 111, in backward
f-rank commented 1 year ago

By any chance, are you able to train with the option on if you set both "Save an image to log directory every N steps, 0 to disable" and "Save a copy of embedding to log directory every N steps, 0 to disable" to 1?

I'm trying to check something related to #6705; I have "Use cross attention optimizations while training" turned off.

radry commented 1 year ago

@f-rank Why would I set those options to 1? That would result in an image and an embedding for every step. I don't want 3000 files.

f-rank commented 1 year ago

> @f-rank Why would I set those options to 1? That would result in an image and an embedding for every step. I don't want 3000 files.

Setting them to 1 avoids the error on my install. I don't want 3000 files either, but after setting them to 1 and confirming no error is thrown, I'm able to change the number back and it then trains normally.

TWIISTED-STUDIOS commented 1 year ago

I have just noticed this issue as well. When I have it enabled I get the error below, and since I'm on a 3090 I don't see how it can claim there aren't enough resources; if I turn it off I have no issues:


  File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 493, in train_embedding
    scaler.scale(loss).backward()
  File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\torch\_tensor.py", line 488, in backward
    torch.autograd.backward(
  File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\function.py", line 267, in apply
    return user_fn(self, *args)
  File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 157, in backward
    torch.autograd.backward(outputs_with_grad, args_with_grad)
  File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\__init__.py", line 197, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\function.py", line 267, in apply
    return user_fn(self, *args)
  File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\function.py", line 414, in wrapper
    outputs = fn(ctx, *args)
  File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 111, in backward
    grads = _memory_efficient_attention_backward(
  File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 376, in _memory_efficient_attention_backward
    op = _dispatch_bw(inp)
  File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\xformers\ops\fmha\dispatch.py", line 109, in _dispatch_bw
    return _run_priority_list(
  File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\xformers\ops\fmha\dispatch.py", line 70, in _run_priority_list
    raise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_backward` with inputs:
     query       : shape=(5, 1024, 8, 80) (torch.float32)
     key         : shape=(5, 77, 8, 80) (torch.float32)
     value       : shape=(5, 77, 8, 80) (torch.float32)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`flshattB` is not supported because:
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
    requires a GPU with compute capability == 8.0 for 'query.shape[-1] > 64'
`cutlassB` is not supported because:
    Sm86 does not have enough shared-memory to run this kernel - see https://github.com/facebookresearch/xformers/issues/517
`smallkB` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    unsupported embed per head: 80
```
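
For context, the dispatch failure above can be reproduced outside the web UI. Below is a minimal sketch (an assumption, not code from this thread) for xformers 0.0.16 on an Sm86 card such as a 3090; the tensor shapes mirror the ones in the error message, and the backward pass fails because, as the error lists, every backend rejects float32 inputs with an 80-dim head on this hardware:

```python
# Minimal sketch, not from the thread: reproduce the xformers backward
# dispatch failure for fp32 attention with head_dim=80 on an Sm86 GPU.
import torch
import xformers.ops as xops

# Shapes taken from the error above: (batch, seq_len, heads, head_dim)
q = torch.randn(5, 1024, 8, 80, device="cuda", dtype=torch.float32, requires_grad=True)
k = torch.randn(5, 77, 8, 80, device="cuda", dtype=torch.float32, requires_grad=True)
v = torch.randn(5, 77, 8, 80, device="cuda", dtype=torch.float32, requires_grad=True)

out = xops.memory_efficient_attention(q, k, v)  # forward dispatch finds an operator
out.sum().backward()  # backward dispatch raises NotImplementedError, as in the log
```

Note that each backend listed in the error rejects these inputs for a different reason (float32 dtype, head dimension 80, Sm86 shared-memory limits), so the failure depends on the GPU and xformers build rather than on available VRAM.
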
gigadeplex commented 1 year ago

bump

Spirik commented 1 year ago

It seems I've stumbled across the same (or a very similar) issue on my RTX A4000 16GB card (with 32GB RAM).

No matter the settings, the result is the same: either a "CUDA out of memory" or a "NotImplementedError" error immediately after "Preparing dataset" completes, with a "Training finished at 0 steps" report shown in the web UI. I have tried:

  - toggling "Use cross attention optimizations while training" and "Move VAE and CLIP to RAM when training if possible. Saves VRAM" on/off;
  - adding/removing the command line args --no-half --xformers --medvram in any combination;
  - setting "Batch size" to lower values;
  - setting "Save an image to log directory every N steps" and "Save a copy of embedding to log directory every N steps" to 1.

Some threads here suggest downgrading PyTorch, but I haven't tried that yet (currently running torch 1.13.1+cu117 and xformers 0.0.16rc425).
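
A quick check from inside the webui's venv (a minimal sketch, assuming the venv's Python interpreter) confirms which torch and xformers builds are actually in use:

```python
# Quick version check, run with the webui venv's Python interpreter.
import torch
import xformers

print("torch:", torch.__version__)        # e.g. 1.13.1+cu117
print("xformers:", xformers.__version__)  # e.g. 0.0.16rc425
print("GPU:", torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none")
```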

Spirik commented 1 year ago

I've installed another version of xformers, following this suggestion (0.0.17.dev461 in my case), and it seems to do the trick (I also had to remove the --medvram argument). Training has now started and is currently in progress.