radry opened 1 year ago
By any chance, are you able to train with it on if you set both "Save an image to log directory every N steps, 0 to disable" and "Save a copy of embedding to log directory every N steps, 0 to disable" to 1?
Trying to check something out, cross-referencing #6705. I have "Use cross attention optimizations while training" off.
@f-rank Why would I set those options to 1? That would result in an image and embedding for every step. I don't want 3000 files.
Setting it to 1 avoids the error being thrown on my install. I don't want 3000 files either; after setting it to 1 and seeing that it no longer throws the error, I am able to change the number back and it then trains normally.
I have just noticed an issue with this as well. When I have it enabled I get this error, and I'm on a 3090, so I don't see how it can say there aren't enough resources; if I turn it off I have no issues:
File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\modules\textual_inversion\textual_inversion.py", line 493, in train_embedding
scaler.scale(loss).backward()
File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\torch\_tensor.py", line 488, in backward
torch.autograd.backward(
File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\function.py", line 267, in apply
return user_fn(self, *args)
File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\torch\utils\checkpoint.py", line 157, in backward
torch.autograd.backward(outputs_with_grad, args_with_grad)
File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\__init__.py", line 197, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\function.py", line 267, in apply
return user_fn(self, *args)
File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\torch\autograd\function.py", line 414, in wrapper
outputs = fn(ctx, *args)
File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 111, in backward
grads = _memory_efficient_attention_backward(
File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\xformers\ops\fmha\__init__.py", line 376, in _memory_efficient_attention_backward
op = _dispatch_bw(inp)
File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\xformers\ops\fmha\dispatch.py", line 109, in _dispatch_bw
return _run_priority_list(
File "E:\Artificial-Inteligence\Ai-Image-Gen\Automatic-1111-GUI\stable-diffusion-webui\venv\lib\site-packages\xformers\ops\fmha\dispatch.py", line 70, in _run_priority_list
raise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_backward` with inputs:
query : shape=(5, 1024, 8, 80) (torch.float32)
key : shape=(5, 77, 8, 80) (torch.float32)
value : shape=(5, 77, 8, 80) (torch.float32)
attn_bias : <class 'NoneType'>
p : 0.0
`flshattB` is not supported because:
dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
requires a GPU with compute capability == 8.0 for 'query.shape[-1] > 64'
`cutlassB` is not supported because:
Sm86 does not have enough shared-memory to run this kernel - see https://github.com/facebookresearch/xformers/issues/517
`smallkB` is not supported because:
max(query.shape[-1] != value.shape[-1]) > 32
unsupported embed per head: 80 ```
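For anyone who wants to poke at this outside of webui, here is a minimal repro sketch. It assumes xformers is importable in the webui venv and that you're on an Sm86 card (3090 / A4000); the shapes and dtype are copied from the trace above. The forward pass dispatches fine; it's the backward that has no fp32 kernel here.

```python
import torch
import xformers.ops as xops

# Shapes/dtype taken from the error above: (batch, seq_len, heads, head_dim)
q = torch.randn(5, 1024, 8, 80, device="cuda", dtype=torch.float32, requires_grad=True)
k = torch.randn(5, 77, 8, 80, device="cuda", dtype=torch.float32, requires_grad=True)
v = torch.randn(5, 77, 8, 80, device="cuda", dtype=torch.float32, requires_grad=True)

try:
    out = xops.memory_efficient_attention(q, k, v)  # forward dispatch succeeds
    out.sum().backward()  # backward dispatch fails: no fp32 op for Sm86 + head_dim 80
except NotImplementedError as e:
    print(e)  # "No operator found for `memory_efficient_attention_backward` ..."

# Casting to half precision should give the dispatcher a usable backward
# kernel (per the xformers issue linked in the error, it's the fp32 backward
# that exceeds Sm86's shared memory):
q16, k16, v16 = (t.detach().half().requires_grad_() for t in (q, k, v))
xops.memory_efficient_attention(q16, k16, v16).sum().backward()
```

If that matches what you see, the failure is about the fp32 + xformers combination during training rather than about VRAM, which would explain why a 3090 still hits it.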
bump
Seems like I stumbled across the same (or very similar) issue on my RTX A4000 16GB card (and 32GB RAM).
No matter the settings, the result is the same: either a "CUDA out of memory" or a "NotImplementedError" error immediately after "Preparing dataset" completes, with a "Training finished at 0 steps" report showing in the webui. I tried:
- toggling "Use cross attention optimizations while training" and "Move VAE and CLIP to RAM when training if possible. Saves VRAM" on/off;
- adding/removing the command line args --no-half --xformers --medvram in any combination;
- setting "Batch size" to lower values;
- setting "Save an image to log directory every N steps" and "Save a copy of embedding to log directory every N steps" to 1.
Some threads here suggest downgrading pytorch, but I haven't tried that yet (currently running torch 1.13.1+cu117 and xformers 0.0.16rc425).
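Before downgrading anything, it might be worth confirming exactly what the dispatcher sees. A small check, assuming only that torch and xformers are installed in the venv:

```python
import torch
import xformers

print(torch.__version__)                    # e.g. 1.13.1+cu117
print(xformers.__version__)                 # e.g. 0.0.16rc425
print(torch.cuda.get_device_name(0))        # RTX A4000 / RTX 3090 are both GA10x
print(torch.cuda.get_device_capability(0))  # (8, 6) -> Sm86, not the (8, 0) that `flshattB` wants for head_dim > 64
```

If this prints (8, 6), the NotImplementedError above is expected for fp32 training on this card regardless of how much VRAM is free.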
Is there an existing issue for this?
What happened?
With the option "Use cross attention optimizations while training" enabled, training fails at 0 steps. When the setting is disabled, training starts normally. See the log below.
Steps to reproduce the problem
What should have happened?
Training should start.
Commit where the problem happens
50fb20ce
What platforms do you use to access UI ?
Other/Cloud
What browsers do you use to access the UI ?
Mozilla Firefox
Command Line Arguments
No response
Additional information, context and logs