🐛 Bug
XFormers cannot perform memory_efficient_attention; the built-in self-test run by diffusers fails with a CUDA error.
Command
To Reproduce
The failing code is from attention_processor.py, around line 266 (the exact line may shift between versions), in the diffusers library:
```python
import torch
import xformers.ops

# Make sure we can run the memory efficient attention
_ = xformers.ops.memory_efficient_attention(
    torch.randn((1, 2, 40), device="cuda"),
    torch.randn((1, 2, 40), device="cuda"),
    torch.randn((1, 2, 40), device="cuda"),
)
```
which results in:
File "C:\Users\waxel\kohya\kohya_ss\sd-scripts\sdxl_train_network.py", line 184, in <module>
trainer.train(args)
File "C:\Users\waxel\kohya\kohya_ss\sd-scripts\train_network.py", line 243, in train
vae.set_use_memory_efficient_attention_xformers(args.xformers)
File "C:\Users\waxel\kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 262, in set_use_memory_efficient_attention_xformers
fn_recursive_set_mem_eff(module)
File "C:\Users\waxel\kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "C:\Users\waxel\kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "C:\Users\waxel\kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "C:\Users\waxel\kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 255, in fn_recursive_set_mem_eff
module.set_use_memory_efficient_attention_xformers(valid, attention_op)
File "C:\Users\waxel\kohya\kohya_ss\venv\lib\site-packages\diffusers\models\attention_processor.py", line 273, in set_use_memory_efficient_attention_xformers
raise e
File "C:\Users\waxel\kohya\kohya_ss\venv\lib\site-packages\diffusers\models\attention_processor.py", line 268, in set_use_memory_efficient_attention_xformers
torch.randn((1, 2, 40), device="cuda"),
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
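Following the suggestion in the error message, the self-test can be re-run standalone with synchronous kernel launches so the stack trace points at the actual failing CUDA call. A minimal sketch, assuming `CUDA_LAUNCH_BLOCKING` is set before CUDA is initialized:

```python
# Sketch: re-run the failing self-test with CUDA_LAUNCH_BLOCKING=1, as the
# error message suggests. The variable must be set before torch initializes
# CUDA, so it is set before the imports.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch
import xformers.ops

q = torch.randn((1, 2, 40), device="cuda")
k = torch.randn((1, 2, 40), device="cuda")
v = torch.randn((1, 2, 40), device="cuda")
_ = xformers.ops.memory_efficient_attention(q, k, v)
```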
Expected behavior
The self-test should pass, and memory-efficient attention should be usable during training.
Environment
Windows 11
RTX 3090
cuDNN 8.9
xformers 0.0.23.post1+cu118
torch 2.1.2+cu118
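For completeness, a fuller environment report can be generated from the training venv. A sketch using the reporting modules shipped with xformers and torch:

```python
# Sketch: print detailed build/runtime info to attach to this report.
# `python -m xformers.info` lists the xformers build and which
# memory_efficient_attention operators are available;
# `python -m torch.utils.collect_env` reports torch/CUDA/cuDNN versions.
import subprocess
import sys

subprocess.run([sys.executable, "-m", "xformers.info"], check=True)
subprocess.run([sys.executable, "-m", "torch.utils.collect_env"], check=True)
```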