🐛 Bug
XFormers cannot perform memory_efficient_attention; the built-in self-test run by diffusers fails with a CUDA error.
Command
To Reproduce
The failing code is from attention_processor.py, around line 266 (the exact line may shift between versions), in the diffusers library:
```python
import torch
import xformers.ops

# Make sure we can run the memory efficient attention
_ = xformers.ops.memory_efficient_attention(
    torch.randn((1, 2, 40), device="cuda"),
    torch.randn((1, 2, 40), device="cuda"),
    torch.randn((1, 2, 40), device="cuda"),
)
```
which results in:
File "C:\Users\waxel\kohya\kohya_ss\sd-scripts\sdxl_train_network.py", line 184, in <module>
trainer.train(args)
File "C:\Users\waxel\kohya\kohya_ss\sd-scripts\train_network.py", line 243, in train
vae.set_use_memory_efficient_attention_xformers(args.xformers)
File "C:\Users\waxel\kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 262, in set_use_memory_efficient_attention_xformers
fn_recursive_set_mem_eff(module)
File "C:\Users\waxel\kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "C:\Users\waxel\kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "C:\Users\waxel\kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 258, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "C:\Users\waxel\kohya\kohya_ss\venv\lib\site-packages\diffusers\models\modeling_utils.py", line 255, in fn_recursive_set_mem_eff
module.set_use_memory_efficient_attention_xformers(valid, attention_op)
File "C:\Users\waxel\kohya\kohya_ss\venv\lib\site-packages\diffusers\models\attention_processor.py", line 273, in set_use_memory_efficient_attention_xformers
raise e
File "C:\Users\waxel\kohya\kohya_ss\venv\lib\site-packages\diffusers\models\attention_processor.py", line 268, in set_use_memory_efficient_attention_xformers
torch.randn((1, 2, 40), device="cuda"),
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
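Following the suggestion in the error message, the self-test can be re-run standalone with synchronous kernel launches so the stack trace points at the actual failing CUDA call. A minimal sketch, assuming `CUDA_LAUNCH_BLOCKING` is set before CUDA is initialized:

```python
# Sketch: re-run the failing self-test with CUDA_LAUNCH_BLOCKING=1, as the
# error message suggests. The variable must be set before torch initializes
# CUDA, so it is set before the imports.
import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch
import xformers.ops

q = torch.randn((1, 2, 40), device="cuda")
k = torch.randn((1, 2, 40), device="cuda")
v = torch.randn((1, 2, 40), device="cuda")
_ = xformers.ops.memory_efficient_attention(q, k, v)
```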
Expected behavior
The self-test should pass, and memory-efficient attention should be usable during training.
Environment
Windows 11
RTX 3090
cuDNN 8.9
xformers 0.0.23.post1+cu118
torch 2.1.2+cu118
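For completeness, a fuller environment report can be generated from the training venv. A sketch using the reporting modules shipped with xformers and torch:

```python
# Sketch: print detailed build/runtime info to attach to this report.
# `python -m xformers.info` lists the xformers build and which
# memory_efficient_attention operators are available;
# `python -m torch.utils.collect_env` reports torch/CUDA/cuDNN versions.
import subprocess
import sys

subprocess.run([sys.executable, "-m", "xformers.info"], check=True)
subprocess.run([sys.executable, "-m", "torch.utils.collect_env"], check=True)
```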