brcisna opened this issue 1 week ago
What exactly are you asking for help with?
The error message seems quite clear: you cannot pass float32 tensors to that operator on AMD GPUs.
If you're invoking xFormers through wunjo (no idea what that is), you should check with them to get the invocation fixed.
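If wunjo calls xformers.ops.memory_efficient_attention directly, the caller-side fix could look roughly like the sketch below (a hypothetical wrapper, not wunjo's actual code): downcast fp32 inputs before the call and cast the output back.

```python
# Sketch of a caller-side fix, assuming the caller invokes
# xformers.ops.memory_efficient_attention directly (not wunjo's real code).
import torch
import xformers.ops as xops

def attention_rocm_safe(query, key, value, attn_bias=None, p=0.0):
    orig_dtype = query.dtype
    if orig_dtype == torch.float32:
        # The ROCm (ck*) backends only accept fp16/bf16, so downcast first.
        query, key, value = (t.to(torch.float16) for t in (query, key, value))
    out = xops.memory_efficient_attention(query, key, value, attn_bias=attn_bias, p=p)
    # Cast back so downstream code still sees the original dtype.
    return out.to(orig_dtype)
```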
🐛 Bug
Command
start wunjo V2
To Reproduce
Steps to reproduce the behavior:
briefcase dev # starts wunjo AI V2
1. Go to the generation tab.
2. Start image generation.
3. After a few seconds of image generation, the following appears in the console:
ERROR No operator found for `memory_efficient_attention_forward` with inputs:
    query     : shape=(1, 2, 1, 40) (torch.float32)
    key       : shape=(1, 2, 1, 40) (torch.float32)
    value     : shape=(1, 2, 1, 40) (torch.float32)
    attn_bias : <class 'NoneType'>
    p         : 0.0
`ckF` is not supported because:
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})

Expected behavior
An image is created.
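For reference, the failure should be reproducible outside wunjo in a few lines (a sketch using the shapes from the error above; on ROCm builds the HIP device is exposed as "cuda"):

```python
# Minimal repro sketch: fp32 inputs to memory_efficient_attention on a
# ROCm-only xFormers build raise "No operator found for ..._forward".
import torch
import xformers.ops as xops

q, k, v = (torch.randn(1, 2, 1, 40, device="cuda", dtype=torch.float32)
           for _ in range(3))
xops.memory_efficient_attention(q, k, v)  # raises NotImplementedError
```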
Environment
Debian 13, Python 3.10.12, PyTorch 2.4.1+rocm6.1 (ROCm/HIPCC build), AMD Radeon Pro W6600 GPU
Please copy and paste the output from the environment collection script from PyTorch (or fill out the checklist below manually).
You can run the script with:
python -m torch.utils.collect_env
How you installed PyTorch (conda, pip, source): pip

Additional context
python -m xformers.info

xFormers 0.0.28.post1
memory_efficient_attention.ckF:               available
memory_efficient_attention.ckB:               available
memory_efficient_attention.ck_decoderF:       available
memory_efficient_attention.ck_splitKF:        available
memory_efficient_attention.cutlassF:          unavailable
memory_efficient_attention.cutlassB:          unavailable
memory_efficient_attention.fa2F@0.0.0:        unavailable
memory_efficient_attention.fa2B@0.0.0:        unavailable
memory_efficient_attention.fa3F@0.0.0:        unavailable
memory_efficient_attention.fa3B@0.0.0:        unavailable
memory_efficient_attention.triton_splitKF:    available
indexing.scaled_index_addF:                   available
indexing.scaled_index_addB:                   available
indexing.index_select:                        available
sequence_parallel_fused.write_values:         available
sequence_parallel_fused.wait_values:          available
sequence_parallel_fused.cuda_memset_32b_async: available
sp24.sparse24_sparsify_both_ways:             available
sp24.sparse24_apply:                          available
sp24.sparse24_apply_dense_output:             available
sp24._sparse24_gemm:                          available
sp24._cslt_sparse_mm@0.0.0:                   available
swiglu.dual_gemm_silu:                        available
swiglu.gemm_fused_operand_sum:                available
swiglu.fused.p.cpp:                           available
is_triton_available:                          True
pytorch.version:                              2.4.1+rocm6.1
pytorch.cuda:                                 available
gpu.compute_capability:                       10.3
gpu.name:                                     AMD Radeon Pro W6600
dcgm_profiler:                                unavailable
build.info:                                   available
build.cuda_version:                           None
build.hip_version:                            6.1.40093-bd86f1708
build.python_version:                         3.10.15
build.torch_version:                          2.4.1+rocm6.1
build.env.TORCH_CUDA_ARCH_LIST:
build.env.PYTORCH_ROCM_ARCH:                  None
build.env.XFORMERS_BUILD_TYPE:                Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:   None
build.env.NVCC_FLAGS:                         -allow-unsupported-compiler
build.env.XFORMERS_PACKAGE_FROM:              wheel-v0.0.28.post1
source.privacy:                               open source
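Since this build only has the ck* and Triton attention backends, and the error shows ckF accepts only fp16/bf16, another workaround on the wunjo side would be to run generation under autocast (a sketch; `pipeline` and `prompt` are stand-ins for wunjo's actual generation call):

```python
# Workaround sketch: run the generation step under autocast so the
# attention inputs reach xFormers as fp16. "pipeline" is hypothetical.
import torch

with torch.autocast(device_type="cuda", dtype=torch.float16):
    image = pipeline(prompt)  # stand-in for wunjo's image-generation call
```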