Open KohakuBlueleaf opened 1 year ago
Commit af6b866f1b1340f2b4681d1ad1c5fe96957307a9 has the same problem.
xFormers 0.0.22+af6b866.d20230926
memory_efficient_attention.cutlassF: available
memory_efficient_attention.cutlassB: available
memory_efficient_attention.decoderF: available
memory_efficient_attention.flshattF@0.0.0: unavailable
memory_efficient_attention.flshattB@0.0.0: unavailable
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.tritonflashattF: unavailable
memory_efficient_attention.tritonflashattB: unavailable
memory_efficient_attention.triton_splitKF: unavailable
indexing.scaled_index_addF: available
indexing.scaled_index_addB: available
indexing.index_select: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: False
pytorch.version: 2.1.0.dev20230821+cu121
pytorch.cuda: available
gpu.compute_capability: 8.9
gpu.name: NVIDIA GeForce RTX 4090
build.info: available
build.cuda_version: 1201
build.python_version: 3.11.5
build.torch_version: 2.1.0.dev20230821+cu121
build.env.TORCH_CUDA_ARCH_LIST: 8.9
build.env.XFORMERS_BUILD_TYPE: None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: "-allow-unsupported-compiler"
build.env.XFORMERS_PACKAGE_FROM: None
build.nvcc_version: 12.1.66
source.privacy: open source
Hi, Flash-Attention does not support Windows at the moment, so we don't build it on Windows (see for instance https://github.com/Dao-AILab/flash-attention/issues/565). We can still run our own implementation, which should be a bit faster than Flash v1 (but slower than Flash v2). Once Flash-Attention v2 has Windows support, we will add it back.
It seems like flash-attention 2.3.2 supports Windows now. https://github.com/Dao-AILab/flash-attention/issues/595#issuecomment-1752281403
I will try to build flash-attn with torch 2.1.0 and CUDA 12.1 to see if it works.
Does xformers automatically use FA2 if it is installed in the venv, or do you have to build xformers with FA2 installed instead?
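As a quick sanity check for the first half of this question, a minimal stdlib-only sketch (hypothetical, not an xformers API) can at least confirm whether the `flash_attn` package is importable in the current venv:

```python
# Sketch: check at runtime whether the flash-attn package is importable in
# the current environment. Note this only tells you the package is installed,
# not whether xformers was built so that it can dispatch to it.
import importlib.util


def flash_attn_installed() -> bool:
    """True if the `flash_attn` package can be found in this environment."""
    return importlib.util.find_spec("flash_attn") is not None


print(flash_attn_installed())
```

This answers "is FA2 present in the venv"; whether xformers actually uses it is a separate question about how the wheel was built.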
@danthe3rd Flash attention can be compiled/installed on Windows as of 2.3.2. Will xformers be updated for it?
🐛 Bug
Command
python -m xformers.info
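To check the reported kernels programmatically instead of eyeballing the log, one could parse the `key: value` lines that `python -m xformers.info` prints. This is a hypothetical helper (not part of xformers); the field names match the output pasted in this issue:

```python
# Hypothetical helper: parse the "key: value" lines printed by
# `python -m xformers.info` and check whether the flash attention
# forward kernel is reported as available.
import re


def parse_xformers_info(text: str) -> dict:
    """Turn 'key: value' lines from xformers.info output into a dict."""
    info = {}
    for line in text.splitlines():
        m = re.match(r"^([\w.@+]+):\s+(.*)$", line.strip())
        if m:
            info[m.group(1)] = m.group(2)
    return info


def flash_attention_available(info: dict) -> bool:
    # The flshattF key may carry a version suffix such as '@0.0.0'.
    return any(
        key.startswith("memory_efficient_attention.flshattF") and value == "available"
        for key, value in info.items()
    )


# Example using two lines from the output pasted in this issue:
sample = (
    "memory_efficient_attention.flshattF@0.0.0: unavailable\n"
    "pytorch.cuda: available\n"
)
print(flash_attention_available(parse_xformers_info(sample)))  # False
```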
To Reproduce
Steps to reproduce the behavior:
Install xformers 0.0.21, or build from source at the latest commit, on Windows: memory_efficient_attention.flshattF/B are both unavailable. (Also, build.env.TORCH_CUDA_ARCH_LIST in the pre-built wheel doesn't include 8.6 and 8.9.)
Expected behavior
Both the pre-built wheel and a source build should give us flash attention support. (If this is because Windows lacks some feature needed by flash attention 2, please at least give us flash attention 1 support on Windows.)
I also wondered whether this is a bug in xformers.info, but since xformers 0.0.21 actually gives me slower results than 0.0.20, I think flash attention is really gone.
Environment
Additional context
Here is the output of xformers.info on 0.0.21:
Here is the output of 0.0.20: