ROCm / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.
https://facebookresearch.github.io/xformers/

Some features (cutlassF, smallkF, ...) appear to be unavailable when executing 'python -m xformers.info' #21

Open · Zars19 opened 3 weeks ago

Zars19 commented 3 weeks ago

❓ Questions and Help

Some features (cutlassF, smallkF, ...) appear to be unavailable when executing 'python -m xformers.info'. Is this normal?

xFormers 0.0.27+7a04357.d20240822
memory_efficient_attention.ckF:                    available
memory_efficient_attention.ckB:                    available
memory_efficient_attention.ck_decoderF:            available
memory_efficient_attention.ck_splitKF:             available
memory_efficient_attention.cutlassF:               unavailable
memory_efficient_attention.cutlassB:               unavailable
memory_efficient_attention.decoderF:               unavailable
memory_efficient_attention.flshattF@2.5.6-pt:      available
memory_efficient_attention.flshattB@2.5.6-pt:      available
memory_efficient_attention.smallkF:                unavailable
memory_efficient_attention.smallkB:                unavailable
memory_efficient_attention.triton_splitKF:         available
indexing.scaled_index_addF:                        available
indexing.scaled_index_addB:                        available
indexing.index_select:                             available
sequence_parallel_fused.write_values:              available
sequence_parallel_fused.wait_values:               available
sequence_parallel_fused.cuda_memset_32b_async:     available
sp24.sparse24_sparsify_both_ways:                  available
sp24.sparse24_apply:                               available
sp24.sparse24_apply_dense_output:                  available
sp24._sparse24_gemm:                               available
sp24._cslt_sparse_mm@0.0.0:                        available
swiglu.dual_gemm_silu:                             available
swiglu.gemm_fused_operand_sum:                     available
swiglu.fused.p.cpp:                                available
is_triton_available:                               True
pytorch.version:                                   2.4.0+rocm6.1
pytorch.cuda:                                      available
gpu.compute_capability:                            9.4
gpu.name:                                          AMD Radeon Graphics
dcgm_profiler:                                     unavailable
build.info:                                        available
build.cuda_version:                                None
build.hip_version:                                 6.2.41133-dd7f95766
build.python_version:                              3.9.19
build.torch_version:                               2.4.0+rocm6.1
build.env.TORCH_CUDA_ARCH_LIST:                    None
build.env.PYTORCH_ROCM_ARCH:                       gfx900;gfx906;gfx908;gfx90a;gfx1030;gfx1100;gfx1101;gfx940;gfx941;gfx942
build.env.XFORMERS_BUILD_TYPE:                     None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   None
source.privacy:                                    open source
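
A quick way to check whether the high-level attention op still dispatches despite the unavailable backends, given that the ck*/flash backends report available above (a minimal smoke test, assuming a GPU-enabled PyTorch build with xformers installed):

import torch
from xformers.ops import memory_efficient_attention

# Shapes are (batch, seq_len, heads, head_dim).
q = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
out = memory_efficient_attention(q, q, q)
print(out.shape)  # torch.Size([1, 128, 8, 64]) if a backend was found
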
Zars19 commented 3 weeks ago

And the result of running test_mem_eff_attention.py is: 2489 failed, 3483 passed, 9033 skipped, 36 warnings in 2539.42s (0:42:19)
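
(For context, the suite was presumably invoked along these lines from the repository root:)

python -m pytest tests/test_mem_eff_attention.py
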

tenpercent commented 6 days ago

Hi @Zars19

CUTLASS-related extensions are only compiled for CUDA (not ROCm) builds.
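
Whether a given PyTorch build targets CUDA or ROCm can be checked from its version metadata, consistent with the build.cuda_version / build.hip_version fields in the report above (a small sketch):

import torch

# On a ROCm build torch.version.hip is a version string and
# torch.version.cuda is None; a CUDA build is the other way around.
print(torch.version.cuda)  # None on a ROCm build
print(torch.version.hip)   # e.g. a "6.x...." string on a ROCm build
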

The currently failing tests are related to the pytorch-internal Flash Attention implementation; this op should be disabled on ROCm due to lack of support for the tested features.
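
In the meantime, one way to keep dispatch away from that implementation is to pin the forward/backward ops explicitly via the op argument. A sketch, assuming this build exposes the composable-kernel ops as xformers.ops.fmha.ck.FwOp / BwOp (the ops reported as ckF/ckB above; module paths may differ between versions):

import torch
from xformers.ops import fmha, memory_efficient_attention

q = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
# Assumption: fmha.ck.FwOp / fmha.ck.BwOp are the CK forward/backward ops.
# Pinning them bypasses automatic backend selection entirely.
out = memory_efficient_attention(q, q, q, op=(fmha.ck.FwOp, fmha.ck.BwOp))
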