facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.
https://facebookresearch.github.io/xformers/

Getting an error after enabling xformers #692

Open yashkant opened 1 year ago

yashkant commented 1 year ago

❓ Questions and Help

Hi! Thanks for the cool library, I get the following error when I enable xformers in my code:

FATAL: kernel `fmha_cutlassF_f32_aligned_32x128_gmem_sm80` is for sm80-sm90, but was built for sm70

I am wondering if you have any suggestions on what might be going wrong? I dug a bit deeper and found that this could be related to a file in apex [link].

danthe3rd commented 1 year ago

Hi @yashkant I don't think this is related to apex. What GPU type do you have on your machine? Can you report the output of "python -m xformers.info" ?

yashkant commented 1 year ago

Hi! Thanks for replying. I am using 8x A100 GPUs with 40 GB each.

danthe3rd commented 1 year ago

> Can you report the output of "python -m xformers.info" ?

Can you share this?

JokerGT commented 1 year ago

I have the same error on Lambda Cloud using an H100 PCIe:

FATAL: kernel `fmha_cutlassF_f32_aligned_32x128_gmem_sm80` is for sm80-sm100, but was built for sm50

python -m xformers.info
xFormers 0.0.21.dev577
memory_efficient_attention.cutlassF:               available
memory_efficient_attention.cutlassB:               available
memory_efficient_attention.decoderF:               available
memory_efficient_attention.flshattFv2:             available
memory_efficient_attention.flshattBv2:             available
memory_efficient_attention.smallkF:                available
memory_efficient_attention.smallkB:                available
memory_efficient_attention.tritonflashattF:        unavailable
memory_efficient_attention.tritonflashattB:        unavailable
indexing.scaled_index_addF:                        available
indexing.scaled_index_addB:                        available
indexing.index_select:                             available
swiglu.dual_gemm_silu:                             available
swiglu.gemm_fused_operand_sum:                     available
swiglu.fused.p.cpp:                                available
is_triton_available:                               True
is_functorch_available:                            False
pytorch.version:                                   2.0.1+cu118
pytorch.cuda:                                      available
gpu.compute_capability:                            9.0
gpu.name:                                          NVIDIA H100 PCIe
build.info:                                        available
build.cuda_version:                                1108
build.python_version:                              3.10.12
build.torch_version:                               2.0.1+cu118
build.env.TORCH_CUDA_ARCH_LIST:                    5.0+PTX 6.0 6.1 7.0 7.5 8.0 8.6
build.env.XFORMERS_BUILD_TYPE:                     Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   wheel-main
build.nvcc_version:                                11.8.89
source.privacy:                                    open source
alex000kim commented 1 year ago

Hi, I am seeing

FATAL: kernel `fmha_cutlassF_bf16_aligned_64x128_rf_sm80` is for sm80-sm100, but was built for sm50

on a p5.48xlarge instance (with H100) in AWS, with AMI ami-094b9c818a4449717.

$ python -m xformers.info
xFormers 0.0.20
memory_efficient_attention.cutlassF:               available
memory_efficient_attention.cutlassB:               available
memory_efficient_attention.flshattF:               available
memory_efficient_attention.flshattB:               available
memory_efficient_attention.smallkF:                available
memory_efficient_attention.smallkB:                available
memory_efficient_attention.tritonflashattF:        available
memory_efficient_attention.tritonflashattB:        available
indexing.scaled_index_addF:                        available
indexing.scaled_index_addB:                        available
indexing.index_select:                             available
swiglu.dual_gemm_silu:                             available
swiglu.gemm_fused_operand_sum:                     available
swiglu.fused.p.cpp:                                available
is_triton_available:                               True
is_functorch_available:                            False
pytorch.version:                                   2.0.1+cu118
pytorch.cuda:                                      available
gpu.compute_capability:                            9.0
gpu.name:                                          NVIDIA H100 80GB HBM3
build.info:                                        available
build.cuda_version:                                1108
build.python_version:                              3.10.11
build.torch_version:                               2.0.1+cu118
build.env.TORCH_CUDA_ARCH_LIST:                    5.0+PTX 6.0 6.1 7.0 7.5 8.0 8.6
build.env.XFORMERS_BUILD_TYPE:                     Release
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   wheel-v0.0.20
build.nvcc_version:                                11.8.89
source.privacy:                                    open source
danthe3rd commented 1 year ago

Hi, thanks for this report! This happens because you are running on an H100, but the xformers wheels don't contain kernels for compute capability 9.0 (H100), as you can see in build.env.TORCH_CUDA_ARCH_LIST. We will enable H100 for the next release, but for the time being you need to build xformers from source.
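The mismatch can be checked directly by comparing the GPU's compute capability against the wheel's build.env.TORCH_CUDA_ARCH_LIST. A minimal sketch (the helper name is ours, not part of xformers; it also ignores that a `+PTX` entry can be JIT-compiled for newer GPUs, which is what produces the "built for sm50" part of the error above):

```python
def arch_list_covers(arch_list: str, capability: str) -> bool:
    """Return True if a TORCH_CUDA_ARCH_LIST string contains native
    kernels for the given compute capability (e.g. "9.0" for H100)."""
    # Entries look like "8.0" or "5.0+PTX"; strip the +PTX suffix.
    archs = {entry.split("+")[0] for entry in arch_list.split()}
    return capability in archs

# Arch list from the wheel reported above.
wheel_archs = "5.0+PTX 6.0 6.1 7.0 7.5 8.0 8.6"
print(arch_list_covers(wheel_archs, "9.0"))  # False: no sm90 kernels in the wheel
```

On an H100 this returns False, matching the FATAL error: the fused attention kernels require a native sm80+ build, so the sm50 PTX fallback refuses to run.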

alex000kim commented 1 year ago

Thanks, I can confirm that building from source works!
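For reference, a build from source that includes the H100 kernels looks roughly like this (a sketch following the xformers README; it assumes a CUDA toolkit matching your PyTorch build is installed, and flags or versions may need adjusting for your setup):

```shell
# Remove the prebuilt wheel first.
pip uninstall -y xformers

# Compile kernels natively for sm90 (H100); add other architectures if
# the same environment also targets other GPUs, e.g. "8.0;9.0".
export TORCH_CUDA_ARCH_LIST="9.0"

# Build and install from the main branch (this can take a while).
pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers
```

Afterwards, `python -m xformers.info` should show 9.0 in build.env.TORCH_CUDA_ARCH_LIST.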