facebookresearch / xformers

Hackable and optimized Transformers building blocks, supporting a composable construction.
https://facebookresearch.github.io/xformers/

Compiling from source fails to build extensions/is not usable for torch nightly 2.4.0 + cuda 12.4 #1057

Open CoffeeVampir3 opened 3 months ago

CoffeeVampir3 commented 3 months ago

🐛 Bug

To Reproduce

1. Install torch nightly (directly from https://pytorch.org/get-started/locally/):
   `pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124`
2. Follow the source-compilation steps from https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers
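Condensed into one shell session, the install side of the reproduction looks like this (a dry-run sketch that only prints the commands; the index URL is from the report, and the final source-install command is the README flow as used later in this thread):

```shell
# Dry-run sketch of the reproduction's install steps; echoes instead of
# executing, so nothing is downloaded. URLs exactly as given in the report.
TORCH_INDEX="https://download.pytorch.org/whl/nightly/cu124"
XFORMERS_SRC="git+https://github.com/facebookresearch/xformers.git@main#egg=xformers"

echo "pip3 install --pre torch torchvision torchaudio --index-url ${TORCH_INDEX}"
echo "pip install -v -U ${XFORMERS_SRC}"
echo "python -m xformers.info"
```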

Run python -m xformers.info

(base) ➜  Desktop python -m xformers.info                                                                              
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.4.0.dev20240602+cu124 with CUDA 1205 (you have 2.4.0.dev20240602+cu124)
    Python  3.10.14 (you have 3.10.14)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
Traceback (most recent call last):
  File "/home/blackroot/miniforge3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/blackroot/miniforge3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/info.py", line 11, in <module>
    from . import __version__, _cpp_lib, _is_opensource, _is_triton_available, ops
  File "/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/ops/__init__.py", line 8, in <module>
    from .fmha import (
  File "/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 12, in <module>
    from . import ck, ck_decoder, ck_splitk, cutlass, decoder, flash, small_k, triton_splitk
  File "/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/ops/fmha/cutlass.py", line 160, in <module>
    USE_TORCH_CUTLASS = not torch._C._dispatch_has_kernel_for_dispatch_key(
RuntimeError: operator xformers::efficient_attention_forward_cutlass does not exist
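As an aside on that warning: the `CUDA 1205` appears to be a packed `major*100 + minor` encoding (later outputs in this thread show `build.cuda_version: 1204` alongside `build.nvcc_version: 12.4.131`), which would mean the extension was compiled against the local CUDA 12.5 toolkit while the torch wheel targets cu124. A small sketch of that assumed decoding:

```shell
# Assumed decoding of xFormers' packed CUDA version numbers
# (1204 <-> 12.4, 1205 <-> 12.5; inferred from outputs in this thread).
decode_cuda() {
  echo "$(( $1 / 100 )).$(( $1 % 100 ))"
}

decode_cuda 1205   # version the xFormers build reports
decode_cuda 1204   # version the cu124 torch wheel targets
```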

Expected behavior

The library compiles extensions.

Environment

Note: A separate user on Windows has confirmed that the newest xformers also fails. He noted it has been broken since the May 29th version; the torch 2.4.0 nightly of May 28th works.

PyTorch version: 2.4.0.dev20240602+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A

OS: EndeavourOS Linux (x86_64)
GCC version: (GCC) 14.1.1 20240522
Clang version: 17.0.6
CMake version: version 3.29.4
Libc version: glibc-2.39

Python version: 3.10.14 | packaged by conda-forge | (main, Mar 20 2024, 12:45:18) [GCC 12.3.0] (64-bit runtime)
Python platform: Linux-6.9.3-arch1-1-x86_64-with-glibc2.39
Is CUDA available: True
CUDA runtime version: 12.5.40
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: 
GPU 0: NVIDIA GeForce RTX 3090 Ti
GPU 1: NVIDIA RTX A4000

Nvidia driver version: 555.52.04
cuDNN version: Probably one of the following:
/usr/lib/libcudnn.so.9.1.1
/usr/lib/libcudnn_adv.so.9.1.1
/usr/lib/libcudnn_cnn.so.9.1.1
/usr/lib/libcudnn_engines_precompiled.so.9.1.1
/usr/lib/libcudnn_engines_runtime_compiled.so.9.1.1
/usr/lib/libcudnn_graph.so.9.1.1
/usr/lib/libcudnn_heuristic.so.9.1.1
/usr/lib/libcudnn_ops.so.9.1.1
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture:                         x86_64
CPU op-mode(s):                       32-bit, 64-bit
Address sizes:                        39 bits physical, 48 bits virtual
Byte Order:                           Little Endian
CPU(s):                               20
On-line CPU(s) list:                  0-19
Vendor ID:                            GenuineIntel
Model name:                           12th Gen Intel(R) Core(TM) i7-12700KF
CPU family:                           6
Model:                                151
Thread(s) per core:                   2
Core(s) per socket:                   12
Socket(s):                            1
Stepping:                             2
CPU(s) scaling MHz:                   19%
CPU max MHz:                          5000.0000
CPU min MHz:                          800.0000
BogoMIPS:                             7222.00
Flags:                                fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect user_shstk avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize arch_lbr ibt flush_l1d arch_capabilities
Virtualization:                       VT-x
L1d cache:                            512 KiB (12 instances)
L1i cache:                            512 KiB (12 instances)
L2 cache:                             12 MiB (9 instances)
L3 cache:                             25 MiB (1 instance)
NUMA node(s):                         1
NUMA node0 CPU(s):                    0-19
Vulnerability Gather data sampling:   Not affected
Vulnerability Itlb multihit:          Not affected
Vulnerability L1tf:                   Not affected
Vulnerability Mds:                    Not affected
Vulnerability Meltdown:               Not affected
Vulnerability Mmio stale data:        Not affected
Vulnerability Reg file data sampling: Mitigation; Clear Register File
Vulnerability Retbleed:               Not affected
Vulnerability Spec rstack overflow:   Not affected
Vulnerability Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:             Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds:                  Not affected
Vulnerability Tsx async abort:        Not affected

Versions of relevant libraries:
[pip3] alias-free-torch==0.0.6
[pip3] clip-anytorch==2.6.0
[pip3] dctorch==0.1.2
[pip3] ema-pytorch==0.2.3
[pip3] lovely-numpy==0.2.12
[pip3] numpy==1.23.5
[pip3] onnx==1.16.1
[pip3] onnxruntime==1.18.0
[pip3] open-clip-torch==2.22.0
[pip3] pytorch-lightning==2.1.0
[pip3] pytorch-triton==3.0.0+45fff310c8
[pip3] qtorch==0.3.0
[pip3] torch==2.4.0.dev20240602+cu124
[pip3] torch-stoi==0.2.1
[pip3] torchao==0.2.0
[pip3] torchaudio==2.2.0.dev20240602+cu124
[pip3] torchdiffeq==0.2.4
[pip3] torchlibrosa==0.1.0
[pip3] torchmetrics==0.11.4
[pip3] torchsde==0.2.6
[pip3] torchvision==0.19.0.dev20240602+cu124
[pip3] v-diffusion-pytorch==0.0.2
[pip3] vector-quantize-pytorch==1.9.14
[conda] alias-free-torch          0.0.6                    pypi_0    pypi
[conda] clip-anytorch             2.6.0                    pypi_0    pypi
[conda] dctorch                   0.1.2                    pypi_0    pypi
[conda] ema-pytorch               0.2.3                    pypi_0    pypi
[conda] lovely-numpy              0.2.12                   pypi_0    pypi
[conda] numpy                     1.23.5                   pypi_0    pypi
[conda] open-clip-torch           2.22.0                   pypi_0    pypi
[conda] pytorch-lightning         2.1.0                    pypi_0    pypi
[conda] pytorch-triton            3.0.0+45fff310c8          pypi_0    pypi
[conda] qtorch                    0.3.0                    pypi_0    pypi
[conda] torch                     2.4.0.dev20240602+cu124          pypi_0    pypi
[conda] torch-stoi                0.2.1                    pypi_0    pypi
[conda] torchao                   0.2.0                    pypi_0    pypi
[conda] torchaudio                2.2.0.dev20240602+cu124          pypi_0    pypi
[conda] torchdiffeq               0.2.4                    pypi_0    pypi
[conda] torchlibrosa              0.1.0                    pypi_0    pypi
[conda] torchmetrics              0.11.4                   pypi_0    pypi
[conda] torchsde                  0.2.6                    pypi_0    pypi
[conda] torchvision               0.19.0.dev20240602+cu124          pypi_0    pypi
[conda] v-diffusion-pytorch       0.0.2                    pypi_0    pypi
[conda] vector-quantize-pytorch   1.9.14                   pypi_0    pypi
Panchovix commented 3 months ago

Can confirm the issue on Windows, on both Python 3.11 and 3.12, since the May 29th torch nightly build.

aswordok commented 3 months ago

win11:

`cd D:+AI\comfyUI\ComfyUI_windows_portable\python_embeded`

1. `python -m pip install --pre --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124`
   (Successfully installed torch-2.4.0.dev20240606+cu124 torchaudio-2.2.0.dev20240606+cu124 torchvision-0.19.0.dev20240606+cu124)
2. From https://visualstudio.microsoft.com/zh-hans/visual-cpp-build-tools/, download and install the first option, C++ build tools.
3. `setx /m "Path" "%path%;C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\bin\Hostx64\x64"` (please confirm your own version and path), then check with `where cl.exe`.
4. Install Python 3.11.8 for the operating system, then copy its headers and libs into the embedded Python:
   `where python.exe` (C:\Users\user\AppData\Local\Programs\Python\Python311\python.exe)
   `xcopy /e /i C:\Users\user\AppData\Local\Programs\Python\Python311\include D:+AI\comfyUI\ComfyUI_windows_portable\python_embeded\include`
   `xcopy /e /i C:\Users\user\AppData\Local\Programs\Python\Python311\libs D:+AI\comfyUI\ComfyUI_windows_portable\python_embeded\libs`
5. `git -v` and `git update-git-for-windows` (git version 2.45.2.windows.1)
6. `git config --global http.lowSpeedLimit 0` and `git config --global http.lowSpeedTime 3600`
7. `git config --global http.postBuffer 2G` (verify with `git config http.postBuffer`)
8. `git config --system core.longpaths true`
9. `python -m pip install --upgrade pip` (Successfully installed pip-24.0)
10. `python -m pip install ninja`
11. `python -m pip install wheel setuptools`
12. `python -m pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers`
    (Successfully installed xformers-0.0.27+66cfba7.d20240609)

CoffeeVampir3 commented 3 months ago

> *(quoted: aswordok's Windows build steps above)*

Could you share your output of: python -m xformers.info

Tried replicating as closely as possible, but I personally have the same error.

Panchovix commented 3 months ago

> *(quoted: aswordok's Windows build steps above)*

This seems to be able to build xformers but without flash attention, so you can't use it on A1111 webui for example.

aswordok commented 3 months ago

> *(quoted: the Windows build steps above)*
>
> Could you share your output of: `python -m xformers.info`
>
> Tried replicating as closely as possible, but I personally have the same error.

my PC:

D:+AI\ComfyUI\ComfyUI_windows_portable\python_embeded\xformers>..\python -m xformers.info
D:+AI\ComfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\xformers\ops\fmha\flash.py:231: FutureWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
D:+AI\ComfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\xformers\ops\fmha\flash.py:358: FutureWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")
D:+AI\ComfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\xformers\ops\swiglu_op.py:127: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
  @torch.cuda.amp.custom_fwd
D:+AI\ComfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\xformers\ops\swiglu_op.py:148: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
  @torch.cuda.amp.custom_bwd
Unable to find python bindings at /usr/local/dcgm/bindings/python3. No data will be captured.
xFormers 0.0.27+66cfba7.d20240609
memory_efficient_attention.ckF: unavailable
memory_efficient_attention.ckB: unavailable
memory_efficient_attention.ck_decoderF: unavailable
memory_efficient_attention.ck_splitKF: unavailable
memory_efficient_attention.cutlassF-pt: available
memory_efficient_attention.cutlassB-pt: available
memory_efficient_attention.decoderF: available
memory_efficient_attention.flshattF@2.5.6-pt: available
memory_efficient_attention.flshattB@2.5.6-pt: available
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.triton_splitKF: unavailable
indexing.scaled_index_addF: unavailable
indexing.scaled_index_addB: unavailable
indexing.index_select: unavailable
sequence_parallel_fused.write_values: available
sequence_parallel_fused.wait_values: available
sequence_parallel_fused.cuda_memset_32b_async: available
sp24.sparse24_sparsify_both_ways: available
sp24.sparse24_apply: available
sp24.sparse24_apply_dense_output: available
sp24._sparse24_gemm: available
sp24._cslt_sparse_mm@0.0.0: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: False
pytorch.version: 2.4.0.dev20240606+cu124
pytorch.cuda: available
gpu.compute_capability: 6.1
gpu.name: Quadro P4000
dcgm_profiler: unavailable
build.info: available
build.cuda_version: 1204
build.hip_version: None
build.python_version: 3.11.8
build.torch_version: 2.4.0.dev20240606+cu124
build.env.TORCH_CUDA_ARCH_LIST: None
build.env.PYTORCH_ROCM_ARCH: None
build.env.XFORMERS_BUILD_TYPE: None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: None
build.nvcc_version: 12.4.131
source.privacy: open source

my laptop (no GPU):

D:+AI\comfyUI\ComfyUI_windows_portable\python_embeded>python -m xformers.info
D:+AI\comfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\xformers\ops\swiglu_op.py:127: FutureWarning: torch.cuda.amp.custom_fwd(args...) is deprecated. Please use torch.amp.custom_fwd(args..., device_type='cuda') instead.
  @torch.cuda.amp.custom_fwd
D:+AI\comfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\xformers\ops\swiglu_op.py:148: FutureWarning: torch.cuda.amp.custom_bwd(args...) is deprecated. Please use torch.amp.custom_bwd(args..., device_type='cuda') instead.
  @torch.cuda.amp.custom_bwd
Unable to find python bindings at /usr/local/dcgm/bindings/python3. No data will be captured.
xFormers 0.0.27+66cfba7.d20240609
memory_efficient_attention.ckF: unavailable
memory_efficient_attention.ckB: unavailable
memory_efficient_attention.ck_decoderF: unavailable
memory_efficient_attention.ck_splitKF: unavailable
memory_efficient_attention.cutlassF-pt: available
memory_efficient_attention.cutlassB-pt: available
memory_efficient_attention.decoderF: available
memory_efficient_attention.flshattF@0.0.0: unavailable
memory_efficient_attention.flshattB@0.0.0: unavailable
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.triton_splitKF: unavailable
indexing.scaled_index_addF: unavailable
indexing.scaled_index_addB: unavailable
indexing.index_select: unavailable
sequence_parallel_fused.write_values: available
sequence_parallel_fused.wait_values: available
sequence_parallel_fused.cuda_memset_32b_async: available
sp24.sparse24_sparsify_both_ways: available
sp24.sparse24_apply: available
sp24.sparse24_apply_dense_output: available
sp24._sparse24_gemm: available
sp24._cslt_sparse_mm@0.0.0: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: False
pytorch.version: 2.4.0.dev20240606+cu124
pytorch.cuda: not available
dcgm_profiler: unavailable
build.info: available
build.cuda_version: None
build.hip_version: None
build.python_version: 3.11.8
build.torch_version: 2.4.0.dev20240606+cu124
build.env.TORCH_CUDA_ARCH_LIST: None
build.env.PYTORCH_ROCM_ARCH: None
build.env.XFORMERS_BUILD_TYPE: None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: None
source.privacy: open source

aswordok commented 3 months ago

> *(quoted: the Windows build steps above)*
>
> This seems to be able to build xformers but without flash attention, so you can't use it on A1111 webui for example.

If the network is not good, you can go step by step. (Carefully read the output information; wherever a git clone is needed, clone it manually in advance.)

cd D:+AI\comfyUI\ComfyUI_windows_portable\python_embeded

To set the timeout period in seconds:
git config http.lowSpeedLimit 0
git config http.lowSpeedTime 3600

git clone https://github.com/facebookresearch/xformers.git
cd xformers\third_party
git clone https://github.com/Dao-AILab/flash-attention.git
cd flash-attention/csrc
git clone https://github.com/NVIDIA/cutlass.git
cd .. & cd .. & cd ..
git submodule update --init --recursive
..\python -m pip install ninja
..\python -m pip install wheel setuptools
..\python -m pip install .

Processing d:+ai\comfyui\comfyui_windows_portable\python_embeded\xformers
  Preparing metadata (setup.py) ... done
Requirement already satisfied: torch>=2.2 in d:+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from xformers==0.0.27+66cfba7.d20240609) (2.4.0.dev20240606+cu124)
Requirement already satisfied: numpy in d:+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from xformers==0.0.27+66cfba7.d20240609) (1.26.4)
Requirement already satisfied: filelock in d:+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (3.13.1)
Requirement already satisfied: typing-extensions>=4.8.0 in d:+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (4.10.0)
Requirement already satisfied: sympy in d:+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (1.12)
Requirement already satisfied: networkx in d:+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (3.2.1)
Requirement already satisfied: jinja2 in d:+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (3.1.3)
Requirement already satisfied: fsspec in d:+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (2024.2.0)
Requirement already satisfied: mkl<=2021.4.0,>=2021.1.1 in d:+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (2021.4.0)
Requirement already satisfied: intel-openmp==2021.* in d:+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from mkl<=2021.4.0,>=2021.1.1->torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (2021.4.0)
Requirement already satisfied: tbb==2021.* in d:+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from mkl<=2021.4.0,>=2021.1.1->torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (2021.12.0)
Requirement already satisfied: MarkupSafe>=2.0 in d:+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from jinja2->torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in d:+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from sympy->torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (1.3.0)
Building wheels for collected packages: xformers
  Building wheel for xformers (setup.py) ... done
  Created wheel for xformers: filename=xformers-0.0.27+66cfba7.d20240609-cp311-cp311-win_amd64.whl size=8851842 sha256=01ef61832306328f345d04c781d7ba512d4e8b76e0f857d1042afa9c1233f949
  Stored in directory: C:\Users\user\AppData\Local\Temp\pip-ephem-wheel-cache-8hqkppk7\wheels\2e\8a\b3\4abaaea64fba5a483ecb34cbc60212a1f0bf7e817ddc6e9894
Successfully built xformers
Installing collected packages: xformers
Successfully installed xformers-0.0.27+66cfba7.d20240609

danthe3rd commented 3 months ago

Hi, Thanks for the report, we will have a look. In the meantime, you can use this commit and everything should work: https://github.com/facebookresearch/xformers/commit/a40ca6e4a9aeb2093d7a03c5ae2a9f1215f3c296

cc @lvaleriu
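For anyone wanting to pin a source build to that commit, one plausible invocation (a sketch; it reuses the pip flow already shown in this thread, only the git ref changes) is:

```shell
# Hypothetical: build xformers from source against the suggested commit
# instead of @main. Echoed rather than executed here.
COMMIT="a40ca6e4a9aeb2093d7a03c5ae2a9f1215f3c296"
PINNED="git+https://github.com/facebookresearch/xformers.git@${COMMIT}#egg=xformers"

echo "pip install -v -U ${PINNED}"
```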

danthe3rd commented 3 months ago

It also looks like xformers can't load the extensions because you have the following message:

WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.4.0.dev20240602+cu124 with CUDA 1205 (you have 2.4.0.dev20240602+cu124)
    Python  3.10.14 (you have 3.10.14)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details

Can you try running this to get more information?

XFORMERS_MORE_DETAILS=1 python -m xformers.info
Panchovix commented 3 months ago

I'm not OP, but when I execute that on Windows (using `$env:XFORMERS_MORE_DETAILS=1`) I get the same issue:

(venv) PS G:\f\xformers_cu124_nightly_py312_08-06-24> $env:XFORMERS_MORE_DETAILS=1
(venv) PS G:\f\xformers_cu124_nightly_py312_08-06-24> python -m xformers.info
WARNING[XFORMERS]: Need to compile C++ extensions to use all xFormers features.
    Please install xformers properly (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
Traceback (most recent call last):
  File "G:\f\xformers_cu124_nightly_py312_08-06-24\xformers\_cpp_lib.py", line 138, in <module>
    _build_metadata = _register_extensions()
                      ^^^^^^^^^^^^^^^^^^^^^^
  File "G:\f\xformers_cu124_nightly_py312_08-06-24\xformers\_cpp_lib.py", line 123, in _register_extensions
    raise xFormersWasNotBuiltException()
xformers._cpp_lib.xFormersWasNotBuiltException: Need to compile C++ extensions to use all xFormers features.
    Please install xformers properly (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "G:\f\xformers_cu124_nightly_py312_08-06-24\xformers\info.py", line 11, in <module>
    from . import __version__, _cpp_lib, _is_opensource, _is_triton_available, ops
  File "G:\f\xformers_cu124_nightly_py312_08-06-24\xformers\ops\__init__.py", line 8, in <module>
    from .fmha import (
  File "G:\f\xformers_cu124_nightly_py312_08-06-24\xformers\ops\fmha\__init__.py", line 12, in <module>
    from . import ck, ck_decoder, ck_splitk, cutlass, decoder, flash, small_k, triton_splitk
  File "G:\f\xformers_cu124_nightly_py312_08-06-24\xformers\ops\fmha\cutlass.py", line 160, in <module>
    USE_TORCH_CUTLASS = not torch._C._dispatch_has_kernel_for_dispatch_key(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: operator xformers::efficient_attention_forward_cutlass does not exist

pip list

Package           Version
----------------- ------------------------
einops            0.8.0
filelock          3.13.1
fsspec            2024.2.0
intel-openmp      2021.4.0
Jinja2            3.1.3
MarkupSafe        2.1.5
mkl               2021.4.0
mpmath            1.2.1
networkx          3.2.1
ninja             1.11.1.1
numpy             1.26.4
packaging         24.0
Pillow            10.1.0
pip               24.0
setuptools        69.5.1
sympy             1.12
tbb               2021.11.0
torch             2.4.0.dev20240606+cu124
torchvision       0.19.0.dev20240606+cu124
typing_extensions 4.8.0
wheel             0.43.0
xformers          0.0.27+66cfba7.d20240609

Also, the wheel is just 11 MB, vs the typical 300+ MB on Windows.

Build using 9th June torch nightly: *(screenshot)*

Build using 28th May torch nightly: *(screenshot)*

lvaleriu commented 3 months ago

Hi @Panchovix. We have added the option (enabled by default) for xformers to switch to the torch FA & CUTLASS implementations (when available/compatible). This reduces the build time (by avoiding building the local FA + local FMHA CUTLASS kernels) and the wheel size. There is still, of course, the option to build the kernels locally by running:

XFORMERS_PT_FLASH_ATTN=0 XFORMERS_PT_CUTLASS_ATTN=0 python setup.py develop

We are fixing the torch._C._dispatch_has_kernel_for_dispatch_key issue.
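Read literally, the description of those flags suggests each one independently opts its kernel family out of the torch reuse, with the torch implementations as the default. A sketch of that assumed logic (the actual `setup.py` behavior may differ):

```shell
# Assumed semantics of the flags mentioned above: unset or "1" means reuse
# the PyTorch kernel, "0" means build the local one (sketch, not xformers code).
flash_kernel_source() {
  if [ "${XFORMERS_PT_FLASH_ATTN:-1}" = "0" ]; then echo local; else echo torch; fi
}
cutlass_kernel_source() {
  if [ "${XFORMERS_PT_CUTLASS_ATTN:-1}" = "0" ]; then echo local; else echo torch; fi
}

# Under this reading, XFORMERS_PT_FLASH_ATTN=0 XFORMERS_PT_CUTLASS_ATTN=0
# python setup.py develop would put both kernel families on the "local" path.
flash_kernel_source
cutlass_kernel_source
```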

lvaleriu commented 3 months ago

Hi @Panchovix, hi @CoffeeVampir3 - can you try building again on the latest xformers dev? This commit should address the `USE_TORCH_CUTLASS = not torch._C._dispatch_has_kernel_for_dispatch_key` error and has been merged to main.

Panchovix commented 3 months ago

@lvaleriu Thanks, now it works fine! But for some reason I had to move the include folder from C:\Users\User\AppData\Local\Programs\Python\Python312 into the venv folder that I used to build xformers.

This is the output

(venv) PS G:\Stable difussion\stable-diffusion-webui> python -m xformers.info
G:\Stable difussion\stable-diffusion-webui\venv\Lib\site-packages\xformers\ops\fmha\flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
G:\Stable difussion\stable-diffusion-webui\venv\Lib\site-packages\xformers\ops\fmha\flash.py:338: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")
G:\Stable difussion\stable-diffusion-webui\venv\Lib\site-packages\xformers\triton\softmax.py:30: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd(cast_inputs=torch.float16 if _triton_softmax_fp16_enabled else None)
G:\Stable difussion\stable-diffusion-webui\venv\Lib\site-packages\xformers\triton\softmax.py:86: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @custom_bwd
G:\Stable difussion\stable-diffusion-webui\venv\Lib\site-packages\xformers\ops\swiglu_op.py:127: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @torch.cuda.amp.custom_fwd
G:\Stable difussion\stable-diffusion-webui\venv\Lib\site-packages\xformers\ops\swiglu_op.py:148: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @torch.cuda.amp.custom_bwd
Unable to find python bindings at /usr/local/dcgm/bindings/python3. No data will be captured.
xFormers 0.0.27+8ce361e.d20240611
memory_efficient_attention.ckF:                    unavailable
memory_efficient_attention.ckB:                    unavailable
memory_efficient_attention.ck_decoderF:            unavailable
memory_efficient_attention.ck_splitKF:             unavailable
memory_efficient_attention.cutlassF:               available
memory_efficient_attention.cutlassB:               available
memory_efficient_attention.decoderF:               available
memory_efficient_attention.flshattF@v2.5.7:        available
memory_efficient_attention.flshattB@v2.5.7:        available
memory_efficient_attention.smallkF:                available
memory_efficient_attention.smallkB:                available
memory_efficient_attention.triton_splitKF:         available
indexing.scaled_index_addF:                        available
indexing.scaled_index_addB:                        available
indexing.index_select:                             available
sequence_parallel_fused.write_values:              available
sequence_parallel_fused.wait_values:               available
sequence_parallel_fused.cuda_memset_32b_async:     available
sp24.sparse24_sparsify_both_ways:                  available
sp24.sparse24_apply:                               available
sp24.sparse24_apply_dense_output:                  available
sp24._sparse24_gemm:                               available
sp24._cslt_sparse_mm@0.0.0:                        available
swiglu.dual_gemm_silu:                             available
swiglu.gemm_fused_operand_sum:                     available
swiglu.fused.p.cpp:                                available
is_triton_available:                               True
pytorch.version:                                   2.4.0.dev20240611+cu124
pytorch.cuda:                                      available
gpu.compute_capability:                            8.9
gpu.name:                                          NVIDIA GeForce RTX 4090
dcgm_profiler:                                     unavailable
build.info:                                        available
build.cuda_version:                                1205
build.hip_version:                                 None
build.python_version:                              3.12.3
build.torch_version:                               2.4.0.dev20240611+cu124
build.env.TORCH_CUDA_ARCH_LIST:                    None
build.env.PYTORCH_ROCM_ARCH:                       None
build.env.XFORMERS_BUILD_TYPE:                     None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   None
build.nvcc_version:                                12.5.40
source.privacy:                                    open source
CoffeeVampir3 commented 3 months ago

Hi @Panchovix, hi @CoffeeVampir3 - can you check the build again on the latest xformers dev? This commit should address the `_USE_TORCH_CUTLASS = not torch._C._dispatch_has_kernel_for_dispatch_key` error and has been merged to main.

I'm getting a different issue now (for flash attention this time, although it seems it still does not build any extensions):

(base) ➜  ~ export XFORMERS_MORE_DETAILS=1                          
(base) ➜  ~ python -m xformers.info       
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.5.0.dev20240615+cu124 with CUDA 1205 (you have 2.5.0.dev20240615+cu124)
    Python  3.10.14 (you have 3.10.14)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
Traceback (most recent call last):
  File "/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/_cpp_lib.py", line 132, in _register_extensions
    torch.ops.load_library(ext_specs.origin)
  File "/home/blackroot/miniforge3/lib/python3.10/site-packages/torch/_ops.py", line 1298, in load_library
    ctypes.CDLL(path)
  File "/home/blackroot/miniforge3/lib/python3.10/ctypes/__init__.py", line 374, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/_C.so: undefined symbol: __cxa_call_terminate

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/_cpp_lib.py", line 142, in <module>
    _build_metadata = _register_extensions()
  File "/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/_cpp_lib.py", line 134, in _register_extensions
    raise xFormersInvalidLibException(build_metadata) from exc
xformers._cpp_lib.xFormersInvalidLibException: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.5.0.dev20240615+cu124 with CUDA 1205 (you have 2.5.0.dev20240615+cu124)
    Python  3.10.14 (you have 3.10.14)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_fwd")
/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:338: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
  @torch.library.impl_abstract("xformers_flash::flash_bwd")
/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/triton/softmax.py:30: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd(cast_inputs=torch.float16 if _triton_softmax_fp16_enabled else None)
/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/triton/softmax.py:87: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(
/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/ops/swiglu_op.py:128: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(cls, ctx, x, w1, b1, w2, b2, w3, b3):
/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/ops/swiglu_op.py:149: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(cls, ctx, dx5):
Unable to find python bindings at /usr/local/dcgm/bindings/python3. No data will be captured.
xFormers 0.0.27+96e5222.d20240616
memory_efficient_attention.ckF:                    unavailable
memory_efficient_attention.ckB:                    unavailable
memory_efficient_attention.ck_decoderF:            unavailable
memory_efficient_attention.ck_splitKF:             unavailable
memory_efficient_attention.cutlassF:               unavailable
memory_efficient_attention.cutlassB:               unavailable
memory_efficient_attention.decoderF:               unavailable
memory_efficient_attention.flshattF@2.5.6-pt:      available
memory_efficient_attention.flshattB@2.5.6-pt:      available
memory_efficient_attention.smallkF:                unavailable
memory_efficient_attention.smallkB:                unavailable
memory_efficient_attention.triton_splitKF:         available
indexing.scaled_index_addF:                        available
indexing.scaled_index_addB:                        available
indexing.index_select:                             available
sequence_parallel_fused.write_values:              unavailable
sequence_parallel_fused.wait_values:               unavailable
sequence_parallel_fused.cuda_memset_32b_async:     unavailable
sp24.sparse24_sparsify_both_ways:                  unavailable
sp24.sparse24_apply:                               unavailable
sp24.sparse24_apply_dense_output:                  unavailable
sp24._sparse24_gemm:                               unavailable
sp24._cslt_sparse_mm@0.5.2:                        available
swiglu.dual_gemm_silu:                             unavailable
swiglu.gemm_fused_operand_sum:                     unavailable
swiglu.fused.p.cpp:                                not built
is_triton_available:                               True
pytorch.version:                                   2.5.0.dev20240615+cu124
pytorch.cuda:                                      available
gpu.compute_capability:                            8.6
gpu.name:                                          NVIDIA GeForce RTX 3090 Ti
dcgm_profiler:                                     unavailable
build.info:                                        available
build.cuda_version:                                1205
build.hip_version:                                 None
build.python_version:                              3.10.14
build.torch_version:                               2.5.0.dev20240615+cu124
build.env.TORCH_CUDA_ARCH_LIST:                    8.6
build.env.PYTORCH_ROCM_ARCH:                       None
build.env.XFORMERS_BUILD_TYPE:                     None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS:        None
build.env.NVCC_FLAGS:                              None
build.env.XFORMERS_PACKAGE_FROM:                   None
source.privacy:                                    open source
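Note that the "xFormers was built for ... (you have ...)" banner above is printed whenever loading `_C.so` fails for any reason, which is why it can show identical versions on both lines: here the real failure is the `dlopen` error (`undefined symbol: __cxa_call_terminate`), not a version mismatch. A simplified sketch of that behaviour (illustrative stand-ins, not the real `_cpp_lib.py` code):

```python
class xFormersInvalidLibException(Exception):  # simplified stand-in
    def __init__(self, build_torch: str, runtime_torch: str):
        super().__init__(
            "xFormers can't load C++/CUDA extensions. xFormers was built for:\n"
            f"    PyTorch {build_torch} (you have {runtime_torch})"
        )

def register_extensions(load_library, build_torch: str, runtime_torch: str):
    # The same generic banner is raised whether the problem is an actual
    # version mismatch or a dlopen failure such as an undefined symbol.
    try:
        load_library()
    except OSError as exc:
        raise xFormersInvalidLibException(build_torch, runtime_torch) from exc
```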
danthe3rd commented 3 months ago

How did you install PyTorch? Was it from conda or pip?

CoffeeVampir3 commented 3 months ago

How did you install PyTorch? Was it from conda or pip?

Pip (no virtual environments of any kind)

danthe3rd commented 3 months ago

Unfortunately I didn't manage to repro (Linux, Python 3.10, torch installed via pip for cu124). Not sure what the difference in setup is there...

CoffeeVampir3 commented 3 months ago

Could this be an issue with NVCC being on 12.5?

(base) ➜  ~ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
(base) ➜  ~ gcc --version                                            
gcc (GCC) 14.1.1 20240522
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I had to fix some bugs in candle related to the NVCC compiler's PTX versioning choice, but I am much less familiar with xformers.
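One data point worth checking: `__cxa_call_terminate` is a libstdc++ symbol that newer GCC versions (13+) emit references to, so an extension compiled with GCC 14.1 may fail to load against an older libstdc++.so.6 bundled with a conda/miniforge environment. A small sketch for checking whether a given library exports the symbol (the paths in the comments are assumptions; adjust for your system):

```python
import ctypes

def has_symbol(lib_path: str, symbol: str):
    """Return True/False depending on whether the library exports `symbol`,
    or None if the library itself cannot be loaded."""
    try:
        lib = ctypes.CDLL(lib_path)
    except OSError:
        return None
    try:
        getattr(lib, symbol)
        return True
    except AttributeError:
        return False

# e.g. compare the conda-bundled copy against the system one:
# has_symbol("/home/blackroot/miniforge3/lib/libstdc++.so.6", "__cxa_call_terminate")
# has_symbol("/usr/lib/libstdc++.so.6", "__cxa_call_terminate")
```

If the system copy has the symbol but the conda-bundled one does not, updating the environment's `libstdcxx-ng` package (or building with an older GCC) is a plausible workaround for this class of mismatch.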

Auroir commented 3 weeks ago

E: Issue was likely unrelated.