CoffeeVampir3 opened this issue 3 months ago
Can confirm the issue on Windows, on both Python 3.11 and 3.12, since the May 29th torch nightly build.
win11: `cd D:\+AI\comfyUI\ComfyUI_windows_portable\python_embeded`

1. `python -m pip install --pre --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124`
   Successfully installed torch-2.4.0.dev20240606+cu124 torchaudio-2.2.0.dev20240606+cu124 torchvision-0.19.0.dev20240606+cu124
2. From https://visualstudio.microsoft.com/zh-hans/visual-cpp-build-tools/, download and install the first option, C++.
3. `setx /m "Path" "%path%;C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\bin\Hostx64\x64"` (please confirm your own version and path), then verify with `where cl.exe`.
4. Install Python 3.11.8 for the operating system; `where python.exe` → C:\Users\user\AppData\Local\Programs\Python\Python311\python.exe, then copy its headers and libs into the embedded Python:
   `xcopy /e /i C:\Users\user\AppData\Local\Programs\Python\Python311\include D:\+AI\comfyUI\ComfyUI_windows_portable\python_embeded\include`
   `xcopy /e /i C:\Users\user\AppData\Local\Programs\Python\Python311\libs D:\+AI\comfyUI\ComfyUI_windows_portable\python_embeded\libs`
5. `git -v` and `git update-git-for-windows` (git version 2.45.2.windows.1)
6. `git config --global http.lowSpeedLimit 0` and `git config --global http.lowSpeedTime 3600`
7. `git config --global http.postBuffer 2G` (verify with `git config http.postBuffer`)
8. `git config --system core.longpaths true`
9. `python -m pip install --upgrade pip` (Successfully installed pip-24.0)
10. `python -m pip install ninja`
11. `python -m pip install wheel setuptools`
12. `python -m pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers`
    Successfully installed xformers-0.0.27+66cfba7.d20240609
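As a quick sanity check before building, a small helper can verify that `cl.exe` and `ninja` are reachable and that the `include`/`libs` folders were copied into the embedded Python. This is an illustrative sketch, not part of the official instructions; the function name and checks are my own.

```python
import shutil
from pathlib import Path

def check_build_prereqs(embed_dir: str) -> list[str]:
    """Return a list of problems that would make the xformers source
    build fail inside an embedded Python; an empty list means ready."""
    problems = []
    # MSVC compiler added to PATH in step 3
    if shutil.which("cl") is None:
        problems.append("cl.exe not on PATH (MSVC Build Tools, step 3)")
    # ninja installed in step 10
    if shutil.which("ninja") is None:
        problems.append("ninja not installed (step 10)")
    # headers/libs copied in step 4
    root = Path(embed_dir)
    for sub in ("include", "libs"):
        if not (root / sub).is_dir():
            problems.append(f"{root / sub} missing (xcopy, step 4)")
    return problems

# Example (hypothetical path):
# check_build_prereqs(r"D:\+AI\comfyUI\ComfyUI_windows_portable\python_embeded")
```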
win11: `cd D:\+AI\comfyUI\ComfyUI_windows_portable\python_embeded`

1. `python -m pip install --pre --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124`
   Successfully installed torch-2.4.0.dev20240606+cu124 torchaudio-2.2.0.dev20240606+cu124 torchvision-0.19.0.dev20240606+cu124
2. From https://visualstudio.microsoft.com/zh-hans/visual-cpp-build-tools/, download and install the first option, C++.
3. `setx /m "Path" "C:\Program Files\ffmpeg\bin;%path%;C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\bin\Hostx64\x64"` (please confirm your own version and path), then verify with `where cl.exe`.
4. Install Python 3.11.8 for the operating system; `where python.exe` → C:\Users\user\AppData\Local\Programs\Python\Python311\python.exe, then:
   `xcopy /s C:\Users\user\AppData\Local\Programs\Python\Python311\include D:\+AI\comfyUI\ComfyUI_windows_portable\python_embeded\include`
   `xcopy /s C:\Users\user\AppData\Local\Programs\Python\Python311\libs D:\+AI\comfyUI\ComfyUI_windows_portable\python_embeded\libs`
5. `git -v` and `git update-git-for-windows` (git version 2.45.2.windows.1)
6. `git config --global http.postBuffer 2G`
7. `git config --system core.longpaths true`
8. `python -m pip install --upgrade pip` (Successfully installed pip-24.0)
9. `python -m pip install ninja`
10. `python -m pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers`
    Successfully installed xformers-0.0.27+66cfba7.d20240609
Could you share your output of:
python -m xformers.info
Tried replicating as closely as possible, but I personally have the same error.
win11: `cd D:\+AI\comfyUI\ComfyUI_windows_portable\python_embeded`

1. `python -m pip install --pre --upgrade torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124`
   Successfully installed torch-2.4.0.dev20240606+cu124 torchaudio-2.2.0.dev20240606+cu124 torchvision-0.19.0.dev20240606+cu124
2. From https://visualstudio.microsoft.com/zh-hans/visual-cpp-build-tools/, download and install the first option, C++.
3. `setx /m "Path" "%path%;C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.39.33519\bin\Hostx64\x64"` (please confirm your own version and path), then verify with `where cl.exe`.
4. Install Python 3.11.8 for the operating system; `where python.exe` → C:\Users\user\AppData\Local\Programs\Python\Python311\python.exe, then:
   `xcopy /s C:\Users\user\AppData\Local\Programs\Python\Python311\include D:\+AI\comfyUI\ComfyUI_windows_portable\python_embeded\include`
   `xcopy /s C:\Users\user\AppData\Local\Programs\Python\Python311\libs D:\+AI\comfyUI\ComfyUI_windows_portable\python_embeded\libs`
5. `git -v` and `git update-git-for-windows` (git version 2.45.2.windows.1)
6. `git config --global http.postBuffer 2G`
7. `git config --system core.longpaths true`
8. `python -m pip install --upgrade pip` (Successfully installed pip-24.0)
9. `python -m pip install ninja`
10. `python -m pip install -v -U git+https://github.com/facebookresearch/xformers.git@main#egg=xformers`
    Successfully installed xformers-0.0.27+66cfba7.d20240609
This seems to be able to build xformers, but without flash attention, so you can't use it in the A1111 webui, for example.
my PC:
D:\+AI\ComfyUI\ComfyUI_windows_portable\python_embeded\xformers>..\python -m xformers.info
D:\+AI\ComfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\xformers\ops\fmha\flash.py:231: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_fwd")
D:\+AI\ComfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\xformers\ops\fmha\flash.py:358: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_bwd")
D:\+AI\ComfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\xformers\ops\swiglu_op.py:127: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@torch.cuda.amp.custom_fwd
D:\+AI\ComfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\xformers\ops\swiglu_op.py:148: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
@torch.cuda.amp.custom_bwd
Unable to find python bindings at /usr/local/dcgm/bindings/python3. No data will be captured.
xFormers 0.0.27+66cfba7.d20240609
memory_efficient_attention.ckF: unavailable
memory_efficient_attention.ckB: unavailable
memory_efficient_attention.ck_decoderF: unavailable
memory_efficient_attention.ck_splitKF: unavailable
memory_efficient_attention.cutlassF-pt: available
memory_efficient_attention.cutlassB-pt: available
memory_efficient_attention.decoderF: available
memory_efficient_attention.flshattF@2.5.6-pt: available
memory_efficient_attention.flshattB@2.5.6-pt: available
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.triton_splitKF: unavailable
indexing.scaled_index_addF: unavailable
indexing.scaled_index_addB: unavailable
indexing.index_select: unavailable
sequence_parallel_fused.write_values: available
sequence_parallel_fused.wait_values: available
sequence_parallel_fused.cuda_memset_32b_async: available
sp24.sparse24_sparsify_both_ways: available
sp24.sparse24_apply: available
sp24.sparse24_apply_dense_output: available
sp24._sparse24_gemm: available
sp24._cslt_sparse_mm@0.0.0: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: False
pytorch.version: 2.4.0.dev20240606+cu124
pytorch.cuda: available
gpu.compute_capability: 6.1
gpu.name: Quadro P4000
dcgm_profiler: unavailable
build.info: available
build.cuda_version: 1204
build.hip_version: None
build.python_version: 3.11.8
build.torch_version: 2.4.0.dev20240606+cu124
build.env.TORCH_CUDA_ARCH_LIST: None
build.env.PYTORCH_ROCM_ARCH: None
build.env.XFORMERS_BUILD_TYPE: None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: None
build.nvcc_version: 12.4.131
source.privacy: open source
my laptop: (No GPU)
D:\+AI\comfyUI\ComfyUI_windows_portable\python_embeded>python -m xformers.info
D:\+AI\comfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\xformers\ops\swiglu_op.py:127: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@torch.cuda.amp.custom_fwd
D:\+AI\comfyUI\ComfyUI_windows_portable\python_embeded\Lib\site-packages\xformers\ops\swiglu_op.py:148: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
@torch.cuda.amp.custom_bwd
Unable to find python bindings at /usr/local/dcgm/bindings/python3. No data will be captured.
xFormers 0.0.27+66cfba7.d20240609
memory_efficient_attention.ckF: unavailable
memory_efficient_attention.ckB: unavailable
memory_efficient_attention.ck_decoderF: unavailable
memory_efficient_attention.ck_splitKF: unavailable
memory_efficient_attention.cutlassF-pt: available
memory_efficient_attention.cutlassB-pt: available
memory_efficient_attention.decoderF: available
memory_efficient_attention.flshattF@0.0.0: unavailable
memory_efficient_attention.flshattB@0.0.0: unavailable
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.triton_splitKF: unavailable
indexing.scaled_index_addF: unavailable
indexing.scaled_index_addB: unavailable
indexing.index_select: unavailable
sequence_parallel_fused.write_values: available
sequence_parallel_fused.wait_values: available
sequence_parallel_fused.cuda_memset_32b_async: available
sp24.sparse24_sparsify_both_ways: available
sp24.sparse24_apply: available
sp24.sparse24_apply_dense_output: available
sp24._sparse24_gemm: available
sp24._cslt_sparse_mm@0.0.0: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: False
pytorch.version: 2.4.0.dev20240606+cu124
pytorch.cuda: not available
dcgm_profiler: unavailable
build.info: available
build.cuda_version: None
build.hip_version: None
build.python_version: 3.11.8
build.torch_version: 2.4.0.dev20240606+cu124
build.env.TORCH_CUDA_ARCH_LIST: None
build.env.PYTORCH_ROCM_ARCH: None
build.env.XFORMERS_BUILD_TYPE: None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: None
source.privacy: open source
If the network is not good, you can do it step by step (carefully read the output; wherever it needs to `git clone`, clone manually in advance):

`cd D:\+AI\comfyUI\ComfyUI_windows_portable\python_embeded`

To set the timeout period in seconds:
`git config http.lowSpeedLimit 0`
`git config http.lowSpeedTime 3600`

Then:
`git clone https://github.com/facebookresearch/xformers.git`
`cd xformers\third_party`
`git clone https://github.com/Dao-AILab/flash-attention.git`
`cd flash-attention\csrc`
`git clone https://github.com/NVIDIA/cutlass.git`
`cd .. & cd .. & cd ..`
`git submodule update --init --recursive`
`..\python -m pip install ninja`
`..\python -m pip install wheel setuptools`
`..\python -m pip install .`
Processing d:\+ai\comfyui\comfyui_windows_portable\python_embeded\xformers
  Preparing metadata (setup.py) ... done
Requirement already satisfied: torch>=2.2 in d:\+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from xformers==0.0.27+66cfba7.d20240609) (2.4.0.dev20240606+cu124)
Requirement already satisfied: numpy in d:\+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from xformers==0.0.27+66cfba7.d20240609) (1.26.4)
Requirement already satisfied: filelock in d:\+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (3.13.1)
Requirement already satisfied: typing-extensions>=4.8.0 in d:\+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (4.10.0)
Requirement already satisfied: sympy in d:\+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (1.12)
Requirement already satisfied: networkx in d:\+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (3.2.1)
Requirement already satisfied: jinja2 in d:\+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (3.1.3)
Requirement already satisfied: fsspec in d:\+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (2024.2.0)
Requirement already satisfied: mkl<=2021.4.0,>=2021.1.1 in d:\+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (2021.4.0)
Requirement already satisfied: intel-openmp==2021.* in d:\+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from mkl<=2021.4.0,>=2021.1.1->torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (2021.4.0)
Requirement already satisfied: tbb==2021.* in d:\+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from mkl<=2021.4.0,>=2021.1.1->torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (2021.12.0)
Requirement already satisfied: MarkupSafe>=2.0 in d:\+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from jinja2->torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in d:\+ai\comfyui\comfyui_windows_portable\python_embeded\lib\site-packages (from sympy->torch>=2.2->xformers==0.0.27+66cfba7.d20240609) (1.3.0)
Building wheels for collected packages: xformers
  Building wheel for xformers (setup.py) ... done
  Created wheel for xformers: filename=xformers-0.0.27+66cfba7.d20240609-cp311-cp311-win_amd64.whl size=8851842 sha256=01ef61832306328f345d04c781d7ba512d4e8b76e0f857d1042afa9c1233f949
  Stored in directory: C:\Users\user\AppData\Local\Temp\pip-ephem-wheel-cache-8hqkppk7\wheels\2e\8a\b3\4abaaea64fba5a483ecb34cbc60212a1f0bf7e817ddc6e9894
Successfully built xformers
Installing collected packages: xformers
Successfully installed xformers-0.0.27+66cfba7.d20240609
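The manual-clone sequence above can also be scripted. The sketch below is illustrative only: the repository URLs and destination paths come from the steps above, but the retry/back-off logic and function names are my own additions for flaky networks.

```python
import subprocess
import time
from pathlib import Path

# (repo URL, destination directory) pairs from the manual steps above
CLONES = [
    ("https://github.com/facebookresearch/xformers.git", "xformers"),
    ("https://github.com/Dao-AILab/flash-attention.git",
     "xformers/third_party/flash-attention"),
    ("https://github.com/NVIDIA/cutlass.git",
     "xformers/third_party/flash-attention/csrc/cutlass"),
]

def clone_command(url: str, dest: str) -> list[str]:
    """Build the git command for one clone."""
    return ["git", "clone", url, dest]

def clone_all(retries: int = 3) -> None:
    """Clone each repo, retrying on flaky networks, then sync submodules."""
    for url, dest in CLONES:
        if Path(dest).exists():
            continue  # already cloned manually in advance
        for attempt in range(retries):
            if subprocess.run(clone_command(url, dest)).returncode == 0:
                break
            time.sleep(5 * (attempt + 1))  # back off before retrying
    # same as the final `git submodule update --init --recursive` step
    subprocess.run(["git", "submodule", "update", "--init", "--recursive"],
                   cwd="xformers")
```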
Hi, Thanks for the report, we will have a look. In the meantime, you can use this commit and everything should work: https://github.com/facebookresearch/xformers/commit/a40ca6e4a9aeb2093d7a03c5ae2a9f1215f3c296
cc @lvaleriu
It also looks like xformers can't load the extensions because you have the following message:
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.4.0.dev20240602+cu124 with CUDA 1205 (you have 2.4.0.dev20240602+cu124)
Python 3.10.14 (you have 3.10.14)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
Can you try running this to get more information?
XFORMERS_MORE_DETAILS=1 python -m xformers.info
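Since the way to set an environment variable differs per shell (`export` in bash, `$env:` in PowerShell, `set` in cmd), one portable option is to launch the module from Python itself. This helper is a small stdlib sketch of my own, not part of xformers:

```python
import os
import subprocess
import sys

def run_module_with_env(module: str, **extra_env) -> subprocess.CompletedProcess:
    """Run `python -m <module>` with extra environment variables set,
    regardless of which shell the caller is in."""
    env = {**os.environ, **extra_env}  # inherit, then override
    return subprocess.run(
        [sys.executable, "-m", module],
        env=env, capture_output=True, text=True,
    )

# e.g. run_module_with_env("xformers.info", XFORMERS_MORE_DETAILS="1")
```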
I'm not OP, but when I execute that on Windows (using `$env:XFORMERS_MORE_DETAILS=1`) I get the same issue:
(venv) PS G:\f\xformers_cu124_nightly_py312_08-06-24> $env:XFORMERS_MORE_DETAILS=1
(venv) PS G:\f\xformers_cu124_nightly_py312_08-06-24> python -m xformers.info
WARNING[XFORMERS]: Need to compile C++ extensions to use all xFormers features.
Please install xformers properly (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Traceback (most recent call last):
File "G:\f\xformers_cu124_nightly_py312_08-06-24\xformers\_cpp_lib.py", line 138, in <module>
_build_metadata = _register_extensions()
^^^^^^^^^^^^^^^^^^^^^^
File "G:\f\xformers_cu124_nightly_py312_08-06-24\xformers\_cpp_lib.py", line 123, in _register_extensions
raise xFormersWasNotBuiltException()
xformers._cpp_lib.xFormersWasNotBuiltException: Need to compile C++ extensions to use all xFormers features.
Please install xformers properly (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "G:\f\xformers_cu124_nightly_py312_08-06-24\xformers\info.py", line 11, in <module>
from . import __version__, _cpp_lib, _is_opensource, _is_triton_available, ops
File "G:\f\xformers_cu124_nightly_py312_08-06-24\xformers\ops\__init__.py", line 8, in <module>
from .fmha import (
File "G:\f\xformers_cu124_nightly_py312_08-06-24\xformers\ops\fmha\__init__.py", line 12, in <module>
from . import ck, ck_decoder, ck_splitk, cutlass, decoder, flash, small_k, triton_splitk
File "G:\f\xformers_cu124_nightly_py312_08-06-24\xformers\ops\fmha\cutlass.py", line 160, in <module>
USE_TORCH_CUTLASS = not torch._C._dispatch_has_kernel_for_dispatch_key(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: operator xformers::efficient_attention_forward_cutlass does not exist
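The failing check can be probed in isolation. The snippet below is a diagnostic sketch: it calls the same private `torch._C._dispatch_has_kernel_for_dispatch_key` API that appears in the traceback (private APIs may change between torch versions), and returns `None` when torch isn't importable.

```python
def cutlass_op_registered():
    """Check whether the xformers CUTLASS forward op from the traceback
    above is registered with torch's dispatcher for CUDA.

    Returns True/False, or None if torch is not importable here."""
    try:
        import torch
    except ImportError:
        return None
    try:
        return torch._C._dispatch_has_kernel_for_dispatch_key(
            "xformers::efficient_attention_forward_cutlass", "CUDA"
        )
    except RuntimeError:
        # The op was never registered at all (the error in the traceback)
        return False

# e.g. cutlass_op_registered() -> False reproduces the condition above
```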
pip list
Package Version
----------------- ------------------------
einops 0.8.0
filelock 3.13.1
fsspec 2024.2.0
intel-openmp 2021.4.0
Jinja2 3.1.3
MarkupSafe 2.1.5
mkl 2021.4.0
mpmath 1.2.1
networkx 3.2.1
ninja 1.11.1.1
numpy 1.26.4
packaging 24.0
Pillow 10.1.0
pip 24.0
setuptools 69.5.1
sympy 1.12
tbb 2021.11.0
torch 2.4.0.dev20240606+cu124
torchvision 0.19.0.dev20240606+cu124
typing_extensions 4.8.0
wheel 0.43.0
xformers 0.0.27+66cfba7.d20240609
Also, the wheel is just 11 MB, vs the typical 300+ MB on Windows.
Build using the 9th June torch nightly.
Build using the 28th May torch nightly.
Hi @Panchovix. We have added the option (enabled by default) for xformers to switch to torch's FlashAttention & CUTLASS implementations (when available/compatible). This reduces the build time (by avoiding building the local FA + local FMHA CUTLASS kernels) and the wheel size. There is still, of course, the option to build the kernels locally by running:
XFORMERS_PT_FLASH_ATTN=0 XFORMERS_PT_CUTLASS_ATTN=0 python setup.py develop
We are fixing the `torch._C._dispatch_has_kernel_for_dispatch_key` issue.
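For illustration, build switches like `XFORMERS_PT_FLASH_ATTN` above are typically consumed from the environment at the top of a `setup.py`. The helper below is a hypothetical sketch of that pattern, not xformers' actual code:

```python
import os

def env_flag(name: str, default: bool = True) -> bool:
    """Interpret an environment variable like XFORMERS_PT_FLASH_ATTN=0
    as a boolean build switch (unset -> default)."""
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() not in ("0", "false", "off", "")

# With XFORMERS_PT_FLASH_ATTN=0 XFORMERS_PT_CUTLASS_ATTN=0 set, flags
# like these would drive setup.py to compile the local kernels instead
# of reusing torch's implementations.
use_pt_flash = env_flag("XFORMERS_PT_FLASH_ATTN")
use_pt_cutlass = env_flag("XFORMERS_PT_CUTLASS_ATTN")
```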
Hi @Panchovix, hi @CoffeeVampir3 - can you check the build again on the latest xformers dev? This commit should address the `USE_TORCH_CUTLASS = not torch._C._dispatch_has_kernel_for_dispatch_key` error and has been merged to main.
@lvaleriu Thanks, now it works fine! But for some reason I had to move the include folder from C:\Users\User\AppData\Local\Programs\Python\Python312 into the venv folder that I used to build xformers.
This is the output
(venv) PS G:\Stable difussion\stable-diffusion-webui> python -m xformers.info
G:\Stable difussion\stable-diffusion-webui\venv\Lib\site-packages\xformers\ops\fmha\flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_fwd")
G:\Stable difussion\stable-diffusion-webui\venv\Lib\site-packages\xformers\ops\fmha\flash.py:338: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_bwd")
G:\Stable difussion\stable-diffusion-webui\venv\Lib\site-packages\xformers\triton\softmax.py:30: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@custom_fwd(cast_inputs=torch.float16 if _triton_softmax_fp16_enabled else None)
G:\Stable difussion\stable-diffusion-webui\venv\Lib\site-packages\xformers\triton\softmax.py:86: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
@custom_bwd
G:\Stable difussion\stable-diffusion-webui\venv\Lib\site-packages\xformers\ops\swiglu_op.py:127: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@torch.cuda.amp.custom_fwd
G:\Stable difussion\stable-diffusion-webui\venv\Lib\site-packages\xformers\ops\swiglu_op.py:148: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
@torch.cuda.amp.custom_bwd
Unable to find python bindings at /usr/local/dcgm/bindings/python3. No data will be captured.
xFormers 0.0.27+8ce361e.d20240611
memory_efficient_attention.ckF: unavailable
memory_efficient_attention.ckB: unavailable
memory_efficient_attention.ck_decoderF: unavailable
memory_efficient_attention.ck_splitKF: unavailable
memory_efficient_attention.cutlassF: available
memory_efficient_attention.cutlassB: available
memory_efficient_attention.decoderF: available
memory_efficient_attention.flshattF@v2.5.7: available
memory_efficient_attention.flshattB@v2.5.7: available
memory_efficient_attention.smallkF: available
memory_efficient_attention.smallkB: available
memory_efficient_attention.triton_splitKF: available
indexing.scaled_index_addF: available
indexing.scaled_index_addB: available
indexing.index_select: available
sequence_parallel_fused.write_values: available
sequence_parallel_fused.wait_values: available
sequence_parallel_fused.cuda_memset_32b_async: available
sp24.sparse24_sparsify_both_ways: available
sp24.sparse24_apply: available
sp24.sparse24_apply_dense_output: available
sp24._sparse24_gemm: available
sp24._cslt_sparse_mm@0.0.0: available
swiglu.dual_gemm_silu: available
swiglu.gemm_fused_operand_sum: available
swiglu.fused.p.cpp: available
is_triton_available: True
pytorch.version: 2.4.0.dev20240611+cu124
pytorch.cuda: available
gpu.compute_capability: 8.9
gpu.name: NVIDIA GeForce RTX 4090
dcgm_profiler: unavailable
build.info: available
build.cuda_version: 1205
build.hip_version: None
build.python_version: 3.12.3
build.torch_version: 2.4.0.dev20240611+cu124
build.env.TORCH_CUDA_ARCH_LIST: None
build.env.PYTORCH_ROCM_ARCH: None
build.env.XFORMERS_BUILD_TYPE: None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: None
build.nvcc_version: 12.5.40
source.privacy: open source
Hi @Panchovix, hi @CoffeeVampir3 - can you check the build again on the latest xformers dev? This commit should address the `USE_TORCH_CUTLASS = not torch._C._dispatch_has_kernel_for_dispatch_key` error and has been merged to main.
I'm getting a different issue now (for flash attention this time), although it seems it still does not build any extensions:
(base) ➜ ~ export XFORMERS_MORE_DETAILS=1
(base) ➜ ~ python -m xformers.info
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.5.0.dev20240615+cu124 with CUDA 1205 (you have 2.5.0.dev20240615+cu124)
Python 3.10.14 (you have 3.10.14)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Traceback (most recent call last):
File "/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/_cpp_lib.py", line 132, in _register_extensions
torch.ops.load_library(ext_specs.origin)
File "/home/blackroot/miniforge3/lib/python3.10/site-packages/torch/_ops.py", line 1298, in load_library
ctypes.CDLL(path)
File "/home/blackroot/miniforge3/lib/python3.10/ctypes/__init__.py", line 374, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/_C.so: undefined symbol: __cxa_call_terminate
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/_cpp_lib.py", line 142, in <module>
_build_metadata = _register_extensions()
File "/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/_cpp_lib.py", line 134, in _register_extensions
raise xFormersInvalidLibException(build_metadata) from exc
xformers._cpp_lib.xFormersInvalidLibException: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.5.0.dev20240615+cu124 with CUDA 1205 (you have 2.5.0.dev20240615+cu124)
Python 3.10.14 (you have 3.10.14)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:211: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_fwd")
/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/ops/fmha/flash.py:338: FutureWarning: `torch.library.impl_abstract` was renamed to `torch.library.register_fake`. Please use that instead; we will remove `torch.library.impl_abstract` in a future version of PyTorch.
@torch.library.impl_abstract("xformers_flash::flash_bwd")
/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/triton/softmax.py:30: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@custom_fwd(cast_inputs=torch.float16 if _triton_softmax_fp16_enabled else None)
/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/triton/softmax.py:87: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(
/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/ops/swiglu_op.py:128: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(cls, ctx, x, w1, b1, w2, b2, w3, b3):
/home/blackroot/miniforge3/lib/python3.10/site-packages/xformers/ops/swiglu_op.py:149: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(cls, ctx, dx5):
Unable to find python bindings at /usr/local/dcgm/bindings/python3. No data will be captured.
xFormers 0.0.27+96e5222.d20240616
memory_efficient_attention.ckF: unavailable
memory_efficient_attention.ckB: unavailable
memory_efficient_attention.ck_decoderF: unavailable
memory_efficient_attention.ck_splitKF: unavailable
memory_efficient_attention.cutlassF: unavailable
memory_efficient_attention.cutlassB: unavailable
memory_efficient_attention.decoderF: unavailable
memory_efficient_attention.flshattF@2.5.6-pt: available
memory_efficient_attention.flshattB@2.5.6-pt: available
memory_efficient_attention.smallkF: unavailable
memory_efficient_attention.smallkB: unavailable
memory_efficient_attention.triton_splitKF: available
indexing.scaled_index_addF: available
indexing.scaled_index_addB: available
indexing.index_select: available
sequence_parallel_fused.write_values: unavailable
sequence_parallel_fused.wait_values: unavailable
sequence_parallel_fused.cuda_memset_32b_async: unavailable
sp24.sparse24_sparsify_both_ways: unavailable
sp24.sparse24_apply: unavailable
sp24.sparse24_apply_dense_output: unavailable
sp24._sparse24_gemm: unavailable
sp24._cslt_sparse_mm@0.5.2: available
swiglu.dual_gemm_silu: unavailable
swiglu.gemm_fused_operand_sum: unavailable
swiglu.fused.p.cpp: not built
is_triton_available: True
pytorch.version: 2.5.0.dev20240615+cu124
pytorch.cuda: available
gpu.compute_capability: 8.6
gpu.name: NVIDIA GeForce RTX 3090 Ti
dcgm_profiler: unavailable
build.info: available
build.cuda_version: 1205
build.hip_version: None
build.python_version: 3.10.14
build.torch_version: 2.5.0.dev20240615+cu124
build.env.TORCH_CUDA_ARCH_LIST: 8.6
build.env.PYTORCH_ROCM_ARCH: None
build.env.XFORMERS_BUILD_TYPE: None
build.env.XFORMERS_ENABLE_DEBUG_ASSERTIONS: None
build.env.NVCC_FLAGS: None
build.env.XFORMERS_PACKAGE_FROM: None
source.privacy: open source
How did you install PyTorch? Was it from conda or pip?
How did you install PyTorch? Was it from conda or pip?
Pip (no virtual environments of any kind)
Unfortunately I didn't manage to repro (Linux, Python 3.10, torch installed via pip for cu124). Not sure what the difference in setup is there...
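One likely cause of the `undefined symbol: __cxa_call_terminate` failure above (my assumption, not confirmed in this thread) is a C++ runtime mismatch: that symbol is only exported by newer libstdc++ builds, while conda/miniforge environments often ship their own older copy that gets loaded first. A quick stdlib probe of whatever library actually resolves at runtime:

```python
import ctypes
import ctypes.util

def library_has_symbol(libname: str, symbol: str):
    """Return True/False if the named shared library defines `symbol`,
    or None if the library cannot be located on this system."""
    path = ctypes.util.find_library(libname)
    if path is None:
        return None
    lib = ctypes.CDLL(path)
    try:
        getattr(lib, symbol)  # performs a dlsym lookup
        return True
    except AttributeError:
        return False

# e.g. library_has_symbol("stdc++", "__cxa_call_terminate")
# False here would suggest the runtime libstdc++ predates the compiler
# that built xformers' _C.so.
```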
Could this be an issue with NVCC being on 12.5?
(base) ➜ ~ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build cuda_12.5.r12.5/compiler.34177558_0
(base) ➜ ~ gcc --version
gcc (GCC) 14.1.1 20240522
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I had to fix some bugs in candle related to the NVCC compiler's PTX versioning choice, but I am much less familiar with xformers.
Edit: the issue was likely unrelated.
🐛 Bug
To Reproduce
1. Install torch (direct from https://pytorch.org/get-started/locally/):
   `pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124`
2. Follow the steps for source compilation from https://github.com/facebookresearch/xformers?tab=readme-ov-file#installing-xformers
Run
python -m xformers.info
Expected behavior
The library compiles extensions.
Environment
Note: A separate user on Windows has confirmed that the newest xformers also fails; however, he noted:
it's been dead since the May 29th version; the torch 2.4.0 nightly of May 28th works