Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

Flash Attention 2 doesn't build/compile on Windows #553

Closed: Panchovix closed this issue 11 months ago

Panchovix commented 1 year ago

Hi there, impressive work. I tested it on Linux, and the VRAM usage and speeds at higher context lengths are impressive (tested on exllamav2).

I've tried to do the same on Windows for exllamav2, but I run into issues whether I build from source or install via pip.

I tried with:

- Torch 2.0.1+cu118 and CUDA 11.8
- Torch 2.2+cu121 and CUDA 12.1
- Visual Studio 2022

The errors below correspond to running `python setup.py install` from a source checkout, and to installing via pip.
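Concretely, the two installs look roughly like this (reconstructed from memory, so treat the exact invocations as approximate):

```
# from a clone of this repo ("compiling from source" below)
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention
python setup.py install

# via pip ("install from pip" below); --no-build-isolation as the README suggests
pip install flash-attn --no-build-isolation
```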

Compiling from source error (MSVC messages translated from Spanish; ninja repeats each failing command verbatim and the same D9025/C2975 diagnostics recur for every translation unit, so the duplicates are collapsed):

```
[2/49] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin\nvcc --generate-dependencies-with-compile --dependency-output F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim160_fp16_sm80.obj.d -std=c++17 --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /wd4624 -Xcompiler /wd4067 -Xcompiler /wd4068 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn -IF:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src -IF:\ChatIAs\oobabooga\flash-attention\csrc\cutlass\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\torch\csrc\api\include -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\TH -IF:\ChatIAs\oobabooga\venv\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" -IF:\ChatIAs\oobabooga\venv\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\include -IC:\Users\Pancho\AppData\Local\Programs\Python\Python310\Include "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.36.32532\ATLMFC\include" "-IC:\Program Files\Microsoft Visual Studio\2022\Community\VC\Auxiliary\VS\include" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22000.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\um" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\shared" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\winrt" "-IC:\Program Files (x86)\Windows Kits\10\\include\10.0.22000.0\\cppwinrt" -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim160_fp16_sm80.cu -o F:\ChatIAs\oobabooga\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim160_fp16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim160_fp16_sm80.obj
flash_bwd_hdim160_fp16_sm80.cu
cl : command line warning D9025 : overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl : command line warning D9025 : overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl : command line warning D9025 : overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl : command line warning D9025 : overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
F:/ChatIAs/oobabooga/flash-attention/csrc/cutlass/include\cute/arch/mma_sm90_desc.hpp(143): warning #226-D: invalid format string conversion
    printf("GmmaDescriptor: 0x%016 %lli\n", static_cast(t.desc_));
Remark: The warnings can be suppressed with "-diag-suppress <error-number>"
tmpxft_00003160_00000000-7_flash_bwd_hdim160_fp16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim160_fp16_sm80.cu(9): note: see reference to function template instantiation 'void run_mha_bwd_hdim160(Flash_bwd_params &,cudaStream_t,const bool)' being compiled
[error C2975 at flash_bwd_launch_template.h(270) is emitted four times in total]
[3/49] nvcc [same command as [2/49]] -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim160_bf16_sm80.cu -o ...\flash_bwd_hdim160_bf16_sm80.obj
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.obj
flash_bwd_hdim160_bf16_sm80.cu
[same D9025 and #226-D warnings as above]
tmpxft_00005ccc_00000000-7_flash_bwd_hdim160_bf16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(270): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim160_bf16_sm80.cu(9): note: see reference to function template instantiation 'void run_mha_bwd_hdim160(Flash_bwd_params &,cudaStream_t,const bool)' being compiled
[error C2975 at flash_bwd_launch_template.h(270) is emitted four times in total]
[4/49] nvcc [same command as [2/49]] -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_sm80.cu -o ...\flash_bwd_hdim192_bf16_sm80.obj
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim192_bf16_sm80.obj
flash_bwd_hdim192_bf16_sm80.cu
[same D9025 and #226-D warnings as above]
tmpxft_000038c0_00000000-7_flash_bwd_hdim192_bf16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_bf16_sm80.cu(9): note: see reference to function template instantiation 'void run_mha_bwd_hdim192(Flash_bwd_params &,cudaStream_t,const bool)' being compiled
[error C2975 at flash_bwd_launch_template.h(287) is emitted four times in total]
[5/49] nvcc [same command as [2/49]] -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_fp16_sm80.cu -o ...\flash_bwd_hdim192_fp16_sm80.obj
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim192_fp16_sm80.obj
flash_bwd_hdim192_fp16_sm80.cu
[same D9025 and #226-D warnings as above]
tmpxft_00002c68_00000000-7_flash_bwd_hdim192_fp16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(287): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim192_fp16_sm80.cu(9): note: see reference to function template instantiation 'void run_mha_bwd_hdim192(Flash_bwd_params &,cudaStream_t,const bool)' being compiled
[error C2975 at flash_bwd_launch_template.h(287) is emitted four times in total]
[6/49] nvcc [same command as [2/49]] -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_fp16_sm80.cu -o ...\flash_bwd_hdim128_fp16_sm80.obj
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_fp16_sm80.obj
flash_bwd_hdim128_fp16_sm80.cu
[same D9025 and #226-D warnings as above]
tmpxft_000030a8_00000000-7_flash_bwd_hdim128_fp16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_fp16_sm80.cu(9): note: see reference to function template instantiation 'void run_mha_bwd_hdim128(Flash_bwd_params &,cudaStream_t,const bool)' being compiled
[error C2975 at flash_bwd_launch_template.h(235) is emitted four times in total]
[7/49] nvcc [same command as [2/49]] -c F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_bf16_sm80.cu -o ...\flash_bwd_hdim128_bf16_sm80.obj
FAILED: F:/ChatIAs/oobabooga/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj
flash_bwd_hdim128_bf16_sm80.cu
[same D9025 and #226-D warnings as above]
tmpxft_0000556c_00000000-7_flash_bwd_hdim128_bf16_sm80.cudafe1.cpp
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_launch_template.h(235): error C2975: 'kHeadDim_': invalid template argument for 'Flash_bwd_kernel_traits'; expected a compile-time constant expression
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\kernel_traits.h(186): note: see declaration of 'kHeadDim_'
F:\ChatIAs\oobabooga\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_bf16_sm80.cu(9): note: see reference to function template instantiation 'void run_mha_bwd_hdim128(Flash_bwd_params &,cudaStream_t,const bool)' being compiled
[error C2975 at flash_bwd_launch_template.h(235) is emitted four times in total]
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2100, in _run_ninja_build
    subprocess.run(
  File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '6']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "F:\ChatIAs\oobabooga\flash-attention\setup.py", line 287, in <module>
    setup(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\core.py", line 185, in setup
    return run_commands(dist)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\core.py", line 201, in run_commands
    dist.run_commands()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 968, in run_commands
    self.run_command(cmd)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\install.py", line 74, in run
    self.do_egg_install()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\install.py", line 123, in do_egg_install
    self.run_command('bdist_egg')
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\bdist_egg.py", line 165, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\bdist_egg.py", line 151, in call_command
    self.run_command(cmdname)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\install_lib.py", line 11, in run
    self.build()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\install_lib.py", line 112, in build
    self.run_command('build_ext')
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\build_ext.py", line 84, in run
    _build_ext.run(self)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 346, in run
    self.build_extensions()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 873, in build_extensions
    build_ext.build_extensions(self)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 466, in build_extensions
    self._build_extensions_serial()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 492, in _build_extensions_serial
    self.build_extension(ext)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\build_ext.py", line 246, in build_extension
    _build_ext.build_extension(self, ext)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 547, in build_extension
    objects = self.compiler.compile(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 845, in win_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2116, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
```
Panchovix commented 1 year ago
Install from pip error
```
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "C:\Users\Pancho\AppData\Local\Temp\pip-install-0q3amvk2\flash-attn_1ac95a7d9f7749dd90e6733135f93c62\setup.py", line 257, in run
    urllib.request.urlretrieve(wheel_url, wheel_filename)
  File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 241, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 525, in open
    response = meth(req, response)
  File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 634, in http_response
    response = self.parent.error(
  File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 563, in error
    return self._call_chain(*args)
  File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 496, in _call_chain
    result = func(*args)
  File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2100, in _run_ninja_build
    subprocess.run(
  File "C:\Users\Pancho\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '6']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "<pip-setuptools-caller>", line 34, in <module>
  File "C:\Users\Pancho\AppData\Local\Temp\pip-install-0q3amvk2\flash-attn_1ac95a7d9f7749dd90e6733135f93c62\setup.py", line 277, in <module>
    setup(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\__init__.py", line 87, in setup
    return distutils.core.setup(**attrs)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\core.py", line 185, in setup
    return run_commands(dist)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\core.py", line 201, in run_commands
    dist.run_commands()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 968, in run_commands
    self.run_command(cmd)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "C:\Users\Pancho\AppData\Local\Temp\pip-install-0q3amvk2\flash-attn_1ac95a7d9f7749dd90e6733135f93c62\setup.py", line 274, in run
    super().run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\wheel\bdist_wheel.py", line 343, in run
    self.run_command("build")
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build.py", line 132, in run
    self.run_command(cmd_name)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\cmd.py", line 319, in run_command
    self.distribution.run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\dist.py", line 1217, in run_command
    super().run_command(command)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\dist.py", line 987, in run_command
    cmd_obj.run()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\build_ext.py", line 84, in run
    _build_ext.run(self)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 346, in run
    self.build_extensions()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 873, in build_extensions
    build_ext.build_extensions(self)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 466, in build_extensions
    self._build_extensions_serial()
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 492, in _build_extensions_serial
    self.build_extension(ext)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\command\build_ext.py", line 246, in build_extension
    _build_ext.build_extension(self, ext)
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\setuptools\_distutils\command\build_ext.py", line 547, in build_extension
    objects = self.compiler.compile(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 845, in win_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 1774, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "F:\ChatIAs\oobabooga\venv\lib\site-packages\torch\utils\cpp_extension.py", line 2116, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects
```

Is there any additional requirement, beyond those mentioned, to install flash-attn on Windows?

tridao commented 1 year ago

I've no idea since it's only been tested on Linux, and I don't have access to a Windows machine. If you figure out how to build on Windows (or what we need to change to support Windows), please lmk.

Panchovix commented 11 months ago

Closing as https://github.com/Dao-AILab/flash-attention/commit/5a834254428fbdc2371ffb23a9cde40a287a7ff6 fixes it.

grimulkan commented 11 months ago

@Panchovix are you saying we can now compile flash-attn on Windows somehow? I couldn't with the latest pull, unless I'm missing something.

Panchovix commented 11 months ago

> @Panchovix are you saying we can now compile flash-attn on Windows somehow? I couldn't with the latest pull, unless I'm missing something.

Yes, now it is possible. The latest pull should work. You do need CUDA 12.x, though, since CUDA 11.8 and lower don't support the Windows build.
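
Before kicking off the long compile, it's worth confirming that both the toolkit and the torch wheel are CUDA 12.x builds; a quick sanity-check sketch:

```
# nvcc must come from a 12.x toolkit
nvcc --version

# the installed torch wheel must also be a cu12x build
python -c "import torch; print(torch.__version__, torch.version.cuda)"
```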

I've uploaded a wheel here https://huggingface.co/Panchovix/flash-attn-2-windows-test-wheel

More discussion here: https://github.com/Dao-AILab/flash-attention/issues/595

grimulkan commented 11 months ago

Thanks, 11.8 was my error. Woohoo!

rocketpoweryul commented 7 months ago

> > @Panchovix are you saying we can now compile flash-attn on Windows somehow? I couldn't with the latest pull, unless I'm missing something.
>
> Yes, now it is possible. The latest pull should work. You do need CUDA 12.x, though, since CUDA 11.8 and lower don't support the Windows build.
>
> I've uploaded a wheel here https://huggingface.co/Panchovix/flash-attn-2-windows-test-wheel
>
> More discussion here: https://github.com/Dao-AILab/flash-attention/issues/595

The link gives a 404 now

grimulkan commented 7 months ago

There are binaries here. I can't build anything beyond 2.4.2 from source myself and can't find Windows binaries beyond that anywhere. 2.4.2 works fine with current packages though.

Adlinga commented 5 months ago

With some untraceable magic I've built 2.5.6 on Windows 10. Compiling took ~2.5 hours.

CUDA 12.4, Torch 2.2.2+cu121, ninja 1.11.1

SavorSauc3 commented 5 months ago

For anyone looking to use Flash Attention on Windows, I got it working after some tweaking. You have to make sure that CUDA 12.4 is installed, and PyTorch should be 2.2.2+cu121. I used pip and it took about 2 hours to finish the setup. Hope this helps anyone who wants to use flash-attn on Windows. BTW I am on Windows 11 Pro; mileage may vary on Windows 10.
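
In case it saves someone some trial and error, the whole sequence boils down to something like this (a sketch assuming the cu121 torch wheels; the pins are just the versions reported to work above):

```
# torch build matching a CUDA 12.x toolkit (cu121 wheels work alongside the 12.4 toolkit)
pip install torch==2.2.2 --index-url https://download.pytorch.org/whl/cu121

# ninja parallelizes the flash-attn compile; without it the build is far slower
pip install ninja

# builds from source on Windows; expect a long compile
pip install flash-attn --no-build-isolation
```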

sadimoodi commented 5 months ago

> For anyone looking to use Flash Attention on Windows, I got it working after some tweaking. You have to make sure that CUDA 12.4 is installed, and PyTorch should be 2.2.2+cu121. I used pip and it took about 2 hours to finish the setup. Hope this helps anyone who wants to use flash-attn on Windows. BTW I am on Windows 11 Pro; mileage may vary on Windows 10.

Have you seen significant improvements after using flash attention? How much?

SavorSauc3 commented 5 months ago

I was able to get it working. The problem seems to be that many ML frameworks don't support flash attention on Windows. You would have to run tests for yourself, but it seems like ctransformers does use it. Since I didn't check the performance before installing flash attention, I couldn't say what the improvements were.

grimulkan commented 5 months ago

Got it working on Windows 10 as well, on Torch 2.2.2 (with CUDA 12.4 installed). It took around 15-20 min to compile on a 64-core Threadripper with ninja, so it does scale well with compute.
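
Compile parallelism can be tuned with the MAX_JOBS environment variable (read by PyTorch's extension builder, and mentioned in the flash-attn README for RAM-limited builds). A PowerShell sketch, with the caveat that each nvcc job can use several GB of RAM:

```
# more parallel nvcc jobs = faster build, but watch memory usage
$env:MAX_JOBS = "16"
pip install flash-attn --no-build-isolation
```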

dr4gos-pop commented 5 months ago

Version 2.5.7 is working on my Windows 10; building took around 2 hours:

    pip install flash-attn --no-build-isolation
    Collecting flash-attn
      Using cached flash_attn-2.5.7.tar.gz (2.5 MB)
      Preparing metadata (setup.py) ... done
    Requirement already satisfied: torch in (from flash-attn) (2.2.2+cu121)
    Requirement already satisfied: einops in (from flash-attn) (0.7.0)
    Requirement already satisfied: packaging in (from flash-attn) (24.0)
    Requirement already satisfied: ninja in (from flash-attn) (1.11.1.1)
    Requirement already satisfied: filelock in (from torch->flash-attn) (3.13.3)
    Requirement already satisfied: typing-extensions>=4.8.0 in (from torch->flash-attn) (4.11.0)
    Requirement already satisfied: sympy in (from torch->flash-attn) (1.12)
    Requirement already satisfied: networkx in (from torch->flash-attn) (2.8.8)
    Requirement already satisfied: jinja2 in (from torch->flash-attn) (3.1.3)
    Requirement already satisfied: fsspec in (from torch->flash-attn) (2024.3.1)
    Requirement already satisfied: MarkupSafe>=2.0 in (from jinja2->torch->flash-attn) (2.1.5)
    Requirement already satisfied: mpmath>=0.19 in (from sympy->torch->flash-attn) (1.3.0)
    Building wheels for collected packages: flash-attn
      Building wheel for flash-attn (setup.py) ... done
      Created wheel for flash-attn: filename=flash_attn-2.5.7-cp311-cp311-win_amd64.whl size=117462147
      Stored in directory: c:\users\appdata\local\pip\cache\wheels\94\a7\df\cf319d566d2bb53c7f3dd1b15ab2736cabca3e6410c75bd206
    Successfully built flash-attn
    Installing collected packages: flash-attn
    Successfully installed flash-attn-2.5.7

LostRuins commented 4 months ago

Any luck getting it to work with CUDA 11.8?

d-kleine commented 3 months ago

> (...) building took around 2h:

A package that needs 2 hours to install? Sorry, but that's a no-go for me. Is there any way to speed this up in the future? Maybe an installer instead of a package?

grimulkan commented 3 months ago

> A package that needs 2 hours to install? Sorry, but that's a no-go for me. Is there any way to speed this up in the future? Maybe an installer instead of a package?

Well, it doesn't take that long if you have a multi-core processor (it's the compile time). In general you're right: someone should maintain pre-built wheels, and someone usually does, but it's not consistent for Windows builds right now, and you have to search GitHub for someone who has uploaded a recent build.

The good news is that FA2 is a pretty stable product right now, I think, and you can grab an older wheel and it'll probably work just as well, as long as it supports the CUDA version you're using.

> Any luck getting it to work with CUDA 11.8?

I tried, but it would not compile. It might be that one of the dependencies (CUTLASS?) needs 12.0.

hananbeer commented 1 month ago

Are there more recent builds for Windows? I get the same error.

And for the 2.4.2 binaries I get this error:

    ImportError: DLL load failed while importing flash_attn_2_cuda: The specified procedure could not be found.

grimulkan commented 1 month ago

https://github.com/bdashore3/flash-attention/releases
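
You can install one of those wheels straight from its release URL; pick the asset whose tags match your Python, torch, and CUDA versions (the filename below is a placeholder, copy a real one from the releases page):

```
# <tag> and the wheel name are placeholders; substitute a real asset from the releases page,
# matching your cpXX (Python), torch, and cuXXX (CUDA) versions
pip install https://github.com/bdashore3/flash-attention/releases/download/<tag>/<matching-wheel>.whl
```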

hananbeer commented 1 month ago

> https://github.com/bdashore3/flash-attention/releases

Thanks for the quick reply. Unfortunately, the same error persists with these builds. Maybe something is missing from my PATH, as I did get strange C++ build tools errors that I managed to work around but perhaps never completely fixed... a prebuilt wheel is better, of course.

I have CUDA 12.4, by the way, and these say cu123... hmm.

grimulkan commented 1 month ago

That should be fine, technically. CUDA libs are generally backwards compatible, as long as your torch also has a compatible CUDA build. Does the latest pre-built wheel work? I do get the error you're getting if I use a newer package with an older flash-attn wheel, or if I build an older version of flash-attn. Maybe some incompatible change that was never reported on Windows. But the most recent build or wheel of flash-attn removes that error for me.
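
If in doubt, a quick way to see which side is mismatched (a sketch):

```
# the flash-attn wheel's CUDA/torch tags must line up with the installed torch build,
# and the driver must support that CUDA version
python -c "import torch; print('torch', torch.__version__, '| cuda', torch.version.cuda)"
pip show flash-attn    # installed flash-attn version
nvidia-smi             # 'CUDA Version' in the header = max CUDA the driver supports
```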

hananbeer commented 1 month ago

I finally found a root cause for the build failure.

https://stackoverflow.com/a/78576792/13305027

I don't understand the VS 2022 version issue, because that's what I have installed; apparently it's related to some minor version. It wasn't entirely clear how to downgrade to another 2022 release, so perhaps installing something older than 2022 would suffice.

Alternatively, upgrade to CUDA 12.4, or preferably 12.5, it seems.

I am now testing a different approach that avoids reinstalling anything: pip calls the setup.py script, which calls PyTorch's cpp_extension.py, which builds with ninja, which calls nvcc... and passing --allow-unsupported-compiler to nvcc should work around the issue (see the sketch below).
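
nvcc also reads extra flags from the NVCC_APPEND_FLAGS environment variable (CUDA 11.5+), so the flag can be injected without patching any of those scripts. An untested PowerShell sketch; note it only silences nvcc's host-compiler version check:

```
# disables a toolchain safety check, so use at your own risk
$env:NVCC_APPEND_FLAGS = "--allow-unsupported-compiler"
pip install flash-attn --no-build-isolation
```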

PS: it's perhaps worth mentioning that on WSL it's pretty much hassle-free, except that there I had an error related to flash-attn, so on that other project I could simply bypass flash-attn by finding and setting use_flash_attn = False in the code. (There might be something similar for you.)

grimulkan commented 1 month ago

That makes sense: in the cases where I got that error I was probably linking against CUDA 12.1, and in my recent builds I had switched to 12.5. I also have a very early version of VS 2022 and have never updated it since it was first released.

What was the issue with WSL? It seems to work fine for me.

hananbeer commented 1 month ago

I had this error: https://github.com/facebookresearch/segment-anything-2/issues/100

I ended up using the same type of solution they proposed, which is to bypass flash-attn altogether.

Perhaps that's an option for anyone reading this thread, but it's not so helpful if you actually need flash-attn.

I'm not sure what the implications of this would be, but that repo seemed to work without it. Maybe you have some insights?

the-xentropy commented 3 weeks ago

I suspect this is caused by version differences and by how absurdly easily the import paths get messed up on Windows, and ultimately by the fact that on Windows, unless you're using Conda, you really need to figure out for yourself which versions are compatible, and even then you need to install things in the right order.

What worked for me, unintuitive things in bold:

1) Uninstall pytorch, torchvision, xformers & torchaudio
2) Uninstall all MSVC C++ build tools
3) Uninstall all CUDA, CUDA Toolkit, cuDNN, and other Nvidia SDKs (read: type 'nvidia', 'cudnn' and 'cuda' into the Add/Remove Programs feature and remove anything that isn't GeForce Experience or drivers)
4) Restart
5) Install MSVC C++ build tools (I have Visual Studio Community 2022, 17.11.1, the most recent one, and I also added MSVC v143 build tools for v17.9)
6) Install all the CUDA things. I went for CUDA 12.4.1 and cuDNN 9.2.1. **Do NOT install this first. The CUDA Toolkit HAS to configure the MSVC setup!**
7) Install pytorch (2.4.1, torchaudio 2.4.1 and torchvision 0.19)
8) **Restart** (yes, unlike a lot of guides that merely say you have to, you actually have to. It will not work otherwise. I tried)

TL;DR: Pay super close attention to which versions are installed all over your system, and consider doing a clean re-install of CUDA stuff.

As for easing this going forward, I think adding some sanity checks to the build process, to check which versions are installed and whether the include paths are sensible, would be a good step. As a 'crash early' mitigation, maybe we could do a quick build of some CUDA hello world before kicking off the main process, something like the sketch below? As long as the program isn't too trivial, I think it's highly likely to catch build misconfigurations.
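
A minimal version of that pre-flight idea (PowerShell sketch; `check.cu` is a throwaway name, and you'd run this from a Developer PowerShell so cl.exe is on PATH):

```
# write a trivial kernel and compile it with the same nvcc/MSVC pairing the real
# build will use; an nvcc/cl incompatibility then fails in seconds, not hours in
'__global__ void noop() {} int main() { noop<<<1,1>>>(); return 0; }' |
    Out-File -Encoding ascii check.cu
nvcc check.cu -o check.exe
if ($LASTEXITCODE -ne 0) { Write-Error "CUDA/MSVC toolchain looks misconfigured" }
```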

sunsetcoder commented 3 weeks ago

I followed steps 1-4 (made sure to remove all CUDA / cuDNN entries from Add/Remove Programs; only the GeForce drivers & GeForce Experience remained).

Installed the latest Microsoft Visual C++ Redistributable after step 8, to fix "OSError [WinError 126] error loading fbgemm.dll or dependencies" (occurred when running `import torch`).

Installed CUDA 12.4.1

On Windows 11, cuDNN 9.2 was installed from the tarball.

I created a new venv and installed PyTorch 2.4 by modifying step 7.

Finally, the flash-attn install itself.
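
The exact commands didn't survive the paste here; a plausible reconstruction of the sequence, assuming the cu124 PyTorch 2.4 wheels (version pins are illustrative, not the poster's confirmed commands):

```
# fresh venv (Python 3.10, per the note further down in the thread)
python -m venv venv
.\venv\Scripts\Activate.ps1

# PyTorch 2.4 built against CUDA 12.4
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu124

# then the long source build
pip install ninja
pip install flash-attn --no-build-isolation
```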

It's currently building (with a lot of warnings in the process, such as `\flash_bwd_kernel.h(483): warning #177-D: variable "dtanh" was declared but never referenced`).

sunsetcoder commented 3 weeks ago

Build completed. Created a .whl file with `python setup.py bdist_wheel`.
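
For anyone who wants a reusable artifact rather than an in-place install, the flow is roughly this (a sketch; the clone location and the generated wheel's filename will differ per setup):

```
git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention
python setup.py bdist_wheel          # the wheel lands in .\dist\
pip install (Get-Item .\dist\*.whl)  # install it into any matching environment
```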

evilalmus commented 2 days ago

@sunsetcoder Thank you! I've been trying to get flash_attn installed for days; these instructions are the first ones that worked.

sunsetcoder commented 2 days ago

@evilalmus You're welcome. Make sure to use Python 3.10; 3.12 is no bueno.

evilalmus commented 2 days ago

3.11.9 worked for me.