Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

Windows 11, Python 3.10, cu117: can't install with these versions #883

Open TaucherLoong opened 6 months ago

TaucherLoong commented 6 months ago

E:\.py_users\aiplus\lib\site-packages\torch\utils\cpp_extension.py:359: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified.
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')
building 'flash_attn_2_cuda' extension
Emitting ninja build file E:\workshop\llama_tuner\flash-attention\build\temp.win-amd64-cpython-310\Release\build.ninja...
Compiling objects...
Using envvar MAX_JOBS (2) as the number of workers...
[1/48] C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin\nvcc --generate-dependencies-with-compile --dependency-output E:\workshop\llama_tuner\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IE:\workshop\llama_tuner\flash-attention\csrc\flash_attn -IE:\workshop\llama_tuner\flash-attention\csrc\flash_attn\src -IE:\workshop\llama_tuner\flash-attention\csrc\cutlass\include -IE:\.py_users\aiplus\lib\site-packages\torch\include -IE:\.py_users\aiplus\lib\site-packages\torch\include\torch\csrc\api\include -IE:\.py_users\aiplus\lib\site-packages\torch\include\TH -IE:\.py_users\aiplus\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\include" -IE:\.py_users\aiplus\include -Id:\pythonsdk\include -Id:\pythonsdk\Include -ID:\Program_Files\VC\Tools\MSVC\14.33.31629\include -ID:\Program_Files\VC\Auxiliary\VS\include "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c E:\workshop\llama_tuner\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_bf16_sm80.cu -o E:\workshop\llama_tuner\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
FAILED: E:/workshop/llama_tuner/flash-attention/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\bin\nvcc --generate-dependencies-with-compile --dependency-output E:\workshop\llama_tuner\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj.d --use-local-env -Xcompiler /MD -Xcompiler /wd4819 -Xcompiler /wd4251 -Xcompiler /wd4244 -Xcompiler /wd4267 -Xcompiler /wd4275 -Xcompiler /wd4018 -Xcompiler /wd4190 -Xcompiler /EHsc -Xcudafe --diag_suppress=base_class_has_different_dll_interface -Xcudafe --diag_suppress=field_without_dll_interface -Xcudafe --diag_suppress=dll_interface_conflict_none_assumed -Xcudafe --diag_suppress=dll_interface_conflict_dllexport_assumed -IE:\workshop\llama_tuner\flash-attention\csrc\flash_attn -IE:\workshop\llama_tuner\flash-attention\csrc\flash_attn\src -IE:\workshop\llama_tuner\flash-attention\csrc\cutlass\include -IE:\.py_users\aiplus\lib\site-packages\torch\include -IE:\.py_users\aiplus\lib\site-packages\torch\include\torch\csrc\api\include -IE:\.py_users\aiplus\lib\site-packages\torch\include\TH -IE:\.py_users\aiplus\lib\site-packages\torch\include\THC "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.7\include" -IE:\.py_users\aiplus\include -Id:\pythonsdk\include -Id:\pythonsdk\Include -ID:\Program_Files\VC\Tools\MSVC\14.33.31629\include -ID:\Program_Files\VC\Auxiliary\VS\include "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\ucrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\um" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\shared" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\winrt" "-IC:\Program Files (x86)\Windows Kits\10\include\10.0.22621.0\cppwinrt" "-IC:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -c E:\workshop\llama_tuner\flash-attention\csrc\flash_attn\src\flash_bwd_hdim128_bf16_sm80.cu -o E:\workshop\llama_tuner\flash-attention\build\temp.win-amd64-cpython-310\Release\csrc/flash_attn/src/flash_bwd_hdim128_bf16_sm80.obj -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -gencode arch=compute_80,code=sm_80 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0
flash_bwd_hdim128_bf16_sm80.cu
cl: command line warning D9025: overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl: command line warning D9025: overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl: command line warning D9025: overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl: command line warning D9025: overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_bf16_sm80.cu
E:/.py_users/aiplus/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
cl: command line warning D9025: overriding '/D__CUDA_NO_HALF_OPERATORS__' with '/U__CUDA_NO_HALF_OPERATORS__'
cl: command line warning D9025: overriding '/D__CUDA_NO_HALF_CONVERSIONS__' with '/U__CUDA_NO_HALF_CONVERSIONS__'
cl: command line warning D9025: overriding '/D__CUDA_NO_HALF2_OPERATORS__' with '/U__CUDA_NO_HALF2_OPERATORS__'
cl: command line warning D9025: overriding '/D__CUDA_NO_BFLOAT16_CONVERSIONS__' with '/U__CUDA_NO_BFLOAT16_CONVERSIONS__'
flash_bwd_hdim128_bf16_sm80.cu
E:/.py_users/aiplus/lib/site-packages/torch/include\c10/macros/Macros.h(138): warning C4067: unexpected tokens following preprocessor directive - expected a newline
E:/.py_users/aiplus/lib/site-packages/torch/include\c10/util/irange.h(54): warning #186-D: pointless comparison of unsigned integer with zero
  detected during: instantiation of "__nv_bool c10::detail::integer_iterator<I, one_sided, >::operator==(const c10::detail::integer_iterator<I, one_sided, > &) const [with I=size_t, one_sided=false, =0]" (61): here
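
As a side note for anyone hitting the same failure: the very first warning, "Error checking compiler version for cl: [WinError 2]", means setuptools could not find cl.exe on the PATH, which on Windows usually happens when the build is not started from an MSVC developer prompt ("x64 Native Tools Command Prompt for VS"). A minimal pre-flight sketch, assuming only that PyTorch is installed in the build environment; the script itself is illustrative and not part of this issue:

```python
# Toolchain pre-flight check (illustrative sketch, not from this issue).
# Run it in the same environment and console used for the build.
import shutil
import torch

print("torch version :", torch.__version__)       # e.g. 2.0.1+cu117
print("torch CUDA    :", torch.version.cuda)       # should match the installed toolkit (11.7 here)
print("GPU available :", torch.cuda.is_available())

# cl.exe must be resolvable for torch.utils.cpp_extension to compile anything;
# if this prints None, start the build from an MSVC developer prompt.
print("cl.exe        :", shutil.which("cl"))
print("nvcc          :", shutil.which("nvcc"))
```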

TaucherLoong commented 6 months ago

The above errors showed up after running python setup.py install.

The last line was 'RuntimeError: Error compiling objects for extension'.

TaucherLoong commented 6 months ago

Oh, and the torch version is 2.0.1.
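
Worth noting in that context: the build log above only generates code for sm_80 (-gencode arch=compute_80,code=sm_80), and FlashAttention-2 targets Ampere-or-newer GPUs, so besides the compiler toolchain it is worth confirming the GPU's compute capability. A small sketch, assuming a CUDA-enabled PyTorch build is installed:

```python
# GPU capability check (sketch). FlashAttention-2 needs compute capability
# 8.0 or newer (Ampere / Ada / Hopper), matching the sm_80 target in the log.
import torch

if not torch.cuda.is_available():
    print("CUDA is not available to this torch build")
else:
    major, minor = torch.cuda.get_device_capability()
    print("device            :", torch.cuda.get_device_name())
    print("compute capability:", f"{major}.{minor}")
    print("sm_80 or newer    :", (major, minor) >= (8, 0))
```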

SkyblueMr commented 6 months ago

Give up. Someone on GitHub has published successfully compiled whl packages, but the lowest CUDA version they were built for is 12.1; I haven't seen any package built for a lower version than that. -.-

SkyblueMr commented 6 months ago

https://github.com/jllllll/flash-attention/releases
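
Those wheels are environment-specific: as far as I can tell, the filenames on that release page encode the CUDA version, torch version, Python tag and platform they were built against, so all of them have to match your setup. A small sketch (illustrative, not from that page) for printing the values to compare against the filenames:

```python
# Print the local environment tags to compare against prebuilt wheel filenames
# (illustrative sketch; the wheel naming scheme belongs to the uploader).
import platform
import sys
import torch

print("platform   :", platform.system(), platform.machine())                  # e.g. Windows AMD64
print("python tag :", f"cp{sys.version_info.major}{sys.version_info.minor}")  # e.g. cp310
print("torch      :", torch.__version__)                                      # e.g. 2.0.1+cu117
print("torch CUDA :", torch.version.cuda)                                      # e.g. 11.7
```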

HongLouyemeng commented 1 month ago

I am also on CUDA 11.7. Do you have a solution?
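
For reference, whichever route eventually works (a source build or one of the prebuilt wheels linked above), a quick way to confirm that the compiled flash_attn_2_cuda extension actually loads is to run one forward pass through the package's public flash_attn_func API. A minimal smoke-test sketch, assuming an Ampere-or-newer GPU and fp16 inputs:

```python
# Post-install smoke test (sketch): import the package and run one forward
# pass of the public API on tiny fp16 tensors.
import torch
from flash_attn import flash_attn_func  # loads flash_attn_2_cuda under the hood

# (batch, seqlen, nheads, headdim) layout, fp16 or bf16, on the GPU
q = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # expected: torch.Size([1, 128, 8, 64])
```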