Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

Might be a solution to get Flash Attention 2 built/compiled on Windows #595

Open Akatsuki030 opened 11 months ago

Akatsuki030 commented 11 months ago

As a Windows user, I tried to compile this and found the problem lies in two files, "flash_fwd_launch_template.h" and "flash_bwd_launch_template.h", under "./flash-attention/csrc/flash_attn/src". When a template tries to reference the variable "Headdim", it causes error C2975. I think this might be the reason why we always get compile errors on Windows. Below is how I solved this problem:

First, in the file "flash_bwd_launch_template.h" you can find many functions like "run_mha_bwd_hdimXX", each with a constant declaration "Headdim = XX" and templates like run_flash_bwd<Flash_bwd_kernel_traits<Headdim, 64, 128, 8, 4, 2, 2, false, false, T>, Is_dropout>(params, stream, configure). What I did was replace every "Headdim" in those template arguments with its literal value. For example, if the function is called run_mha_bwd_hdim128 and declares "Headdim = 128", you change Headdim to 128 in the templates, giving run_flash_bwd<Flash_bwd_kernel_traits<128, 64, 128, 8, 2, 4, 2, false, false, T>, Is_dropout>(params, stream, configure). I did the same thing for the "run_mha_fwd_hdimXX" functions and their templates.
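
Roughly, the edit has this shape (a sketch assembled from the instantiation quoted above; the real header also wraps the call in a dropout BOOL_SWITCH, which is omitted here):

    // Before: MSVC rejects the constexpr local used as a template argument (error C2975).
    template<typename T>
    void run_mha_bwd_hdim128(Flash_bwd_params &params, cudaStream_t stream, const bool configure) {
        constexpr int Headdim = 128;
        run_flash_bwd<Flash_bwd_kernel_traits<Headdim, 64, 128, 8, 2, 4, 2, false, false, T>, Is_dropout>(params, stream, configure);
    }

    // After: the workaround substitutes the literal head dimension directly into the template arguments.
    template<typename T>
    void run_mha_bwd_hdim128(Flash_bwd_params &params, cudaStream_t stream, const bool configure) {
        run_flash_bwd<Flash_bwd_kernel_traits<128, 64, 128, 8, 2, 4, 2, false, false, T>, Is_dropout>(params, stream, configure);
    }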

Second, another error comes from "flash_fwd_launch_template.h", line 107, again caused by referencing the constant "kBlockM" in the if-else statement below it; I rewrote it as

        if constexpr(Kernel_traits::kHeadDim % 128 == 0){
            dim3 grid_combine((params.b * params.h * params.seqlen_q + 4 - 1) / 4);
            BOOL_SWITCH(is_even_K, IsEvenKConst, [&] {
                if (params.num_splits <= 2) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 4, 1, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                } else if (params.num_splits <= 4) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 4, 2, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                } else if (params.num_splits <= 8) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 4, 3, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                } else if (params.num_splits <= 16) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 4, 4, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                } else if (params.num_splits <= 32) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 4, 5, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                } else if (params.num_splits <= 64) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 4, 6, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                } else if (params.num_splits <= 128) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 4, 7, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                }
                C10_CUDA_KERNEL_LAUNCH_CHECK();
            });
        }else if constexpr(Kernel_traits::kHeadDim % 64 == 0){
            dim3 grid_combine((params.b * params.h * params.seqlen_q + 8 - 1) / 8);
            BOOL_SWITCH(is_even_K, IsEvenKConst, [&] {
                if (params.num_splits <= 2) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 8, 1, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                } else if (params.num_splits <= 4) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 8, 2, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                } else if (params.num_splits <= 8) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 8, 3, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                } else if (params.num_splits <= 16) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 8, 4, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                } else if (params.num_splits <= 32) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 8, 5, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                } else if (params.num_splits <= 64) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 8, 6, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                } else if (params.num_splits <= 128) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 8, 7, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                }
                C10_CUDA_KERNEL_LAUNCH_CHECK();
            });
        }else{
            dim3 grid_combine((params.b * params.h * params.seqlen_q + 16 - 1) / 16);
            BOOL_SWITCH(is_even_K, IsEvenKConst, [&] {
                if (params.num_splits <= 2) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 16, 1, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                } else if (params.num_splits <= 4) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 16, 2, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                } else if (params.num_splits <= 8) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 16, 3, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                } else if (params.num_splits <= 16) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 16, 4, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                } else if (params.num_splits <= 32) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 16, 5, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                } else if (params.num_splits <= 64) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 16, 6, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                } else if (params.num_splits <= 128) {
                    flash_fwd_splitkv_combine_kernel<Kernel_traits, 16, 7, IsEvenKConst><<<grid_combine, Kernel_traits::kNThreads, 0, stream>>>(params);
                }
                C10_CUDA_KERNEL_LAUNCH_CHECK();
            });
        }

Third, for the function "run_mha_fwd_splitkv_dispatch" in "flash_fwd_launch_template.h", line 194, you also have to replace "kBlockM" in the template with 64 (see the sketch below). Then you can try to compile it. These solutions look stupid but really solved my problem: I successfully compiled flash_attn2 on Windows, and I still need some time to test it on other computers. I put the files I rewrote here: [link](https://drive.google.com/drive/folders/1n8MRQC0-KwWHLfcIzN-D_LqzutcbKUa?usp=sharing). I think there might be a better solution, but for me it at least works. Also, I didn't use Ninja and compiled from source code; could someone try to compile it with Ninja? EDIT: I used
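
A hypothetical sketch of that third change (the actual trait arguments of run_mha_fwd_splitkv_dispatch may differ; the elided parameters are left untouched):

    // Before: kBlockM is a constexpr local, which MSVC rejects as a template argument (error C2975).
    constexpr int kBlockM = 64;
    run_flash_splitkv_fwd<Flash_fwd_kernel_traits<Headdim, kBlockM, /* ... */ T>>(params, stream);

    // After: pass the literal value instead.
    run_flash_splitkv_fwd<Flash_fwd_kernel_traits<Headdim, 64, /* ... */ T>>(params, stream);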

Panchovix commented 11 months ago

I did try replacing your .h files in my venv, with

And the build failed fairly quickly. I have uninstalled ninja but it seems to be importing it anyway? How did you manage to not use ninja?

Also, I can't install your build since I'm on Python 3.10. Gonna see if I manage to compile it.

EDIT: Tried with CUDA 12.2, no luck either.

EDIT2: I managed to build it. I took your .h files and uncommented the variable declarations, and then it worked. It took ~30 minutes on a 7800X3D and 64GB RAM.

It seems that for some reason Windows tries to use/import those variables even when they're not declared. But, at the same time, if they're only used some lines below, it doesn't work.

(screenshot)

EDIT3: I can confirm it works for exllamav2 + FA v2

Without FA

-- Measuring token speed...
 ** Position     1 + 127 tokens:   13.5848 t/s
 ** Position   128 + 128 tokens:   13.8594 t/s
 ** Position   256 + 128 tokens:   14.1394 t/s
 ** Position   384 + 128 tokens:   13.8138 t/s
 ** Position   512 + 128 tokens:   13.4949 t/s
 ** Position   640 + 128 tokens:   13.6474 t/s
 ** Position   768 + 128 tokens:   13.7073 t/s
 ** Position   896 + 128 tokens:   12.3254 t/s
 ** Position  1024 + 128 tokens:   13.8960 t/s
 ** Position  1152 + 128 tokens:   13.7677 t/s
 ** Position  1280 + 128 tokens:   12.9869 t/s
 ** Position  1408 + 128 tokens:   12.1336 t/s
 ** Position  1536 + 128 tokens:   13.0463 t/s
 ** Position  1664 + 128 tokens:   13.2463 t/s
 ** Position  1792 + 128 tokens:   12.6211 t/s
 ** Position  1920 + 128 tokens:   13.1429 t/s
 ** Position  2048 + 128 tokens:   12.5674 t/s
 ** Position  2176 + 128 tokens:   12.5847 t/s
 ** Position  2304 + 128 tokens:   13.3471 t/s
 ** Position  2432 + 128 tokens:   12.9135 t/s
 ** Position  2560 + 128 tokens:   12.2195 t/s
 ** Position  2688 + 128 tokens:   11.6120 t/s
 ** Position  2816 + 128 tokens:   11.2545 t/s
 ** Position  2944 + 128 tokens:   11.5304 t/s
 ** Position  3072 + 128 tokens:   11.7982 t/s
 ** Position  3200 + 128 tokens:   11.8041 t/s
 ** Position  3328 + 128 tokens:   12.8038 t/s
 ** Position  3456 + 128 tokens:   12.7324 t/s
 ** Position  3584 + 128 tokens:   11.7733 t/s
 ** Position  3712 + 128 tokens:   10.7961 t/s
 ** Position  3840 + 128 tokens:   11.1014 t/s
 ** Position  3968 + 128 tokens:   10.8474 t/s

With FA

 -- Measuring token speed...
 ** Position     1 + 127 tokens:   22.6606 t/s
 ** Position   128 + 128 tokens:   22.5140 t/s
 ** Position   256 + 128 tokens:   22.6111 t/s
 ** Position   384 + 128 tokens:   22.6027 t/s
 ** Position   512 + 128 tokens:   22.3392 t/s
 ** Position   640 + 128 tokens:   22.0570 t/s
 ** Position   768 + 128 tokens:   22.3728 t/s
 ** Position   896 + 128 tokens:   22.4983 t/s
 ** Position  1024 + 128 tokens:   21.9384 t/s
 ** Position  1152 + 128 tokens:   22.3509 t/s
 ** Position  1280 + 128 tokens:   22.3189 t/s
 ** Position  1408 + 128 tokens:   22.2739 t/s
 ** Position  1536 + 128 tokens:   22.4145 t/s
 ** Position  1664 + 128 tokens:   21.9608 t/s
 ** Position  1792 + 128 tokens:   21.7645 t/s
 ** Position  1920 + 128 tokens:   22.1468 t/s
 ** Position  2048 + 128 tokens:   22.3400 t/s
 ** Position  2176 + 128 tokens:   21.9830 t/s
 ** Position  2304 + 128 tokens:   21.8387 t/s
 ** Position  2432 + 128 tokens:   20.2306 t/s
 ** Position  2560 + 128 tokens:   21.0056 t/s
 ** Position  2688 + 128 tokens:   22.2157 t/s
 ** Position  2816 + 128 tokens:   22.1912 t/s
 ** Position  2944 + 128 tokens:   22.1835 t/s
 ** Position  3072 + 128 tokens:   22.1393 t/s
 ** Position  3200 + 128 tokens:   22.1182 t/s
 ** Position  3328 + 128 tokens:   22.0821 t/s
 ** Position  3456 + 128 tokens:   22.0308 t/s
 ** Position  3584 + 128 tokens:   22.0060 t/s
 ** Position  3712 + 128 tokens:   21.9909 t/s
 ** Position  3840 + 128 tokens:   21.9816 t/s
 ** Position  3968 + 128 tokens:   21.9757 t/s

tridao commented 11 months ago

This is very helpful, thanks @Akatsuki030 and @Panchovix. @Akatsuki030 is it possible to fix it by declaring these variables (Headdim, kBlockM) with constexpr static int instead of constexpr int? I've just pushed a commit that does that. Can you check if it compiles on Windows? A while back someone (I think it was Daniel Haziza from the xformers team) told me that they needed constexpr static int for Windows compilation.
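
Concretely, the proposed commit keeps the named constants but changes how they are declared, roughly like this (an illustrative sketch, not the literal diff):

    // Before: some MSVC versions reject this constant when it is used as a template argument (error C2975).
    constexpr int Headdim = 128;

    // After: adding static reportedly satisfies MSVC, so the literal substitutions above are no longer needed.
    constexpr static int Headdim = 128;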

Panchovix commented 11 months ago

@tridao just tested the compilation with your latest push, and now it works.

I did use

tridao commented 11 months ago

Great, thanks for the confirmation @Panchovix. I'll cut a release now (v2.3.2). Ideally we'd set up prebuilt CUDA wheels for Windows at some point so folks can just download instead of having to compile locally, but that can wait till later.

Panchovix commented 11 months ago

Great, thanks for the confirmation @Panchovix. I'll cut a release now (v2.3.2). Ideally we'd set up prebuilt CUDA wheels for Windows at some point so folks can just download instead of having to compile locally, but that can wait till later.

Great! I built a whl with python setup.py bdist_wheel; it seems some people have issues with it, but it is here in any case: https://huggingface.co/Panchovix/flash-attn-2-windows-test-wheel. Probably a missing step for now.

Panchovix commented 11 months ago

@tridao based on some tests, it seems you need at least CUDA 12.x and a matching torch version to build flash attn 2 on Windows, or even to use the wheel. CUDA 11.8 fails to build. Exllamav2 needs to be built with torch+cu121 as well.

We have to be aware that ooba webui comes by default with torch+cu118, so on Windows with that CUDA version it won't compile.

tridao commented 11 months ago

I see, thanks for the confirmation. I guess we rely on Cutlass and Cutlass requires CUDA 12.x to build on Windows.

bdashore3 commented 11 months ago

Just built on CUDA 12.1 and tested with exllama_v2 on oobabooga's webui. I can confirm what @Panchovix said above: CUDA 12.x is required for Cutlass (12.1 if you want pytorch v2.1).

https://github.com/bdashore3/flash-attention/releases/tag/2.3.2

bdashore3 commented 11 months ago

Another note, it may be a good idea to build wheels for cu121 as well, since github actions currently doesn't build for that version.

tridao commented 11 months ago

Another note, it may be a good idea to build wheels for cu121 as well, since github actions currently doesn't build for that version.

Right now the github actions only build for Linux. We intentionally don't build with CUDA 12.1 (due to some segfault with nvcc), but when installing on CUDA 12.1, setup.py will download the wheel for 12.2 and use that (they're compatible).

If you (or anyone) have experience with setting up github actions for Windows I'd love to get help there.

dunbin commented 11 months ago

Great, thanks for the confirmation @Panchovix. I'll cut a release now (v2.3.2). Ideally we'd set up prebuilt CUDA wheels for Windows at some point so folks can just download instead of having to compile locally, but that can wait till later.

Great! I built a whl with python setup.py bdist_wheel; it seems some people have issues with it, but it is here in any case: https://huggingface.co/Panchovix/flash-attn-2-windows-test-wheel. Probably a missing step for now.

You truly are a genius!

mattiamazzari commented 11 months ago

Works like a charm. I used:

I have a CPU with 6 cores, so I set the environment variable MAX_JOBS to 4 (I previously set it to 6 but got an out-of-memory error); remember to restart your computer after you set it. It took roughly 3 hours to compile everything with 16GB of RAM.

If you get a "ninja: build stopped: subcommand failed" error, do this:

    git clean -xdf
    python setup.py clean
    git submodule sync
    git submodule deinit -f .
    git submodule update --init --recursive
    python setup.py install

YuehChuan commented 11 months ago

GOOD🎶 RTX4090 24GB VRAM, AMD 7950X, 64GB RAM. python3.8 and python3.10 both work

python3.10
https://www.python.org/downloads/release/python-3100/
win11

python -m venv venv

cd venv/Scripts
activate
-----------------------

git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention

pip install packaging 
pip install wheel

set MAX_JOBS=4
python setup.py install
flashattention2

Nicoolodion2 commented 10 months ago

Hey, I finally got the wheels built (on Windows), but oobabooga's webui still doesn't detect it... It still gives me the message to install Flash-attention... Anyone got a solution?

bdashore3 commented 10 months ago

@Nicoolodion2 Use my PR until ooba merges it. FA2 on Windows requires Cuda 12.1 while ooba is still stuck on 11.8.

neocao123 commented 10 months ago

I'm trying to use flash attention in modelscope-agent, which needs layer_norm and rotary. Flash attention and rotary have now been built from @bdashore3's branch, while layer_norm still errors out.

I used py3.10, VS2019, CUDA 12.1.

tridao commented 10 months ago

You don't have to use layer_norm.

neocao123 commented 10 months ago

You don't have to use layer_norm.

However, I made it work.

The trouble is in ln_bwd_kernels.cuh, line 54.

For some unknown reason, BOOL_SWITCH did not work when turning bool has_colscale into constexpr bool HasColscaleConst, which caused error C2975. I just rewrote it as:

    if (HasColscaleConst) {
        using Kernel_traits_f = layer_norm::Kernel_traits_finalize<HIDDEN_SIZE,
                                                                   weight_t,
                                                                   input_t,
                                                                   residual_t,
                                                                   output_t,
                                                                   compute_t,
                                                                   index_t,
                                                                   true,
                                                                   32 * 32,  // THREADS_PER_CTA
                                                                   BYTES_PER_LDG_FINAL>;

        auto kernel_f = &layer_norm::ln_bwd_finalize_kernel<Kernel_traits_f, HasColscaleConst, IsEvenColsConst>;
        kernel_f<<<Kernel_traits_f::CTAS, Kernel_traits_f::THREADS_PER_CTA, 0, stream>>>(launch_params.params);
    } else {
        using Kernel_traits_f = layer_norm::Kernel_traits_finalize<HIDDEN_SIZE,
                                                                   weight_t,
                                                                   input_t,
                                                                   residual_t,
                                                                   output_t,
                                                                   compute_t,
                                                                   index_t,
                                                                   false,
                                                                   32 * 32,  // THREADS_PER_CTA
                                                                   BYTES_PER_LDG_FINAL>;

        auto kernel_f = &layer_norm::ln_bwd_finalize_kernel<Kernel_traits_f, HasColscaleConst, IsEvenColsConst>;
        kernel_f<<<Kernel_traits_f::CTAS, Kernel_traits_f::THREADS_PER_CTA, 0, stream>>>(launch_params.params);
    }

That's a stupid way, but it works, and it's compiling now.

havietisov commented 8 months ago

Does it mean I can use FA2 on Windows if I build it from source?

dunbin commented 8 months ago

Hello! Your message has been received, thank you for writing.

Piscabo commented 8 months ago

Any compiled wheel for Windows 11, Python 3.11, CUDA 12.2, Torch 2.1.2?

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash_attn
Running setup.py clean for flash_attn
Failed to build flash_attn
ERROR: Could not build wheels for flash_attn, which is required to install pyproject.toml-based projects

dunbin commented 8 months ago

Hello! Your message has been received, thank you for writing.

dicksondickson commented 3 months ago

GOOD🎶 RTX4090 24GB VRAM, AMD 7950X, 64GB RAM. python3.8 and python3.10 both work

python3.10
https://www.python.org/downloads/release/python-3100/
win11

python -m venv venv

cd venv/Scripts
activate
-----------------------

git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention

pip install packaging 
pip install wheel

set MAX_JOBS=4
python setup.py install
flashattention2

Confirmed this method compiles on Windows 11 and works!

I have the following installed: Python 3.11.9, Pytorch 2.3, CUDA 12.3, VS Studio 2022

System specs: AMD 7950x, 4090

dunbin commented 3 months ago

Hello! Your message has been received, thank you for writing.

C0D3-BR3AK3R commented 2 months ago

I am trying to install Flash Attention 2 on Windows 11 with Python 3.12.3. Here is my setup: RTX 3050 Laptop, 16 GB RAM, Core i7 12650H.

So I have set up MSVC Build Tools 2022 alongside MS VS Community 2022. Once I cloned the Flash Attention git repo, I ran python setup.py install and it gave the error below:

running build_ext
D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\torch\utils\cpp_extension.py:384: UserWarning: Error checking compiler version for cl: [WinError 2] The system cannot find the file specified
  warnings.warn(f'Error checking compiler version for {compiler}: {error}')
building 'flash_attn_2_cuda' extension
creating D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\flash-attention\build\temp.win-amd64-cpython-312
creating D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\flash-attention\build\temp.win-amd64-cpython-312\Release
creating D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\flash-attention\build\temp.win-amd64-cpython-312\Release\csrc
creating D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\flash-attention\build\temp.win-amd64-cpython-312\Release\csrc\flash_attn
creating D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\flash-attention\build\temp.win-amd64-cpython-312\Release\csrc\flash_attn\src      
Emitting ninja build file D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\flash-attention\build\temp.win-amd64-cpython-312\Release\build.ninja...
Compiling objects...
Using envvar MAX_JOBS (1) as the number of workers...
[1/49] cl /showIncludes /nologo /O2 /W3 /GL /DNDEBUG /MD /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc "-ID:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\flash-attention\csrc\flash_attn" "-ID:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\flash-attention\csrc\flash_attn\src" "-ID:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\flash-attention\csrc\cutlass\include" "-ID:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\torch\include" "-ID:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\torch\include\torch\csrc\api\include" "-ID:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\torch\include\TH" "-ID:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\torch\include\THC" "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" "-ID:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\include" -IC:\Python312\include -IC:\Python312\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.40.33807\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" -c "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\flash-attention\csrc\flash_attn\flash_api.cpp" /Fo"D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\flash-attention\build\temp.win-amd64-cpython-312\Release\csrc/flash_attn/flash_api.obj" -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 /std:c++17
FAILED: D:/Github/Deep-Learning-Basics/LLM Testing/MultiModalAI/flash-attention/build/temp.win-amd64-cpython-312/Release/csrc/flash_attn/flash_api.obj
cl /showIncludes /nologo /O2 /W3 /GL /DNDEBUG /MD /MD /wd4819 /wd4251 /wd4244 /wd4267 /wd4275 /wd4018 /wd4190 /wd4624 /wd4067 /wd4068 /EHsc "-ID:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\flash-attention\csrc\flash_attn" "-ID:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\flash-attention\csrc\flash_attn\src" "-ID:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\flash-attention\csrc\cutlass\include" "-ID:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\torch\include" "-ID:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\torch\include\torch\csrc\api\include" "-ID:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\torch\include\TH" "-ID:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\torch\include\THC" "-IC:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include" "-ID:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\include" -IC:\Python312\include -IC:\Python312\Include "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.40.33807\include" "-IC:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\VS\include" -c "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\flash-attention\csrc\flash_attn\flash_api.cpp" /Fo"D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\flash-attention\build\temp.win-amd64-cpython-312\Release\csrc/flash_attn/flash_api.obj" -O3 -std=c++17 -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=flash_attn_2_cuda -D_GLIBCXX_USE_CXX11_ABI=0 /std:c++17
cl : Command line warning D9002 : ignoring unknown option '-O3'
cl : Command line warning D9002 : ignoring unknown option '-std=c++17'
C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.40.33807\include\cstddef(11): fatal error C1083: Cannot open include file: 'stddef.h': No such file or directory
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\torch\utils\cpp_extension.py", line 2107, in _run_ninja_build
    subprocess.run(
  File "C:\Python312\Lib\subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '1']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\flash-attention\setup.py", line 311, in <module>
    setup(
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\__init__.py", line 103, in setup
    return distutils.core.setup(**attrs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\_distutils\core.py", line 184, in setup     
    return run_commands(dist)
           ^^^^^^^^^^^^^^^^^^
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\_distutils\core.py", line 200, in run_commands
    dist.run_commands()
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\_distutils\dist.py", line 969, in run_commands
    self.run_command(cmd)
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\dist.py", line 968, in run_command
    super().run_command(command)
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
    cmd_obj.run()
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\command\install.py", line 87, in run        
    self.do_egg_install()
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\command\install.py", line 139, in do_egg_install
    self.run_command('bdist_egg')
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\_distutils\cmd.py", line 316, in run_command
    self.distribution.run_command(command)
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\dist.py", line 968, in run_command
    super().run_command(command)
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
    cmd_obj.run()
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\command\bdist_egg.py", line 167, in run     
    cmd = self.call_command('install_lib', warn_dir=0)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\command\bdist_egg.py", line 153, in call_command
    self.run_command(cmdname)
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\_distutils\cmd.py", line 316, in run_command
    self.distribution.run_command(command)
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\dist.py", line 968, in run_command
    super().run_command(command)
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
    cmd_obj.run()
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\command\install_lib.py", line 11, in run    
    self.build()
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\_distutils\command\install_lib.py", line 110, in build
    self.run_command('build_ext')
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\_distutils\cmd.py", line 316, in run_command
    self.distribution.run_command(command)
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\dist.py", line 968, in run_command
    super().run_command(command)
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\_distutils\dist.py", line 988, in run_command
    cmd_obj.run()
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\command\build_ext.py", line 91, in run      
    _build_ext.run(self)
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\_distutils\command\build_ext.py", line 359, in run
    self.build_extensions()
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\torch\utils\cpp_extension.py", line 870, in build_extensions
    build_ext.build_extensions(self)
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\_distutils\command\build_ext.py", line 479, in build_extensions
    self._build_extensions_serial()
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\_distutils\command\build_ext.py", line 505, in _build_extensions_serial
    self.build_extension(ext)
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\command\build_ext.py", line 252, in build_extension
    _build_ext.build_extension(self, ext)
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\setuptools\_distutils\command\build_ext.py", line 560, in build_extension
    objects = self.compiler.compile(
              ^^^^^^^^^^^^^^^^^^^^^^
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\torch\utils\cpp_extension.py", line 842, in win_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\torch\utils\cpp_extension.py", line 1783, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "D:\Github\Deep-Learning-Basics\LLM Testing\MultiModalAI\Flash-env\Lib\site-packages\torch\utils\cpp_extension.py", line 2123, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

I'm pretty new to this, so I was hoping someone could point me in the right direction. I couldn't find any way to fix my issue elsewhere online. Any help would be appreciated. Thanks!

dunbin commented 2 months ago

Hello! Your message has been received, thank you for writing.

dicksondickson commented 2 months ago

Seems like you are missing the CUDA Toolkit.

Download it from Nvidia's website: cuda

I recently recompiled mine with the following: Windows 11, Python 3.12.4, PyTorch Nightly 2.4.0.dev20240606+cu124, CUDA 12.5.0_555.85, Nvidia 555.99 drivers

If you want to use my batch file, it's hosted here: batch file

C0D3-BR3AK3R commented 2 months ago

Seems like you are missing the CUDA Toolkit.

Download it from Nvidia's website: cuda

I recently recompiled mine with the following: Windows 11, Python 3.12.4, PyTorch Nightly 2.4.0.dev20240606+cu124, CUDA 12.5.0_555.85, Nvidia 555.99 drivers

If you want to use my batch file, it's hosted here: batch file

Oh sorry, I forgot to mention, I do have Cuda toolkit installed. Below is my nvcc -V

 nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:28:36_Pacific_Standard_Time_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0

And below is my nvidia-smi

nvidia-smi
Wed Jun 12 13:05:22 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 555.85                 Driver Version: 555.85         CUDA Version: 12.5     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                  Driver-Model | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3050 ...  WDDM  |   00000000:01:00.0 Off |                  N/A |
| N/A   66C    P8              3W /   72W |      32MiB /   4096MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     26140    C+G   ...8bbwe\SnippingTool\SnippingTool.exe      N/A      |
+-----------------------------------------------------------------------------------------+

dicksondickson commented 2 months ago

"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.40.33807\include\cstddef(11): fatal error C1083: Cannot open include file: 'stddef.h': No such file or directory
ninja: build stopped: subcommand failed."

Have you tried installing Visual Studio 2022?

C0D3-BR3AK3R commented 2 months ago

"C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Tools\MSVC\14.40.33807\include\cstddef(11): fatal error C1083: Cannot open include file: 'stddef.h': No such file or directory
ninja: build stopped: subcommand failed."

Have you tried installing Visual Studio 2022?

Yes, I had installed Visual Studio 2022 along with the Build Tools 2022. But the issue seemed to stem from Visual Studio itself, since I managed to build Flash Attention 2 after modifying the Visual Studio Community 2022 installation and adding the Windows 11 SDK (available under Desktop Development with C++ >> Optional).

Thanks!

konan009 commented 2 months ago

Just sharing: I was able to build this repo on Windows, without needing the changes above, with these settings:

  1. Python 3.11
  2. VS 2022 C++ (v14.38-17.9)
  3. CUDA 12.2

d-kleine commented 2 months ago

Seems like CUDA 12.4 and 12.5 are not yet supported?

fangyizhu commented 2 months ago

I was able to compile and build from the source repository on Windows 11 with:

CUDA 12.5 Python 3.12

I have a Visual Studio 2019 that came with Windows and I've never used it.

pip install never worked for me.

abgulati commented 2 months ago

Successfully installed on Windows 11 23H2 (OS Build 22631.3737) via pip install (took about an hour; system specs at the end):

pip install flash-attn --no-build-isolation

Python 3.11.5 & PIP 24.1.1, CUDA 12.4. PyTorch installed via:

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124

PIP dependencies:

pip install wheel==0.43.0
pip install ninja==1.11.1
pip install packaging==23.2

System Specs:

Intel Core i9 13900KF, Nvidia RTX 3090 FE, 32GB DDR5 5600MT/s (16x2)

d-kleine commented 2 months ago

took about an hour

Windows takes roughly an hour; Ubuntu (Linux) takes anywhere from a few seconds to a few minutes....

NovaYear commented 2 months ago

Successfully installed on Windows 11 23H2 (OS Build 22631.3737) via pip install (took about an hour; system specs at the end):

pip install flash-attn --no-build-isolation

Python 3.11.5 & PIP 24.1.1, CUDA 12.4. PyTorch installed via:

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu124

PIP dependencies:

pip install wheel==0.43.0
pip install ninja==1.11.1
pip install packaging==23.2

System Specs:

Intel Core i9 13900KF, Nvidia RTX 3090 FE, 32GB DDR5 5600MT/s (16x2)

Thanks for the information. I compiled it as you said and it was successful. I set MAX_JOBS=8; the other parameters are the same as yours. Compilation info: winver W11 24H2 26100.836, RAM 32GB DDR4 4000MHz, CPU 5700G, GPU RTX 3090 24GB, running 8 compile threads, CPU usage ~70%, RAM usage ~31GB, time ~50 mins.

dicksondickson commented 2 months ago

I've been installing flash attention on multiple systems and made some batch files to clone and compile for convenience. You can get them here: https://github.com/dicksondickson/ComfyUI-Clean-Install

Julianvaldesv commented 1 month ago

I have tried all kinds of things, but still cannot get Flash Attention to compile on my Windows laptop. These are my settings; I do not know if I have to upgrade CUDA to 12.x. Any advice?

    C:\Users\15023>nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
    Cuda compilation tools, release 11.8, V11.8.89
    Build cuda_11.8.r11.8/compiler.31833905_0

Python 3.10.8, Intel(R) Core(TM) i9-14900HX 2.20 GHz, 64-bit operating system, x64-based processor, Windows 11 Pro, Nvidia RTX 4080

Package versions:

    ninja 1.11.1, numpy 1.26.4, packaging 23.2, pillow 10.4.0, pip 24.1.2, pyparsing 3.1.2, python-dateutil 2.9.0.post0, requests 2.32.3, safetensors 0.4.3, setuptools 70.2.0, tokenizers 0.19.1, torch 2.3.1+cu118, torchaudio 2.3.1+cu118, torchvision 0.18.1+cu118, tqdm 4.66.4, urllib3 2.2.2, wheel 0.43.0

Boubou78000 commented 1 month ago

I ran

set MAX_JOBS=4

And restarted my computer. Then I ran the pip command and it worked

jhj0517 commented 1 month ago

    set MAX_JOBS=1
    pip install flash-attn

It worked, but it took hours to install on Windows. (It got stuck at "Building wheel for flash-attn (setup.py)..."; building the wheel was super slow.)

Julianvaldesv commented 1 month ago

It does not work in my case :( PC specs: Intel(R) Core(TM) i9-14900HX 2.20 GHz, 64-bit operating system, x64-based processor, Windows 11 Pro, Nvidia RTX 4080

Settings:

    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Wed_Sep_21_10:41:10_Pacific_Daylight_Time_2022
    Cuda compilation tools, release 11.8, V11.8.89
    Build cuda_11.8.r11.8/compiler.31833905_0

Package versions:

    python 3.10.8, ninja 1.11.1, numpy 1.26.4, packaging 23.2, pillow 10.4.0, pip 24.1.2, pyparsing 3.1.2, python-dateutil 2.9.0.post0, requests 2.32.3, setuptools 70.2.0, tokenizers 0.19.1, torch 2.3.1+cu118, torchaudio 2.3.1+cu118, torchvision 0.18.1+cu118, tqdm 4.66.4, urllib3 2.2.2, wheel 0.43.0

VSINSTALLDIR=C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\

Commands:

    set MAX_JOBS=1
    pip install flash-attn --no-build-isolation

Errors:

Building wheels for collected packages: flash-attn
  Building wheel for flash-attn (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [271 lines of output]
      fatal: not a git repository (or any of the parent directories): .git

  torch.__version__  = 2.3.1+cu118

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include\crt/host_config.h(153): fatal error C1189: #error: -- unsupported Microsoft Visual Studio version! Only the versions between 2017 and 2022 (inclusive) are supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.

FAILED: C:/Users/15023/AppData/Local/Temp/pip-install-dfkun1cn/flash-attn_b24e1ea8cfd04a7980b436f7faaf577f/build/temp.win-amd64-cpython-310/Release/csrc/flash_attn/src/flash_bwd_hdim160_bf16_sm80.obj

RuntimeError: Error compiling objects for extension [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash-attn
Running setup.py clean for flash-attn
Failed to build flash-attn
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (flash-attn)

abgulati commented 1 month ago

@Julianvaldesv

The key line is: fatal: not a git repository (or any of the parent directories): .git

This occurs because the setup.py script for flash-attention is trying to run a Git command to update submodules.

Clone the flash-attn git repo and run the pip install command from within it. If you encounter errors stating no flash-attn or something, try running pip install . --no-build-isolation

Julianvaldesv commented 1 month ago

pip install . --no-build-isolation

I did that before, with no good results. I am not sure if I need to upgrade CUDA from 11.8 to 12.4. Run from the git repo:

PS C:\Users\15023\Documents\Models\Tiny> cd flash-attention
PS C:\Users\15023\Documents\Models\Tiny\flash-attention> set MAX_JOBS=4
PS C:\Users\15023\Documents\Models\Tiny\flash-attention> pip install . --no-build-isolation
Processing c:\users\15023\documents\models\tiny\flash-attention
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: flash_attn
  Building wheel for flash_attn (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [274 lines of output]

  torch.__version__  = 2.3.1+cu118

  C:\Users\15023\Documents\Models\Tiny\.venv\lib\site-packages\setuptools\__init__.py:80: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
  !!

          ********************************************************************************
          Requirements should be satisfied by a PEP 517 installer.
          If you are using pip, you can try `pip install --use-pep517`.
          ********************************************************************************

  !!
    dist.fetch_build_eggs(dist.setup_requires)
  running bdist_wheel
  Guessing wheel URL:  https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.9.post1/flash_attn-2.5.9.post1+cu118torch2.3cxx11abiFALSE-cp310-cp310-win_amd64.whl

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\include\crt/host_config.h(153): fatal error C1189: #error: -- unsupported Microsoft Visual Studio version! Only the versions between 2017 and 2022 (inclusive) are supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.

File "C:\Users\15023\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 643, in http_error_default raise HTTPError(req.full_url, code, msg, hdrs, fp) urllib.error.HTTPError: HTTP Error 404: Not Found

RuntimeError: Error compiling objects for extension [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for flash_attn
Running setup.py clean for flash_attn
Failed to build flash_attn
ERROR: ERROR: Failed to build installable wheels for some pyproject.toml based projects (flash_attn)

abgulati commented 1 month ago

@Julianvaldesv mate you need to start reading those error messages!

The git issue has been resolved and the error has changed so there's progress. It's screaming at you to upgrade PIP:

********************************************************************************
Requirements should be satisfied by a PEP 517 installer.
If you are using pip, you can try `pip install --use-pep517`.
********************************************************************************

It's even giving you the command to use there and if that doesn't work, simply Google how to upgrade PIP!

It's also telling you your version of MSVS is unsupported: fatal error C1189: #error: -- unsupported Microsoft Visual Studio version! Only the versions between 2017 and 2022 (inclusive) are supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.

Upgrade pip, then refer to the instructions in my repo to install VisualStudio Build Tools and try again: https://github.com/abgulati/LARS?tab=readme-ov-file#1-build-tools

Julianvaldesv commented 1 month ago

@Julianvaldesv mate you need to start reading those error messages!

The git issue has been resolved and the error has changed so there's progress. It's screaming at you to upgrade PIP:

********************************************************************************
Requirements should be satisfied by a PEP 517 installer.
If you are using pip, you can try `pip install --use-pep517`.
********************************************************************************

It's even giving you the command to use there and if that doesn't work, simply Google how to upgrade PIP!

It's also telling you your version of MSVS is unsupported: fatal error C1189: #error: -- unsupported Microsoft Visual Studio version! Only the versions between 2017 and 2022 (inclusive) are supported! The nvcc flag '-allow-unsupported-compiler' can be used to override this version check; however, using an unsupported host compiler may cause compilation failure or incorrect run time execution. Use at your own risk.

Upgrade pip, then refer to the instructions in my repo to install VisualStudio Build Tools and try again: https://github.com/abgulati/LARS?tab=readme-ov-file#1-build-tools

@abgulati my friend, thanks for your help. Something else is going on. I upgraded PIP days ago.

PS C:\Users\15023\Documents\Models\Tiny\flash-attention> python -m pip install --upgrade pip

Requirement already satisfied: pip in c:\users\15023\documents\models\tiny.venv\lib\site-packages (24.1.2)

Also, I have installed the Visual Studio Build Tools 2022: C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\MSBuild\Current\Bin\MSBuild

abgulati commented 1 month ago

@Julianvaldesv In that case, try pasting this error into GPT-4/o or any other good LLM you have access to, describe the problem and background, and see what it says.

dicksondickson commented 1 month ago

@Julianvaldesv You are upgrading pip in that tiny.venv. Seems like your system is a mess. Much easier and faster to nuke your system from orbit and start from scratch. Sometimes that's the only way.

Julianvaldesv commented 1 month ago

I was able to compile and build from the source repository on Windows 11 with:

CUDA 12.5 Python 3.12

I have a Visual Studio 2019 that came with Windows and I've never used it.

pip install never worked for me.

What Torch version did you install that is compatible with CUDA 12.5? According to the PyTorch site, only 12.1 is fully supported (or 12.4 from source).

i486 commented 1 month ago

Looks like oobabooga has Windows wheels for cu122, but sadly no cu118 wheels.

https://github.com/oobabooga/flash-attention/releases/download/v2.6.1/flash_attn-2.6.1+cu122torch2.2.2cxx11abiFALSE-cp311-cp311-win_amd64.whl; platform_system == "Windows" and python_version == "3.11"

https://github.com/oobabooga/flash-attention/releases/download/v2.6.1/flash_attn-2.6.1+cu122torch2.2.2cxx11abiFALSE-cp310-cp310-win_amd64.whl; platform_system == "Windows" and python_version == "3.10"

pwillia7 commented 2 weeks ago

If pip isn't working for you, you may need more RAM. I was not able to compile in any way on 16GB of RAM; pip worked fine after upgrading to 64GB. It took a few hours.