Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

Flash Attention 2 Error -> undefined symbol: _ZN2at4_ops9_pad_enum4callERKNS_6TensorEN3c108ArrayRefINS5_6SymIntEEElNS5_8optionalIdEE #836

Open yekta opened 9 months ago

yekta commented 9 months ago

I'm using nvcr.io/nvidia/pytorch:23.10-py3 and https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl. Getting the error below:

ImportError: /app/venv/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops9_pad_enum4callERKNS_6TensorEN3c108ArrayRefINS5_6SymIntEEElNS5_8optionalIdEE

Issue #451 might be related to this, but the error seems different. Are there any solutions? I've already tried 5 different combinations.
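
(For reference, demangling the missing symbol with c++filt, which ships with binutils, shows it is a torch operator entry point, so the prebuilt extension is looking for a libtorch build other than the one in the container:)

echo _ZN2at4_ops9_pad_enum4callERKNS_6TensorEN3c108ArrayRefINS5_6SymIntEEElNS5_8optionalIdEE | c++filt
# prints roughly: at::_ops::_pad_enum::call(at::Tensor const&, c10::ArrayRef<c10::SymInt>, long, c10::optional<double>)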

tridao commented 9 months ago

You can try nvcr 23.12 with flash-attn 2.5.1. Or compile from source.

yekta commented 9 months ago

> You can try nvcr 23.12 with flash-attn 2.5.1. Or compile from source.

nvcr 23.12 is CUDA 12.3.0, and the flash-attn 2.5.1 release doesn't have a 12.3 build. Is that still supposed to work?

tridao commented 9 months ago

Try it.

yekta commented 9 months ago

> Try it.

I'm asking about the reasoning behind your suggestion, in case that wasn't clear.

tridao commented 9 months ago

nvcr 23.12 uses pytorch nightly 2.2.0.dev20231106. We compiled flash-attn wheels against pytorch nightly up to flash-attn 2.5.1; after that, pytorch 2.2.0 was officially released and we compiled wheels against the official 2.2.0. Wheels built against the nightly and the official release are not compatible.

yekta commented 9 months ago

> nvcr 23.12 uses pytorch nightly 2.2.0.dev20231106. We compiled flash-attn wheels against pytorch nightly up to flash-attn 2.5.1; after that, pytorch 2.2.0 was officially released and we compiled wheels against the official 2.2.0. Wheels built against the nightly and the official release are not compatible.

I'm trying it now (will report back when it's done in about 30 min).

The part I don't understand is that this wheel says torch2.1, not 2.2: https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

So what is incompatible between nvcr.io/nvidia/pytorch:23.10-py3 and that wheel? If wheels after flash-attn 2.5.1 are compiled with pytorch 2.2.0, what does the torch2.1 part in the release above mean?

tridao commented 9 months ago

Idk, pytorch / cuda compatibility is messy. nvcr pytorch 23.10 uses pytorch 2.1.0a0+32f93b1, while I think our wheels are compiled with official pytorch 2.1.0. The two builds might not be compatible.
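
A quick, generic way to see exactly which torch build a container ships (and therefore which prebuilt wheel tag to look for) is to ask torch itself:

python -c "import torch; print(torch.__version__, torch.version.cuda)"
# prints the torch version tag and the CUDA version it was built against;
# NGC containers report an a0/nightly-style tag such as 2.1.0a0+32f93b1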

yekta commented 9 months ago

> Try it.

Getting a different error this time: ImportError: /app/venv/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE

I'm guessing it's because of this:

"nvcr 23.12 is CUDA 12.3.0 and the flash-attn 2.5.1 release doesn't have a 12.3 build." I used the 12.2 build of flash-attn 2.5.1 instead, since there is no release for 12.3.

tridao commented 9 months ago

12.3 and 12.2 should be compatible. I've just tried nvcr pytorch 23.12 and it works fine:

docker run --rm -it --gpus all --network="host" --shm-size=900gb nvcr.io/nvidia/pytorch:23.12-py3
pip install flash-attn==2.5.1.post1
ipython
In [1]: import torch

In [2]: from flash_attn import flash_attn_func

In [3]: q, k, v = torch.randn(1, 128, 3, 16, 64, dtype=torch.float16, device='cuda').unbind(2)

In [4]: out = flash_attn_func(q, k, v)
yekta commented 9 months ago

> 12.3 and 12.2 should be compatible. I've just tried nvcr pytorch 23.12 and it works fine:
>
> docker run --rm -it --gpus all --network="host" --shm-size=900gb nvcr.io/nvidia/pytorch:23.12-py3
> pip install flash-attn==2.5.1.post1
> ipython
> In [1]: import torch
>
> In [2]: from flash_attn import flash_attn_func
>
> In [3]: q, k, v = torch.randn(1, 128, 3, 16, 64, dtype=torch.float16, device='cuda').unbind(2)
>
> In [4]: out = flash_attn_func(q, k, v)

The error I'm getting is from 2.5.1, not 2.5.1.post1, since your earlier message said 2.5.1. I'll try that now. I'm not doing a pip install, btw; I'm installing from the URL directly. In that case, if this is your suggestion, I'll be using: https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.1.post1/flash_attn-2.5.1.post1+cu122torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

This is the one I tried previously: https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.1/flash_attn-2.5.1+cu122torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

tridao commented 9 months ago

Why do you use the URL directly instead of pip? pip will run setup.py to choose the correct wheel. In this case you want the wheel with abiTRUE, not abiFALSE.
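
To see which ABI flavour a given environment needs, you can check the flag torch itself was built with (a generic torch attribute, not flash-attn specific):

python -c "import torch; print(torch._C._GLIBCXX_USE_CXX11_ABI)"
# True  -> pick a cxx11abiTRUE wheel (NGC containers build torch with the new C++ ABI)
# False -> pick a cxx11abiFALSE wheel (official PyPI torch wheels of this era use the old ABI)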

yekta commented 9 months ago

> Why do you use the URL directly instead of pip? pip will run setup.py to choose the correct wheel. In this case you want the wheel with abiTRUE, not abiFALSE.

The CI/CD worker that creates the build and the worker that runs the build are different. We don't want to dedicate GPU workers or install drivers just to build each version.

yekta commented 9 months ago

> In that case, if this is your suggestion, I'll be using: https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.1.post1/flash_attn-2.5.1.post1+cu122torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl

Got the same error with this. Now trying abiTRUE.

yekta commented 9 months ago

> Got the same error with this. Now trying abiTRUE.

That is also failing with a different error:

ImportError: /app/venv/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE

tridao commented 9 months ago

Try following this?

docker run --rm -it --gpus all --network="host" --shm-size=900gb nvcr.io/nvidia/pytorch:23.12-py3
pip install flash-attn==2.5.1.post1
yekta commented 9 months ago

> Try following this?
>
> docker run --rm -it --gpus all --network="host" --shm-size=900gb nvcr.io/nvidia/pytorch:23.12-py3
> pip install flash-attn==2.5.1.post1

As I said, the CI/CD worker that creates our build and the workers that run the build are different. We need the final image to include everything our GPU workers need, and the CPU worker has to be able to build that container regardless of its own spec. This is why we are using prebuilt wheels. pip installing flash-attn directly in the CI/CD pipeline while building the final Docker image wasn't working, because the script tries to pick a version on the CI/CD worker, which isn't compatible with the actual GPU worker.

Our process worked just fine for 6+ months across multiple CUDA, PyTorch and flash-attention versions using exactly the method I described above: we looked at the prebuilt flash-attention wheel's CUDA and PyTorch tags, used our matching custom CUDA + PyTorch image, and that was all. Now, even though the prebuilt flash-attention versions match both our custom CUDA + PyTorch image and nvcr, neither works (I've tried 10 different combinations at this point).
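
For what it's worth, the tags that appear in the release filenames can be printed from inside the runtime base image and the matching URL then pinned in the Dockerfile. The snippet below is only an illustrative sketch of that lookup, not part of flash-attn's tooling:

# run inside the runtime base image (e.g. nvcr.io/nvidia/pytorch:23.12-py3)
python - <<'EOF'
import sys, torch
torch_tag = "torch" + ".".join(torch.__version__.split(".")[:2])      # e.g. torch2.2
cuda_tag = "cu" + "".join(torch.version.cuda.split(".")[:2])          # e.g. cu123 -> round to the nearest published tag (cu122)
abi_tag = "cxx11abi" + str(torch._C._GLIBCXX_USE_CXX11_ABI).upper()   # e.g. cxx11abiTRUE
py_tag = "cp%d%d" % sys.version_info[:2]                              # e.g. cp310
print("flash_attn-<version>+%s%s%s-%s-%s-linux_x86_64.whl" % (cuda_tag, torch_tag, abi_tag, py_tag, py_tag))
EOF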

yekta commented 9 months ago

We've switched to installing the package at runtime inside our GPU worker instead. For some versions it built a wheel from scratch, for others it picked a prebuilt one (I'm not sure exactly which one it picked). Either way, both work.

Avelina9X commented 9 months ago

I've tested on a fresh machine with an A100, and the following combination works for a minimal docker install:

container: nvcr.io/nvidia/pytorch:24.01-py3
pip install: pip install flash-attn==2.5.1.post1 --no-build-isolation --upgrade

I've had issues with 23.12-py3 but 24.01-py3 works perfectly.

yekta commented 9 months ago

> I've tested on a fresh machine with an A100, and the following combination works for a minimal docker install:
>
> container: nvcr.io/nvidia/pytorch:24.01-py3
> pip install: pip install flash-attn==2.5.1.post1 --no-build-isolation --upgrade
>
> I've had issues with 23.12-py3 but 24.01-py3 works perfectly.

Our method was much weirder. We made it so that the package is built or downloaded at runtime. 2.2.2 built a custom wheel, which took quite a long time, but 2.5.3 finished almost instantly, so we're guessing it just matched a prebuilt wheel. We pulled that wheel out of a pod inside our k8s cluster and uploaded it somewhere (since we couldn't find its exact config). Now we're using that URL instead of installing at runtime.

samblouir commented 9 months ago

> Try following this?
>
> docker run --rm -it --gpus all --network="host" --shm-size=900gb nvcr.io/nvidia/pytorch:23.12-py3
> pip install flash-attn==2.5.1.post1

This works for me, but only with "nvcr.io/nvidia/pytorch:24.01-py3" instead of "nvcr.io/nvidia/pytorch:23.12-py3", as suggested by @Avelina9X in https://github.com/Dao-AILab/flash-attention/issues/836#issuecomment-1954569339

I think it had originally stopped working for me after a GPU driver update. nvidia-smi reports: Driver Version: 545.23.08 and CUDA Version: 12.3

Works great now. Thanks!

neggles commented 9 months ago

Why CUDA 12.2, when mainline/release PyTorch is still built for and using CUDA 12.1 by default? This is likely the cause of most of these issues/complaints, since if you follow Torch's "get started" guide on a Linux machine you'll end up with CUDA 12.1 libs in a venv.

JohnnyRacer commented 9 months ago

> I've tested on a fresh machine with an A100, and the following combination works for a minimal docker install:
>
> container: nvcr.io/nvidia/pytorch:24.01-py3
> pip install: pip install flash-attn==2.5.1.post1 --no-build-isolation --upgrade
>
> I've had issues with 23.12-py3 but 24.01-py3 works perfectly.

I've tested 23.12-py3 and it works with 2.5.1.post1. 24.01-py3 and 24.02-py3 with 2.5.1.post1 do not, and give the undefined symbol error. Also, any version above 2.5.1.post1 gives the same undefined symbol error regardless of which of the three docker images I tested. This is from pip only; I have not tried to install and compile from source. What GPU are you using for testing, @Avelina9X?

ArturFormella commented 9 months ago

I have a similar problem in text-generation-webui:

pip install flash-attn==2.5.6
./start_linux.sh --listen --api --listen-host 0.0.0.0 --listen-port 7862 --model-dir /T8/gpt_models --trust-remote-code
Traceback (most recent call last):
  File "/ai/text-generation-webui/modules/ui_model_menu.py", line 243, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(selected_model, loader)
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ai/text-generation-webui/modules/models.py", line 87, in load_model
    output = load_func_map[loader](model_name)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ai/text-generation-webui/modules/models.py", line 235, in huggingface_loader
    model = LoaderClass.from_pretrained(path_to_model, **params)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ai/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 560, in from_pretrained
    model_class = _get_model_class(config, cls._model_mapping)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ai/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 381, in _get_model_class
    supported_models = model_mapping[type(config)]
                       ~~~~~~~~~~~~~^^^^^^^^^^^^^^
  File "/ai/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 732, in __getitem__
    return self._load_attr_from_module(model_type, model_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ai/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 746, in _load_attr_from_module
    return getattribute_from_module(self._modules[module_name], attr)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ai/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 690, in getattribute_from_module
    if hasattr(module, attr):
       ^^^^^^^^^^^^^^^^^^^^^
  File "/ai/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1380, in __getattr__
    module = self._get_module(self._class_to_module[name])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ai/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1392, in _get_module
    raise RuntimeError(
RuntimeError: Failed to import transformers.models.mistral.modeling_mistral because of the following error (look up to see its traceback):
/ai/text-generation-webui/installer_files/env/lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops9_pad_enum4callERKNS_6TensorEN3c108ArrayRefINS5_6SymIntEEElNS5_8optionalIdEE
python3 --version
Python 3.11.4

Full pip freeze: pip_freeze.txt

patrick-tssn commented 8 months ago

I encounter the same issue when using Torch version 2.2 or higher; the error persists.

songkq commented 8 months ago

The same error with flash-attn==2.5.0. Solved by upgrading to flash-attn==2.5.6 with torch==2.2.1.

freckletonj commented 6 months ago

I have this same missing symbol using:

torch==2.3.1
flash-attn==2.5.9.post1

CUDA 12.2

edit:

This finally resolved it for me:

pip uninstall flash-attn
git clone https://github.com/Dao-AILab/flash-attention
pip install -e flash-attention
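
If you build from source, the project README also suggests limiting parallel compile jobs and disabling build isolation so it compiles against the torch already in your environment; roughly (the MAX_JOBS value is just an example):

git clone https://github.com/Dao-AILab/flash-attention
cd flash-attention
MAX_JOBS=4 pip install . --no-build-isolation
# --no-build-isolation reuses the installed torch; MAX_JOBS caps parallel compile jobs to keep RAM usage down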