yekta opened this issue 9 months ago
You can try nvcr 23.12 with flash-attn 2.5.1. Or compile from source.
nvcr 23.12 is CUDA 12.3.0, and the flash-attn 2.5.1 release doesn't have a 12.3 version. Is that still supposed to work?
Try it.
I'm wondering about the reasoning behind your suggestion, in case that wasn't clear.
nvcr 23.12 uses pytorch nightly 2.2.0.dev20231106. flash-attn wheels up to 2.5.1 were compiled against pytorch nightly; after that, official pytorch 2.2.0 was released and we compiled wheels against pytorch 2.2.0. The two sets of wheels are not compatible.
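(A minimal way to see which PyTorch build a given container actually ships, run inside the container:
import torch
print(torch.__version__)   # NGC images report a nightly/alpha string such as 2.2.0.dev20231106 or 2.1.0a0+..., official wheels report a plain version like 2.2.0
print(torch.version.cuda)  # the CUDA version this torch build was compiled against
)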
I'm trying it now (will report back when it's done in about 30 min).
The part I don't understand is that this says torch 2.1 not 2.2: https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
So what is incompatible between nvcr.io/nvidia/pytorch:23.10-py3 and that wheel? If wheels are compiled with pytorch 2.2.0 after 2.5.1, what does the torch2.1 part mean in the release above?
Idk, pytorch / cuda compatibility is messy. nvcr pytorch 23.10 uses pytorch 2.1.0a0+32f93b1. I think our wheels are compiled with official pytorch 2.1.0. The two might not be compatible.
Try it.
Getting a different error this time: ImportError: /app/venv/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
I'm guessing because of this:
"nvcr 23:12 is CUDA 12.3.0 and flash-attn 2.5.1 release doesn't have a 12.3 version". I used the 12.2 version of 2.5.1 flash attention instead since there is no release for 12.2.
12.3 and 12.2 should be compatible. I've just tried nvcr pytorch 23.12 and it works fine
docker run --rm -it --gpus all --network="host" --shm-size=900gb nvcr.io/nvidia/pytorch:23.12-py3
pip install flash-attn==2.5.1.post1
ipython
In [1]: import torch
In [2]: from flash_attn import flash_attn_func
In [3]: q, k, v = torch.randn(1, 128, 3, 16, 64, dtype=torch.float16, device='cuda').unbind(2)
In [4]: out = flash_attn_func(q, k, v)
The error I'm getting is from 2.5.1, not 2.5.1.post1, since your message said 2.5.1. I'll try that now. I'm not doing pip install btw, I'm installing using the url directly. In this case, if this is your suggestion, I'll be using: https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.1.post1/flash_attn-2.5.1.post1+cu122torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
This is the one I tried previously: https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.1/flash_attn-2.5.1+cu122torch2.2cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
Why do you use the url directly instead of pip? pip will run setup.py to choose the correct wheel. In this case you want the wheel to have abiTRUE, not abiFALSE.
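(If you do pick a wheel URL by hand, a minimal check of which ABI variant the image's torch expects, run inside the target image:
import torch
print(torch.compiled_with_cxx11_abi())  # True -> use the cxx11abiTRUE wheel, False -> cxx11abiFALSE
)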
The CI/CD worker that creates the build and the worker that runs the build are different. We don't want to dedicate GPU workers or install drivers just to build each version.
Got the same error with this. Now trying abiTRUE.
That is also failing with a different error:
ImportError: /app/venv/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops5zeros4callEN3c108ArrayRefINS2_6SymIntEEENS2_8optionalINS2_10ScalarTypeEEENS6_INS2_6LayoutEEENS6_INS2_6DeviceEEENS6_IbEE
Try following this?
docker run --rm -it --gpus all --network="host" --shm-size=900gb nvcr.io/nvidia/pytorch:23.12-py3
pip install flash-attn==2.5.1.post1
As I said, the CI/CD worker that creates our build and the workers that run the build are different. We need the final image to include everything our GPU workers need, and the CPU worker to be able to build that container regardless of its spec. This is why we are using prebuilt wheels. pip installing flash-attn directly in the CI/CD pipeline while building the final Docker image wasn't working (because the script tries to pick a version on the CI/CD worker, which isn't compatible with the actual GPU worker).
Our process was working just fine for 6+ months across multiple CUDA, PyTorch and flash-attention versions using the exact method I described above. We just looked at the prebuilt flash-attention wheel's versions (CUDA and PyTorch), used our custom CUDA + PyTorch image, and that was all. Now, even though the prebuilt flash-attention wheel's versions match both our custom CUDA + PyTorch image and nvcr, neither works (I've tried 10 different combinations at this point).
We've installed the package at runtime inside our GPU worker instead. For some versions it built a wheel from scratch, for some it picked a prebuilt one (I'm not sure exactly which one it picked). However, both work.
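(For anyone else pinning wheel URLs this way, a rough sketch of deriving the expected release wheel name from the runtime image. Run it once inside the GPU image; FLASH_VERSION is whatever release you want to pin, and the CUDA tag may still need to be matched by hand to the nearest published build, e.g. cu123 -> cu122:
import sys
import torch

FLASH_VERSION = "2.5.1.post1"  # assumption: the release being pinned
py_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
torch_tag = "torch" + ".".join(torch.__version__.split("+")[0].split(".")[:2])
cuda_tag = "cu" + torch.version.cuda.replace(".", "")  # e.g. cu123; releases may only publish cu122
abi_tag = "cxx11abiTRUE" if torch.compiled_with_cxx11_abi() else "cxx11abiFALSE"
print(f"flash_attn-{FLASH_VERSION}+{cuda_tag}{torch_tag}{abi_tag}-{py_tag}-{py_tag}-linux_x86_64.whl")
For example, with official torch 2.2.0, CUDA 12.2 and Python 3.10 this prints the same filename as the release URL quoted earlier in this thread.)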
I've tested on a fresh machine with an A100, and the following combination works for a minimal docker install:
container: nvcr.io/nvidia/pytorch:24.01-py3
pip install: pip install flash-attn==2.5.1.post1 --no-build-isolation --upgrade
I've had issues with 23.12-py3, but 24.01-py3 works perfectly.
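(A quick sanity check after installing, since the failures in this thread show up at import time; assuming the wheel installed cleanly:
import torch, flash_attn
import flash_attn_2_cuda  # the extension whose undefined-symbol errors appear above
print(torch.__version__, flash_attn.__version__)
)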
Our method was much weirder. We made it so that the package is built or downloaded at runtime. 2.2.2 built a custom wheel, which took quite a long time, but 2.5.3 installed almost instantly, so we're guessing it just matched a prebuilt wheel. We pulled that wheel out of a pod inside our k8s cluster and uploaded it somewhere (since we couldn't find its exact config). Now we're using that URL instead of installing at runtime.
Try following this?
docker run --rm -it --gpus all --network="host" --shm-size=900gb nvcr.io/nvidia/pytorch:23.12-py3
pip install flash-attn==2.5.1.post1
This works for me, but only with "nvcr.io/nvidia/pytorch:24.01-py3" instead of "nvcr.io/nvidia/pytorch:23.12-py3", as suggested by @Avelina9X in https://github.com/Dao-AILab/flash-attention/issues/836#issuecomment-1954569339
I think it had originally stopped working for me after a GPU driver update. nvidia-smi reports: Driver Version: 545.23.08 and CUDA Version: 12.3
Works great now. Thanks!
Why CUDA 12.2, when mainline/release PyTorch is still built for/using CUDA 12.1 by default? This is likely the cause of most of these issues/complaints, since if you follow Torch's "get started" guide on a Linux machine you'll end up with CUDA 12.1 libs in a venv.
I've tested 23.12-py3; it works with 2.5.1.post1. 24.01-py3 and 24.02-py3 with 2.5.1.post1 do not, and give the undefined symbol error. Also, any version above 2.5.1.post1 gives the same undefined symbol error regardless of which of the three docker images I tested. This is from pip only, I have not tried to install and compile from source. What GPU are you using for testing @Avelina9X ?
I have a similar problem in text-generation-webui
pip install flash-attn==2.5.6
./start_linux.sh --listen --api --listen-host 0.0.0.0 --listen-port 7862 --model-dir /T8/gpt_models --trust-remote-code
Traceback (most recent call last):
File "/ai/text-generation-webui/modules/ui_model_menu.py", line 243, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ai/text-generation-webui/modules/models.py", line 87, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ai/text-generation-webui/modules/models.py", line 235, in huggingface_loader
model = LoaderClass.from_pretrained(path_to_model, **params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ai/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 560, in from_pretrained
model_class = _get_model_class(config, cls._model_mapping)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ai/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 381, in _get_model_class
supported_models = model_mapping[type(config)]
~~~~~~~~~~~~~^^^^^^^^^^^^^^
File "/ai/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 732, in __getitem__
return self._load_attr_from_module(model_type, model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ai/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 746, in _load_attr_from_module
return getattribute_from_module(self._modules[module_name], attr)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ai/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 690, in getattribute_from_module
if hasattr(module, attr):
^^^^^^^^^^^^^^^^^^^^^
File "/ai/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1380, in __getattr__
module = self._get_module(self._class_to_module[name])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/ai/text-generation-webui/installer_files/env/lib/python3.11/site-packages/transformers/utils/import_utils.py", line 1392, in _get_module
raise RuntimeError(
RuntimeError: Failed to import transformers.models.mistral.modeling_mistral because of the following error (look up to see its traceback):
/ai/text-generation-webui/installer_files/env/lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops9_pad_enum4callERKNS_6TensorEN3c108ArrayRefINS5_6SymIntEEElNS5_8optionalIdEE
python3 --version
Python 3.11.4
Full pip freeze: pip_freeze.txt
I encounter the same issue when using Torch version 2.2 or higher; the error persists.
The same error with flash-attn==2.5.0. Solved by upgrading to flash-attn==2.5.6 with torch==2.2.1.
I have this same missing symbol using:
torch==2.3.1
flash-attn==2.5.9.post1
CUDA 12.2
edit: This finally resolved it for me:
pip uninstall flash-attn
git clone https://github.com/Dao-AILab/flash-attention
pip install -e flash-attention
I'm using nvcr.io/nvidia/pytorch:23.10-py3 and https://github.com/Dao-AILab/flash-attention/releases/download/v2.5.3/flash_attn-2.5.3+cu122torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl. Getting the error below:
ImportError: /app/venv/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN2at4_ops9_pad_enum4callERKNS_6TensorEN3c108ArrayRefINS5_6SymIntEEElNS5_8optionalIdEE
Issue 451 might be related to this, but the error seems different. Are there any solutions to this? I've tried 5 different combinations already.