Nero10578 opened this issue 1 month ago
Looks like installing flash-attn with our torch version doesn't work:
ImportError: /home/anon/miniconda3/envs/aphrodite/lib/python3.11/site-packages/flash_attn_2_cuda.cpython-311-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
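For anyone hitting this: the missing symbol demangles to c10::cuda::SetDevice(int), which almost always means the flash-attn extension was compiled against a different torch build than the one in the environment. A minimal diagnostic sketch, assuming both packages are installed in the active environment (the rebuild command in the comment is the one from the flash-attn README):

```python
# Diagnose a flash-attn / torch ABI mismatch.
import torch

print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)

try:
    # The compiled extension is the piece that fails with "undefined symbol".
    import flash_attn_2_cuda  # noqa: F401
    print("flash-attn CUDA extension loaded OK")
except ImportError as e:
    # An undefined-symbol error here means the extension was built against
    # a different torch; rebuild it inside this environment, e.g.:
    #   pip install flash-attn --no-build-isolation --no-cache-dir
    print("flash-attn extension failed to load:", e)
```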
I'll look into it. Thanks for reporting.
I have flash-attention installed; I compiled it from source to support the new torch version, but it still says it isn't found. I'll double-check it.
I recompiled it again after deleting the build and dist directories. Sadly it doesn't work on 3 GPUs, and a 5-bit 70B won't fit on 2 GPUs despite fitting in textgen.
It seems to work in the new commit now
I can use it and it works, but it's slightly slower: 9 tok/s with flash-attention enabled vs. 11.5 tok/s with it disabled, running inference on Llama3-70B at 8bpw across 4x3090 GPUs.
I thought vLLM supported a Triton-based FlashAttention for all (tensor-core) cards. I was hoping to try it here, but instead it used the regular flash-attn package.
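For reference, the backend choice happens at engine startup: flash-attn is tried first, and the engine falls back to another implementation (xformers in vLLM) if the import fails or the GPU is too old. A rough sketch of that probe, assuming the same fallback behaviour carries over to Aphrodite; the function name is illustrative, not the real API:

```python
# Illustrative sketch of startup backend selection; not the actual
# vLLM/Aphrodite code, just the shape of the decision it makes.
import torch

def select_attention_backend() -> str:
    major, _minor = torch.cuda.get_device_capability()
    if major < 8:
        # FlashAttention 2 needs Ampere (SM 8.x) or newer.
        return "xformers"
    try:
        import flash_attn  # noqa: F401
    except ImportError:
        # Missing wheel, or one built against a mismatched torch ABI.
        return "xformers"
    return "flash-attn"

print("selected backend:", select_attention_backend())
```

On a 3090 (SM 8.6) this only falls back when the import itself fails, which matches the ABI error above.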
It actually stopped working again now when I try to reinstall on the latest commit. Not sure why it worked once previously.
same here
🐛 Describe the bug
I just did a fresh git clone and ran ./update-runtime.sh, then installed flash-attn with ./runtime pip install flash-attn.
Aphrodite still doesn't use flash-attention, even though flash-attn is installed.
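A quick way to rule out an environment mix-up is to run a probe with the same interpreter the engine uses, e.g. ./runtime python probe.py (assuming the ./runtime wrapper forwards python the same way it forwards pip), with something like:

```python
# probe.py -- verify flash-attn is visible to the interpreter
# that ./runtime actually launches.
import sys

print("interpreter:", sys.executable)

try:
    import flash_attn
    print("flash-attn", flash_attn.__version__, "at", flash_attn.__file__)
except ImportError as e:
    # If this prints, flash-attn went into a different environment
    # than the bundled runtime, which would explain the behaviour above.
    print("flash-attn not importable:", e)
```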