Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

Runtime error from transformers #1250

Open · HarryK4673 opened this issue 1 month ago

Hello everyone!

I'm currently working on a university assignment and need to fine-tune a model. The model uses this library, and when I run the fine-tuning script I get the following error:

RuntimeError: Failed to import transformers.models.llama.modeling_llama because of the following error (look up to see its traceback):
/home/ubuntu/anaconda3/envs/env/lib/python3.10/site-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104impl3cow11cow_deleterEPv

I'm using a server with an A10G GPU. The driver version is 535.183.01, the CUDA version is 12.2, and PyTorch is 2.1.2 with cu121. It seems I cannot install CUDA 12.1 with this driver (it needs driver 530, which I cannot install on the server). Could anyone help me with this? Thanks a lot!
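
For reference, the versions above can be confirmed inside the same conda env with commands like these (a minimal check; exact output will differ):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0))"
pip show flash-attn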

Nyandwi commented 1 month ago

I had this error recently, and what worked for me was uninstalling flash_attn and reinstalling it without using the cached wheel. I also verified the transformers version the LLaMA model was built around (4.45.0 for the new Llama models), but the root cause seems to be a mismatch between the flash-attn build and the installed PyTorch/CUDA.

pip uninstall flash-attn
pip install flash-attn --no-cache-dir
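
After reinstalling, a quick sanity check (a sketch; flash_attn_2_cuda is the extension module named in the error above) that everything now loads against the current PyTorch build:

python -c "import torch, flash_attn; print(torch.__version__, flash_attn.__version__)"
python -c "import flash_attn_2_cuda"
python -c "import transformers; print(transformers.__version__)"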
HarryK4673 commented 1 month ago

pip install flash-attn --no-cache-dir

Thanks. It works now.