Dao-AILab / flash-attention

Fast and memory-efficient exact attention
BSD 3-Clause "New" or "Revised" License

Who has successfully used flash_attn on a Jetson? #986

Open cthulhu-tww opened 4 months ago

cthulhu-tww commented 4 months ago

I just successfully compiled this library on my Jetson, but when running LLM inference I get this error:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

JetPack: 6, CUDA: 12.2, Python: 3.10, torch: NVIDIA's wheel torch-2.3.0a0+6ddf5cf85e.nv24.04.14026654-cp310-cp310-linux_aarch64.whl
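For anyone hitting the same failure, a quick diagnostic along these lines can confirm it is an architecture mismatch (a sketch using standard PyTorch APIs, not something posted in this thread). Jetson Orin reports compute capability 8.7 (sm_87), and "no kernel image is available for execution on the device" usually means the extension was compiled without kernels for the device's architecture.

```python
# Diagnostic sketch: compare the GPU's compute capability with the
# architectures the installed wheels were built for.
import torch

print(torch.__version__)                    # e.g. 2.3.0a0+... (NVIDIA aarch64 wheel)
print(torch.cuda.get_device_name(0))        # e.g. "Orin"
print(torch.cuda.get_device_capability(0))  # (8, 7) on Jetson Orin, i.e. sm_87
print(torch.cuda.get_arch_list())           # architectures this torch build supports

# flash_attn imports fine even when its kernels cannot launch; the error only
# appears once a kernel is actually dispatched on the device.
import flash_attn
print(flash_attn.__version__)
```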

mikeshi80 commented 3 months ago

Yes, I did. If you still want to know the solution, you can leave a message. Maybe I will write a blog post sometime later.

Gear-dev-sudo commented 2 months ago

> Yes, I did. If you still want to know the solution, you can leave a message. Maybe I will write a blog post sometime later.

Yes, please, I ran into the same issue.

mikeshi80 commented 2 months ago

> > Yes, I did. If you still want to know the solution, you can leave a message. Maybe I will write a blog post sometime later.
>
> Yes, please, I ran into the same issue.

OK, I will do it ASAP. However, since it will be written in Chinese, you may need to use a translator to read it.

PS: OK, I read your bio. Obviously, you can read Chinese.

Gear-dev-sudo commented 2 months ago

> > > Yes, I did. If you still want to know the solution, you can leave a message. Maybe I will write a blog post sometime later.
> >
> > Yes, please, I ran into the same issue.
>
> OK, I will do it ASAP. However, since it will be written in Chinese, you may need to use a translator to read it.
>
> PS: OK, I read your bio. Obviously, you can read Chinese.

Thank you, I am Chinese myself, haha. I'll try to translate your blog into English for others to reference.

thohemp commented 2 months ago

Having the same issue with the same setup.

mikeshi80 commented 2 months ago

I wrote up the approach here: install vllm on jetson. There is a chapter about how to install vllm-flash-attn on Jetson. Although the name is different, vllm-flash-attn is just a fork of flash-attn; they share almost all of the source code.
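Once a Jetson build of the package is installed, a minimal smoke test like the one below shows whether the compiled kernels actually launch on the Orin GPU rather than failing with the "no kernel image" error. This is a sketch using the upstream flash_attn module name and its public flash_attn_func API (the vllm fork installs under its own module name); the tensor shapes are arbitrary.

```python
# Smoke-test sketch: one forward pass through flash-attn on tiny fp16 tensors.
# q, k, v have shape (batch, seqlen, num_heads, head_dim).
import torch
from flash_attn import flash_attn_func

q = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")

out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # torch.Size([1, 128, 8, 64]) if the kernels ran successfully
```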

mikeshi80 commented 2 months ago

The blog article is written in Chinese; translations are welcome.

linhuaiyi commented 1 month ago

> The blog article is written in Chinese; translations are welcome.

I notice there is no info about the JetPack, CUDA, and Python versions. Would you mind supplying that info? I ran into a lot of problems, especially when compiling CUDA-related components, while following your blog.

mikeshi80 commented 1 month ago

JetPack 6, with the built-in CUDA 12.2. The torch version is 2.3.0, and you need to build it yourself.
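A quick way to confirm a self-built torch matches that environment before compiling flash-attn against it (a hedged sketch; the exact version strings depend on the wheel you built):

```python
# Sketch: sanity-check the torch build against the JetPack 6 / CUDA 12.2 setup.
import torch

print(torch.__version__)          # expect a 2.3.0 build for this setup
print(torch.version.cuda)         # expect "12.2", matching the JetPack toolkit
print(torch.cuda.is_available())  # must be True; False means a CPU-only build
```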