cthulhu-tww opened 4 months ago
Yes, I did. If you still want to know the solution, you can leave a message. Maybe I will write a blog post sometime later.
Yes, please. I ran into the same issue.
OK, I will do it ASAP. However, since it is written in Chinese, you may need to use a translator to read it.
PS: OK, I read your bio. Obviously, you can read Chinese.
Thank you, I am Chinese myself haha. I'll try to translate your blog into English for others to reference.
Having the same issue with the same setup.
I wrote up the approach here: install vllm on jetson. There is a chapter about how to install vllm-flash-attn on Jetson. Although the name is different, vllm-flash-attn is just a fork of flash-attn; they share almost all of the source code.
The blog article is written in Chinese, translation is welcomed.
I note that there is no info about the JetPack, CUDA, and Python versions. Would you mind supplying this info? I ran into a lot of problems, especially when compiling the CUDA-related parts, while following your blog.
JetPack 6, with the built-in CUDA 12.2. Torch is 2.3.0; you need to build it yourself.
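As a rough sketch of what "build it yourself" implies on a Jetson Orin (compute capability 8.7), the build environment might look like the following. `TORCH_CUDA_ARCH_LIST` and `MAX_JOBS` are standard PyTorch-extension build variables, but the exact values and the in-source `pip install` step are assumptions, not the blog author's exact commands:

```shell
# Check which compute capability the installed torch wheel sees on this board
python3 -c "import torch; print(torch.cuda.get_device_capability())"

# Compile CUDA kernels only for the Jetson's arch; this also keeps build
# time and memory usage down when building on the device itself
export TORCH_CUDA_ARCH_LIST="8.7"
export MAX_JOBS=4   # limit parallel nvcc jobs to avoid running out of RAM

# Build the extension from a source checkout against the installed torch
pip install --no-build-isolation -e .
```

If the arch list baked into the wheel or extension does not include the board's compute capability, you typically get the "no kernel image is available" error at runtime rather than at build time.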
I just successfully compiled this library on my Jetson, but when running LLM inference I get the error: RuntimeError: CUDA error: no kernel image is available for execution on the device. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
JetPack: 6, CUDA: 12.2, Python: 3.10, torch: NVIDIA's wheel torch-2.3.0a0+6ddf5cf85e.nv24.04.14026654-cp310-cp310-linux_aarch64.whl
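A "no kernel image is available" error usually means the torch build (or the compiled extension) does not ship kernels for the board's compute capability (sm_87 on Orin). A minimal check, using the real `torch.cuda.get_device_capability()` and `torch.cuda.get_arch_list()` APIs; the `arch_supported` helper is a hypothetical name added here so the comparison can be read without a GPU:

```python
def arch_supported(device_cc, compiled_archs):
    """Return True if kernels for the device's compute capability,
    e.g. (8, 7), appear in the compiled arch list, e.g. ['sm_80', 'sm_87']."""
    major, minor = device_cc
    return f"sm_{major}{minor}" in compiled_archs

# On the Jetson itself, compare the device against the torch build:
try:
    import torch
    HAVE_TORCH = True
except ImportError:
    HAVE_TORCH = False

if HAVE_TORCH and torch.cuda.is_available():
    cc = torch.cuda.get_device_capability()   # e.g. (8, 7) on Orin
    archs = torch.cuda.get_arch_list()        # archs this torch build has kernels for
    print(cc, archs, arch_supported(cc, archs))
    # False here means the wheel/extension was built without sm_87 kernels
    # and needs rebuilding with TORCH_CUDA_ARCH_LIST="8.7"
```

If the check prints False, the NVIDIA wheel or the self-built extension was compiled for different arches than the board, which matches the symptom above.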