ms1design opened 6 months ago
@ms1design Thanks a lot for sharing this -- I was also trying to build FlashAttention on the Jetson AGX Orin 64 GB, so I followed your steps and applied the diff you specified. I cloned the repo at the current commit 74b0761ff7efc7b90d4e5aeb529c1b2a09a7458c, but I am not able to build it. Can you share the commit you were able to build with, so that I can also git checkout that specific commit -- or, even better, could you share the .whl file?
```shell
git clone --depth=1 https://github.com/Dao-AILab/flash-attention
cd flash-attention
git apply flash-attn.diff
python3 setup.py install
```
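If the build hangs or gets killed, a common cause is parallel nvcc jobs exhausting memory. The flash-attention README documents a `MAX_JOBS` environment variable (honored via ninja) to cap them; here is a sketch for picking a conservative value before running the install step above (the cap of 4 is my own assumption for the 64 GB Orin, not an official recommendation):

```shell
# Cap parallel compile jobs to avoid OOM kills during nvcc compilation.
# MAX_JOBS is documented in flash-attention's README; the cap of 4 is an
# assumption for Jetson-class memory, tune it for your device.
cores=$(nproc 2>/dev/null || echo 4)
if [ "$cores" -gt 4 ]; then cores=4; fi
export MAX_JOBS="$cores"
echo "MAX_JOBS=$MAX_JOBS"
# Then build as before:
#   python3 setup.py install
```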
By the way, how much time did it take you to build it? For me the build process seems to take forever.
@ms1design How much time does it take for your build to finish? For me it never seems to finish and gets stuck at this point. And which nvcc and CUDA versions are you using?
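For anyone checking the same thing, here is a quick way to inspect the toolchain on a Jetson (a minimal sketch; the `/etc/nv_tegra_release` path assumes a standard JetPack/L4T install and may be absent elsewhere):

```shell
# Print the CUDA compiler version, if nvcc is on PATH
if command -v nvcc >/dev/null 2>&1; then
  nvcc --version | grep -i "release"
else
  echo "nvcc not found on PATH (try /usr/local/cuda/bin/nvcc)"
fi
# Print the L4T/JetPack release string, if present
[ -f /etc/nv_tegra_release ] && cat /etc/nv_tegra_release
echo "version check done"
```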
Hi @iamsiddhantsahu,
I'm a contributor to the jetson-containers repository, where you can find a flash-attention docker container for Jetson devices. Yes, it's available, along with pre-built wheels, if you switch to the jetson-containers repo.
Give it a try -- you can also add your own containers and mix any of the available AI/ML libraries for Jetson devices (in most cases with pre-built wheels or tarballs).
flash-attention is usually just a part of my builds there, but the latest successful build used:
| Lib | Version |
|---|---|
| cuda | 12.2 |
| cudnn | 8.9 |
| tensorrt | 8.6 |
| python | 3.10 |
| pytorch | 2.2 |
You can set all of the above before building your docker container in the jetson-containers repo.
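For reference, version pins like those in the table can be passed as environment variables when invoking the jetson-containers build tool. A hedged sketch (the variable names are assumptions based on the jetson-containers docs; verify them against your checkout):

```shell
# Pin the component versions from the table above (variable names assumed,
# not verified against every jetson-containers release)
export CUDA_VERSION=12.2 PYTHON_VERSION=3.10 PYTORCH_VERSION=2.2
if command -v jetson-containers >/dev/null 2>&1; then
  jetson-containers build flash-attention
else
  echo "jetson-containers not on PATH; run this from the repo checkout"
fi
```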
Many thanks @ms1design for letting me know this -- yes, I am giving it a try right now.
I actually wanted to compare a few setups: which jetson-containers container would you recommend in order to install QServe?
@iamsiddhantsahu just try to build QServe from source on Jetson in a new container. You can follow the docs: https://github.com/dusty-nv/jetson-containers/blob/master/docs/packages.md
@ms1design thanks for the suggestion -- yes, I will try that.
I think QServe depends on the FlashAttention and xFormers packages, which is why I was interested in the flash-attention container: https://github.com/dusty-nv/jetson-containers/tree/master/packages/llm/flash-attention
Hi,
I noticed that the new version 2.5.5 breaks the build from source on Jetson devices.
Previously working build script:
flash-attn.diff - Git patch used to enable the build from source
Error when building v2.5.5 on Jetson: