Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Flash attention on Nvidia Jetson Orin AGX and Xavier NX #844

Open · WeizeY opened this issue 6 months ago

WeizeY commented 6 months ago

I was wondering if anyone has been able to use FlashAttention on an NVIDIA Jetson Orin AGX or Jetson Xavier NX, and if so, can you explain which version you used and how you managed to make it work? It seems like it only supports certain compute capabilities (7.5, 8.0, 8.6, 8.9, 9.0), while Xavier is 7.2 and Orin is 8.7.
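For anyone unsure which compute capability their Jetson reports, a minimal sketch using PyTorch (assuming a CUDA-enabled PyTorch build is installed on the device):

```python
# Query the GPU's compute capability via PyTorch.
# A minimal sketch; assumes PyTorch with CUDA support is installed.
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"Compute capability: {major}.{minor}")
# Expected per the question: 7.2 on Xavier NX, 8.7 on Orin AGX.
```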

ms1design commented 6 months ago

@WeizeY Please follow https://github.com/Dao-AILab/flash-attention/issues/860 but use the last known working branch: https://github.com/Dao-AILab/flash-attention/tree/v2.5.4
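Once built from that branch, a quick smoke test can confirm the extension loads and runs. A sketch, assuming the build succeeded and the GPU supports half precision; the tensor shapes below are arbitrary illustration values:

```python
# Smoke test for a from-source flash-attn build.
# A sketch, not an official test; shapes are (batch, seqlen, nheads, headdim).
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 128, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # should match q: (2, 128, 8, 64)
```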