I was wondering if anyone has been able to use FlashAttention on the NVIDIA Jetson Orin AGX or NVIDIA Jetson Xavier NX, and if so, can you explain which version you used and how you managed to make it work? It seems like it only supports the major compute capabilities (7.5, 8.0, 8.6, 8.9, 9.0), while Xavier is 7.2 and Orin is 8.7.
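For context, here's a minimal sketch of the mismatch I mean. The supported-arch set is copied from the capabilities listed above (I'm assuming that's the full list the build targets); on an actual device with PyTorch installed, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple to compare against:

```python
# Compute capabilities the build appears to target (list from above; assumed complete).
SUPPORTED_ARCHS = {(7, 5), (8, 0), (8, 6), (8, 9), (9, 0)}

def is_supported(major: int, minor: int) -> bool:
    """Check whether a GPU's compute capability is in the targeted arch set."""
    return (major, minor) in SUPPORTED_ARCHS

# Jetson Xavier NX is sm_72, Jetson Orin AGX is sm_87 -- neither is in the list.
print(is_supported(7, 2))  # Xavier NX -> False
print(is_supported(8, 7))  # Orin AGX  -> False
print(is_supported(8, 0))  # e.g. an A100 -> True
```

On the Jetson itself you'd replace the hard-coded tuples with `torch.cuda.get_device_capability()`.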