getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

Best practices for implementing vector attention? #193

Open zrt opened 3 years ago

zrt commented 3 years ago

Hello, first of all, thanks for the great work on KeOps. In my project, I want to implement a vector attention similar to what Point Transformer does, but the current KeOps softmax reduction only supports scalar softmax. Are there any best practices for implementing a vector softmax?

I tried to implement it directly from the mathematical definition of softmax, but that requires several reductions: one for the sum of exponentials, another for the final weighted sum, and, if I want better numerical precision, probably a max reduction as well. Since the symbolic matrix involves a lot of computation before the reduction (e.g. an MLP implemented with KeOps), following this approach re-evaluates that computation for every reduction, which duplicates a lot of work. Is there a better solution?
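For concreteness, here is a minimal sketch of the multi-pass approach I have in mind, using `pykeops.torch.LazyTensor`. The per-channel score (a plain `q_i - k_j`) is only a placeholder for the pointwise MLP used in Point Transformer, and I am assuming the `max`/`sum` reductions behave coordinate-wise for vector-valued formulas:

```python
import torch
from pykeops.torch import LazyTensor

device = "cuda" if torch.cuda.is_available() else "cpu"
N, D = 10000, 32                          # number of points, feature channels

q = torch.randn(N, D, device=device)      # queries
k = torch.randn(N, D, device=device)      # keys
v = torch.randn(N, D, device=device)      # values

q_i = LazyTensor(q[:, None, :])           # (N, 1, D) symbolic variable
k_j = LazyTensor(k[None, :, :])           # (1, N, D) symbolic variable
v_j = LazyTensor(v[None, :, :])           # (1, N, D) symbolic variable

# Per-channel ("vector") attention score: one score per feature channel.
# In Point Transformer this would be an MLP; a plain difference stands in here.
s_ij = q_i - k_j                          # (N, N, D) symbolic

# Pass 1: per-channel max over j, for numerical stability.
s_max = s_ij.max(dim=1)                   # (N, D) dense tensor

# Pass 2: per-channel sum of exponentials (the softmax denominator).
s_shift = s_ij - LazyTensor(s_max[:, None, :])
denom = s_shift.exp().sum(dim=1)          # (N, D) dense tensor

# Pass 3: softmax-weighted values (the numerator), channel by channel.
num = (s_shift.exp() * v_j).sum(dim=1)    # (N, D) dense tensor

out = num / denom                         # vector-attention output, (N, D)
```

Each of the three reductions launches its own KeOps kernel and re-evaluates `s_ij` on the fly, which is exactly the duplicated work mentioned above; as far as I can tell there is no single built-in reduction that fuses all three passes for vector-valued scores.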

Any help is appreciated. Best,

Ruotian