getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

Best practices for implementing vector attention? #193

Open zrt opened 3 years ago

zrt commented 3 years ago

Hello, first of all, thanks for the great work on KeOps. In my project, I want to implement a vector attention similar to what Point Transformer does, but the current KeOps softmax reduction only supports scalar softmax. Are there any best practices for implementing a vector softmax?

I tried to implement it directly from the mathematical definition of softmax, but that requires several reductions: one for the sum of exponentials, another for the final weighted sum, and, if I want better numerical precision, probably a max reduction as well. Since the symbolic matrix involves a lot of computation before the reduction (e.g. an MLP implemented with KeOps), following this approach re-evaluates that computation for every reduction, which duplicates a lot of work. Is there a better solution?
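For concreteness, here is a minimal sketch of the multi-pass approach I have in mind, using `pykeops.torch.LazyTensor`. The per-channel score (a plain `q_i - k_j`) is only a placeholder for the pointwise MLP used in Point Transformer, and I am assuming the `max`/`sum` reductions behave coordinate-wise for vector-valued formulas:

```python
import torch
from pykeops.torch import LazyTensor

device = "cuda" if torch.cuda.is_available() else "cpu"
N, D = 10000, 32                          # number of points, feature channels

q = torch.randn(N, D, device=device)      # queries
k = torch.randn(N, D, device=device)      # keys
v = torch.randn(N, D, device=device)      # values

q_i = LazyTensor(q[:, None, :])           # (N, 1, D) symbolic variable
k_j = LazyTensor(k[None, :, :])           # (1, N, D) symbolic variable
v_j = LazyTensor(v[None, :, :])           # (1, N, D) symbolic variable

# Per-channel ("vector") attention score: one score per feature channel.
# In Point Transformer this would be an MLP; a plain difference stands in here.
s_ij = q_i - k_j                          # (N, N, D) symbolic

# Pass 1: per-channel max over j, for numerical stability.
s_max = s_ij.max(dim=1)                   # (N, D) dense tensor

# Pass 2: per-channel sum of exponentials (the softmax denominator).
s_shift = s_ij - LazyTensor(s_max[:, None, :])
denom = s_shift.exp().sum(dim=1)          # (N, D) dense tensor

# Pass 3: softmax-weighted values (the numerator), channel by channel.
num = (s_shift.exp() * v_j).sum(dim=1)    # (N, D) dense tensor

out = num / denom                         # vector-attention output, (N, D)
```

Each of the three reductions launches its own KeOps kernel and re-evaluates `s_ij` on the fly, which is exactly the duplicated work mentioned above; as far as I can tell there is no single built-in reduction that fuses all three passes for vector-valued scores.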

Any help is appreciated. Best,

Ruotian