getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

Batch matrix multiply with reduction #243

Open ghost opened 2 years ago

ghost commented 2 years ago

For A with shape=(100000, 500, 10) and B with shape=(10000, 500), how do we obtain A @ B = C where C has shape=(100000, 10000, 10)?

I don't intend to materialize C; I'd like to reduce it via argmax.

I wasn't sure whether this would be supported, because of this comment: https://github.com/getkeops/keops/issues/35#issuecomment-555552164

Thanks for any pointers,

jeanfeydy commented 2 years ago

Hi @mach881040,

The trick is to put the “batch” dimension in the first place, for reasons discussed in this tutorial: you should cast A and B as 4D LazyTensors, with the batch dimension of size 10 coming first.
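
For instance, a minimal sketch of this casting step in PyTorch (the random tensors just stand in for your data, and I treat the dimension of size 10 as the batch):

import torch
from pykeops.torch import LazyTensor

A = torch.rand(100000, 500, 10)  # (N, D, K) = (100000, 500, 10)
B = torch.rand(10000, 500)       # (M, D)    = (10000, 500)

# Batch dimension K = 10 first, then the "i", "j" and feature axes:
A_i = LazyTensor(A.permute(2, 0, 1).contiguous()[:, :, None, :])  # (10, 100000, 1, 500)
B_j = LazyTensor(B[None, None, :, :])                             # (1, 1, 10000, 500), broadcast over the batch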

Then, you can create the matrix of similarities with:

C_ij = (A_i | B_j)  # (10, 100000, 10000) LazyTensor

An argmax reduction will then allow you to perform batched similarity search and output a (10, 10000, 1) tensor.
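
Concretely, something along these lines should work (a sketch; here I assume that, once the single batch dimension is counted, axis 1 of C_ij is the "i" axis):

best = C_ij.argmax(dim=1)  # argmax over the 100000 "i" vectors -> (10, 10000, 1) tensor of indices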

As a side note: I believe that the first version of your issue referred to collections of 60k and 10k vectors of dimension 784, i.e. to the dimensions of the MNIST dataset. You may thus be interested in this tutorial.

Does this answer your question?

Best regards, Jean

ghost commented 2 years ago

Thanks for the pointer to the tutorial.

But does KeOps support matrix multiplication and batch matrix multiplication?

Suppose you have a tensor A and a tensor B, and you compute A @ B in PyTorch.

Now suppose you replace B with a stack of matrices of the same shape as B, [B0, B1, B2, B3, ...], and you wish to obtain [A@B0, A@B1, ...].

Designing a domain-specific language for this operation, we might write A @@ B.

In torch I was previously able to implement this using einsum. Is that similar to Genred?
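
In dense PyTorch, with made-up sizes, I mean something like:

import torch

N, M, D, K = 1000, 2000, 64, 16    # hypothetical sizes
A  = torch.rand(N, D)              # a single matrix A
Bs = torch.rand(K, D, M)           # a stack [B0, B1, ..., B{K-1}]

# "A @@ B": one matrix product per element of the stack, i.e. a (K, N, M) result.
C = torch.einsum("nd,kdm->knm", A, Bs)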

Thanks,

jeanfeydy commented 2 years ago

Hi @mach881040,

If I understand your question correctly, this is what my previous answer is about. Assuming that A is (N,D) and B is (D,M) so that A@B is (N,M), you should:

  1. Encode your batch of K matrices B as a (K,1,M,D) LazyTensor B_j.
  2. Encode your matrix A as a (1,N,1,D) LazyTensor A_i.
  3. Use the “dot product” operator (| in KeOps) to express the fact that you want to compute a collection of matrix multiplication coefficients (= sums over the dimension of length D) with C_ij = (A_i | B_j).

C_ij will then be a (K,N,M,1) LazyTensor, that contains exactly what you denote as [A@B0, A@B1, …]. Feel free to use it in your (symbolic) computations as needed :-)
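
In PyTorch, a rough sketch of this recipe could look as follows (with made-up sizes; here I assume that, once the batch dimension is counted, axis 2 is the "j" axis for the reduction):

import torch
from pykeops.torch import LazyTensor

N, M, D, K = 1000, 2000, 64, 16    # hypothetical sizes
A  = torch.rand(N, D)              # single (N, D) matrix
Bs = torch.rand(K, D, M)           # stack of K matrices, each of shape (D, M)

A_i = LazyTensor(A.view(1, N, 1, D))                               # (1, N, 1, D)
B_j = LazyTensor(Bs.permute(0, 2, 1).contiguous()[:, None, :, :])  # (K, 1, M, D)

C_ij = (A_i | B_j)          # symbolic (K, N, M, 1) LazyTensor = [A@B0, ..., A@B{K-1}]
idx  = C_ij.argmax(dim=2)   # e.g. argmax over the "j" axis -> (K, N, 1) tensor of indices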

Best regards, Jean

ghost commented 2 years ago

Thanks! I just read your whitepaper, nice work.

How do you pronounce KeOps?

jeanfeydy commented 2 years ago

You’re very welcome! The core devs are French so we just say “Kay-Ops” (exactly as we would for the Keops pyramid in Cairo), but anything is fine really ;-)

I’m closing the issue then - please feel free to open a new one if you have another question. Best regards, Jean

ghost commented 2 years ago

I was not able to get the reduction working. Here is a reduced example:

(LazyTensor(rand(size=(1,2,1,99))) | LazyTensor(rand(size=(1,1,2,99)))).argmax(1)

Error:

[KeOps] Generating code for formula ArgMax_Reduction(Var(0,99,0)|Var(1,99,1),1) ... 
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-124-92fa8d27405a> in <module>()
----> 1 (LazyTensor(rand(size=(1,2,1,99))) | LazyTensor(rand(size=(1,1,2,99)))).argmax(1)

21 frames
/usr/local/lib/python3.7/dist-packages/keopscore/utils/misc_utils.py in KeOps_Error(message, show_line_number)
     26         frameinfo = getframeinfo(currentframe().f_back)
     27         message += f" (error at line {frameinfo.lineno} in file {frameinfo.filename})"
---> 28     raise ValueError(message)
     29 
     30 

ValueError: [KeOps] Error : not implemented: casting from float* to float (error at line 265 in file /usr/local/lib/python3.7/dist-packages/keopscore/utils/code_gen_utils.py)

If you lower 99 to, e.g., 98, you no longer get the error.

jeanfeydy commented 2 years ago

Hi @mach881040,

I see, this is certainly due to an optimization (the “chunks”, implemented by @joanglaunes) that we did not test properly. @joanglaunes, do you remember how to disable it for a quick fix? And do you think that the bug would be easy to solve?

Best regards, Jean

ghost commented 2 years ago

Is there a version of KeOps from before this optimization that would be usable?

joanglaunes commented 2 years ago

Hello @mach881040,

The option to disable the special computation mode is enable_chunks=False, so in your case this works:

(LazyTensor(rand(size=(1,2,1,99))) | LazyTensor(rand(size=(1,1,2,99)))).argmax(1, enable_chunks=False)

I will investigate to see why it fails with the optimization enabled.

Turakar commented 11 months ago

Seems like this issue should be fixed but was not closed automatically?