getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

Question: Extending ranges to 3D+ tensors #120

Open tkon3 opened 3 years ago

tkon3 commented 3 years ago

Working with block-sparse reduction, my code runs properly with 2D tensors. However it fails to compile if I add an additional dimension (batch), leading to this error:

[KeOps] The 'ranges' argument (block-sparse mode) is not supported with batch processing, but we detected 1 > 0 batch dimensions.

What is the best way to handle block-sparse reduction with a higher-dimensional tensor? Note that the batch size can actually vary in certain conditions.

Thank you.

jeanfeydy commented 3 years ago

Hi @tkon3 ,

Thanks for your feedback and interest in the library! I fully agree with you that the block-sparse API is not well documented yet: I will try to fix it quickly, with new helpers for common "sparsity masks" such as band-diagonal matrices, triangle matrices, image pixels/voxels with a maximum interaction radius, etc.

Until then, we will probably need to write the "ranges" arrays by hand to solve your problem. The main thing to understand here is that, deep down, the KeOps C++ engine handles all operations as reductions of a 2D matrix: batch processing is encoded using a block-diagonal sparsity mask (plus some pointer arithmetic to handle broadcasting over batch dimensions), in a similar fashion to PyTorch Geometric.

If you intend to perform batch processing with block-sparsity masks, the most efficient thing to do is to "fuse" a collection of B masks of size (N,M) into a single "diagonal-ish" mask of shape (B*N,B*M) - something that I will automate in the near future but haven't done yet, hence the error message. I have detailed most of the relevant maths in #80: if you need some more help, feel free to ask :-)
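To make the "fusing" idea above concrete, here is a minimal sketch of how one could build such a "diagonal-ish" mask by hand for B identical dense (N, M) blocks, in the 6-tuple ranges format used by KeOps block-sparse reductions. The helper name `fuse_batch_ranges` is illustrative, not part of the KeOps API, and the exact layout of the tuple should be double-checked against the KeOps documentation:

```python
import torch

def fuse_batch_ranges(B, N, M):
    """Hypothetical helper: encode B dense (N, M) blocks as a single
    block-diagonal sparsity mask of shape (B*N, B*M), in the 6-tuple
    "ranges" format used by KeOps block-sparse reductions."""
    # Row intervals: batch b owns rows [b*N, (b+1)*N).
    ranges_i = torch.stack(
        [torch.arange(0, B) * N, torch.arange(1, B + 1) * N], dim=1
    ).int()
    # Column intervals: batch b owns columns [b*M, (b+1)*M).
    redranges_j = torch.stack(
        [torch.arange(0, B) * M, torch.arange(1, B + 1) * M], dim=1
    ).int()
    # slices_i[b] is the (1-indexed) end of the slice of redranges_j
    # associated with the b-th row interval: here, exactly one column
    # block per row block.
    slices_i = torch.arange(1, B + 1).int()
    # The last three entries play the symmetric role for the reduction
    # over the i index; the mask is symmetric in structure here.
    return (ranges_i, slices_i, redranges_j,
            redranges_j, slices_i.clone(), ranges_i)
```

With such a tuple, the batched tensors of shape (B, N, D) would be reshaped to (B*N, D) before attaching the ranges to the LazyTensor.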

Best regards, Jean

tkon3 commented 3 years ago

Thank you for your answer. I think this library may be a game changer for some applications.

My first idea was indeed to stack everything into a 2D tensor and compute all ranges on it.

However:

To make it compatible with the rest of the formula, I would have to add more sparse computations and compute all the ranges on the fly, since the batch size isn't fixed. It is far from trivial.

So I stuck with a simpler approach. Here is the simplest case:

import torch
from pykeops.torch import LazyTensor

# t and d are constants
t, d = 7, 3

x, y = torch.randn(t, d), torch.randn(t, d)

ranges = get_ranges(...)  # some function returning a 6-tuple of ranges

xi = LazyTensor(x.unsqueeze(-2))
yj = LazyTensor(y.unsqueeze(-3))

output = (xi | yj)  # shape == (t, t, 1)
output.ranges = ranges

# Additional computation before reduction
output = f(output, ...)

If I add a batch:

# t and d are constants
# n isn't a constant
n, t, d = 10, 7, 3

x, y = torch.randn(n, t, d), torch.randn(n, t, d)

ranges = get_ranges(...)  # some function returning a 6-tuple of ranges

xi = LazyTensor(x.unsqueeze(-2))
yj = LazyTensor(y.unsqueeze(-3))

output = (xi | yj)  # shape == (n, t, t, 1)
output.ranges = ranges  # fails here

# Additional computation before reduction
output = f(output, ...)

Note that the ranges are the same for all batches, as they follow a "geometric" pattern. The easiest way to partially solve the problem is to compute the ranges once and loop over the first dimension (one 2D reduction per slice of the 3D tensor). It is quite fast compared to everything else, but there is obviously room for improvement, as n can take large values, typically n > 10000.
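The loop-over-batch workaround might be sketched as follows. For testability, the per-slice step here is a plain dense dot-product kernel; in the actual code each iteration would be a KeOps LazyTensor reduction with the shared `ranges` tuple attached:

```python
import torch

def batched_reduction_loop(x, y):
    """Sketch of the loop-over-batch workaround: the block-sparse
    "ranges" pattern is computed once, then each (t, d) slice of the
    (n, t, d) batch is reduced independently. The dense matmul below
    stands in for the KeOps block-sparse reduction."""
    outputs = []
    for xb, yb in zip(x, y):       # one (t, d) slice per batch item
        outputs.append(xb @ yb.T)  # stand-in for the KeOps reduction
    return torch.stack(outputs)    # shape (n, t, t)

n, t, d = 10, 7, 3  # n is the (variable) batch size
x, y = torch.randn(n, t, d), torch.randn(n, t, d)
out = batched_reduction_loop(x, y)
```

The Python-level loop is what the fused block-diagonal mask would eliminate, since all n reductions could then run as a single kernel launch.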

Some kind of overloading could be extremely useful in this type of situation.

Best regards.