getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

Feature request: symbolic random matrices #372

Open JonasDeSchouwer opened 4 months ago

JonasDeSchouwer commented 4 months ago

Hi, it would be very useful to have large random matrices that can be stored symbolically, e.g. with a LazyTensor equivalent of torch.rand().

My use case: I am working with a variant of the K-nearest neighbours algorithm. I have $N$ nodes, and for each node I want to sample $K$ neighbours without replacement. For node $i$, the probability of selecting each neighbour $j$ is given by an unnormalized probability vector $\vec{p_i}$. These vectors are stored as the rows of a symbolic matrix $P = \left[ \vec{p_1} \ \cdots \ \vec{p_N} \right] ^T$.

To sample without replacement, I want to use the Gumbel top-k trick, as in this paper. In short: for each $i$, sample a vector $\vec{q}\in(0,1)^N$ uniformly, then take the indices of the $K$ largest elements of $\log(\vec{p_i}) -\log(-\log(\vec{q}))$.
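For a single row, the trick can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `gumbel_top_k` is a hypothetical helper name, the weights are assumed to be strictly positive, and the selection keeps the `k` largest perturbed log-weights (the same convention as `torch.topk`). The scores are produced by a generator, so the perturbed row is never stored in full.

```python
import heapq
import math
import random


def gumbel_top_k(weights, k, rng=None):
    """Sample k distinct indices, favouring large weights, via the Gumbel top-k trick.

    `weights` are unnormalized and must be strictly positive.
    """
    rng = rng or random.Random()

    def perturbed():
        for j, w in enumerate(weights):
            # Clamp away from 0 so that log(u) is finite; u < 1 always holds.
            u = max(rng.random(), 1e-300)
            gumbel = -math.log(-math.log(u))
            yield math.log(w) + gumbel, j

    # nlargest consumes the generator lazily and keeps only k candidates in memory.
    return [j for _, j in heapq.nlargest(k, perturbed())]


sample = gumbel_top_k([0.1, 0.5, 0.2, 0.2], k=2, rng=random.Random(0))
print(sample)  # two distinct indices in range(4)
```

Because every index receives its own independent Gumbel perturbation, the `k` winners are distinct by construction, which is what gives sampling without replacement.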

How I would do this in the dense case:

import torch

# P: BxNxN probability matrix
Q = torch.rand_like(P)
temp = torch.log(P) - torch.log(-torch.log(Q))
result = torch.topk(temp, K).indices    # torch.topk() works row-wise (last dim) and returns the K largest entries

Note, however, that $N$ is potentially huge (~10K-10M), so naturally I want to use symbolic matrices for both $P$ and $Q$.

How I would like to do this with symbolic matrices:

# P: BxNxN symbolic probability matrix
Q = pykeops.random.uniform(P.size())    # symbolic matrix (this is the requested feature)
temp = P.log() - (-Q.log()).log()    # still symbolic
result = (-temp).argKmin(K, dim=1)    # K largest entries of temp; the reduction returns a normal tensor containing NxK indices
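To make the idea of a symbolic random matrix concrete, here is a minimal pure-Python sketch, not KeOps code. It assumes that entry $(i, j)$ of $Q$ can be recomputed on demand as a deterministic function of a seed and the two indices, in the spirit of counter-based random number generators. The names `uniform_entry`, `sample_neighbours` and the example weight function are hypothetical and exist only for this illustration.

```python
import heapq
import math

_MASK64 = (1 << 64) - 1


def _mix64(x):
    """SplitMix64-style mixing step: scrambles an integer into 64 pseudo-random bits."""
    x = (x + 0x9E3779B97F4A7C15) & _MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & _MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & _MASK64
    return x ^ (x >> 31)


def uniform_entry(seed, i, j):
    """Entry (i, j) of a virtual uniform matrix, recomputed on demand, strictly inside (0, 1)."""
    h = _mix64(_mix64(_mix64(seed) ^ i) ^ j)
    return ((h >> 12) + 0.5) / float(1 << 52)


def sample_neighbours(weight, i, n, k, seed=0):
    """Gumbel top-k for row i; `weight(i, j)` must return a strictly positive number."""
    scores = (
        (math.log(weight(i, j)) - math.log(-math.log(uniform_entry(seed, i, j))), j)
        for j in range(n)
    )
    # Only k candidates are kept in memory; neither P nor Q is ever stored.
    return [j for _, j in heapq.nlargest(k, scores)]


# Hypothetical weights: closer indices are more likely to be picked.
neighbours = sample_neighbours(lambda i, j: 1.0 / (1.0 + abs(i - j)), i=5, n=1000, k=3, seed=42)
print(neighbours)
```

The same seed always reproduces the same matrix, so a reduction could regenerate each entry on the fly rather than read it from memory. Whether this fits the KeOps code generator is a separate question that this sketch does not answer.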

I am quite new to this repository, so it is possible that I am overlooking a feature / workaround. If that is the case, some pointers would be appreciated!

TakeYourLife3000 commented 2 months ago

I have the same problem. I added a normal-distribution generator, c_random, to keopscore/utils/code_gen_utils.py, but I am not experienced enough with C++ development and have no idea how to take it further.

from ctypes import CDLL
import math

libc = CDLL("libc.so.6")  # C standard library (Linux), used here for rand()


class c_random(c_variable):
    # Variant of c_variable whose C++ "name" is a literal drawn from a standard normal distribution.
    # Note: the value is drawn once, in Python, when the object is constructed.
    def __new__(cls, dtype, list_string_id=None):
        if isinstance(list_string_id, list):
            return list(c_random(dtype) for _ in list_string_id)
        else:
            return super(c_variable, cls).__new__(cls)

    def __init__(self, dtype, string_id=None):
        # Two uniform samples; u1 is kept in (0, 1] so that log(u1) is defined.
        u1, u2 = (libc.rand() % 100 + 1) / 100, (libc.rand() % 100) / 100

        self.dtype = dtype if dtype != "int" else "float"
        # Box-Muller transform of the two uniform samples.
        self.id = str(math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2))
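The expression assigned to `self.id` is the Box-Muller transform, which maps two independent uniform samples to one standard normal sample. As a standalone reference, here is a minimal sketch in plain Python. It assumes nothing about keopscore, `box_muller` is a hypothetical helper name, and it draws a fresh sample on every call instead of fixing one value at construction time.

```python
import math
import random


def box_muller(rng):
    """Draw one standard normal sample from two independent uniform samples."""
    u1 = 1.0 - rng.random()  # in (0, 1], so log(u1) is finite
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


rng = random.Random(0)
samples = [box_muller(rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((x - mean) ** 2 for x in samples) / len(samples))
print(f"mean={mean:.3f}, std={std:.3f}")  # should be close to 0 and 1
```

For a symbolic random matrix, the transform would have to run inside the generated reduction code, once per entry, so that each $(i, j)$ pair sees its own sample.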