getkeops / keops

KErnel OPerationS, on CPUs and GPUs, with autodiff and without memory overflows
https://www.kernel-operations.io
MIT License

Feature request: symbolic random matrices #372

Open JonasDeSchouwer opened 6 months ago

JonasDeSchouwer commented 6 months ago

Hi, it would be very useful to have large random matrices that can be stored symbolically, e.g. with a LazyTensor equivalent of torch.rand().

My use case: I am working with a variant of the K-nearest neighbours algorithm. I have $N$ nodes, for each of which I want to sample $K$ neighbours (without replacement). For node $i$, the probabilities of selecting each neighbour $j$ are given by an unnormalized probability vector $\vec{p}_i$. These vectors are stored in a symbolic matrix $P = \left[ \vec{p}_1 \ \cdots \ \vec{p}_N \right]^T$.

To sample without replacement, I want to use the Gumbel top-$K$ trick, as in this paper. In short, it works as follows: for each $i$, sample a vector $\vec{q} \in [0,1]^N$ uniformly, then take the indices of the $K$ largest elements of $\log(\vec{p}_i) - \log(-\log(\vec{q}))$.

How I would do this in the dense case:

# P: B x N x N dense matrix of unnormalized probabilities
Q = torch.rand_like(P)                              # uniform noise in [0, 1)
temp = torch.log(P) - torch.log(-torch.log(Q))      # log(p) + Gumbel noise
result = torch.topk(temp, K, dim=-1).indices        # indices of the K largest entries per row (topk works along the last dim)

Note, however, that $N$ is potentially huge (~10K-10M), so naturally I want to use symbolic matrices for both $P$ and $Q$.

How I would like to do this with symbolic matrices:

# P: B x N x N symbolic matrix of unnormalized probabilities
Q = pykeops.random.uniform(P.size())     # proposed API: a symbolic uniform random matrix
temp = P.log() - (-(Q.log())).log()      # log(p) + Gumbel noise, still symbolic
result = (-temp).argKmin(K, dim=1)       # reduction returns an ordinary B x N x K tensor of indices

I am quite new to this repository, so it is possible that I am overlooking a feature / workaround. If that is the case, some pointers would be appreciated!

MH-limarco commented 4 months ago

I have the same problem. I added a normal-distribution generator, c_random, to keopscore/utils/code_gen_utils.py, but I am not experienced enough with the C++ side of the code generation to take it further, so I have no idea how to solve this properly.

from ctypes import CDLL
import math

libc = CDLL("libc.so.6")  # glibc's rand(); Linux only

class c_random(c_variable):
    # Variant of c_variable (defined in keopscore/utils/code_gen_utils.py) whose "name"
    # is a random literal, so a fresh standard-normal sample is baked into the
    # generated C++ code at code-generation time.
    def __new__(self, dtype, list_string_id=None):
        if isinstance(list_string_id, list):
            return list(c_random(dtype) for _ in list_string_id)
        else:
            return super(c_variable, self).__new__(self)

    def __init__(self, dtype, string_id=None):
        # two uniform samples in (0, 1]; the +1 avoids log(0) in the Box-Muller step
        u1, u2 = (libc.rand() % 100 + 1) / 100, (libc.rand() % 100 + 1) / 100

        self.dtype = dtype if dtype != "int" else "float"
        # Box-Muller transform: sqrt(-2 ln u1) * cos(2 pi u2) is a standard normal draw
        self.id = str(math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2))
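
Assuming the class above is dropped into keopscore/utils/code_gen_utils.py next to c_variable, a quick sanity check of the literal it produces could look like this (hypothetical usage of the patched class, not an existing KeOps API):

# c_random only exists if code_gen_utils.py was patched as above
from keopscore.utils.code_gen_utils import c_random

x = c_random("float")
print(x.dtype, x.id)   # e.g. "float 0.83..."; this literal would be inlined into the generated C++
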
dhbloo commented 1 week ago

I ran into the same problem when I wanted to sample 2D Gumbel noise with large dimensions (N, M), where N = M = 65536. I have found a simple workaround for those who want to generate a symbolic random uniform array:

import torch
from pykeops.torch import LazyTensor

def uniform_lazy_tensor_2d(dim0, dim1, device) -> LazyTensor:
    """Initialize a 2D lazy tensor with uniform random values."""
    # two dense 1D random vectors, viewed as symbolic i- and j-variables
    rand_x = LazyTensor(torch.rand((dim0, 1, 1), device=device))
    rand_y = LazyTensor(torch.rand((1, dim1, 1), device=device))

    # shader-style hash: fract(sin(a*x + b*y) * c), evaluated symbolically
    rand_xy = (rand_x * 12.9898 + rand_y * 78.233).sin() * 43758.5453123
    rand_xy_floor = (rand_xy - 0.5).round()      # round(x - 0.5) emulates floor(x)
    rand_xy_fract = rand_xy - rand_xy_floor      # fractional part, i.e. fract(x)
    rand_xy_clamp = rand_xy_fract.clamp(0, 1)    # guard against rounding slightly outside [0, 1]

    return rand_xy_clamp

It composes two random 1D dense arrays using a hash-like computation commonly used in GPU shaders. I mimic the behavior of fract() using round() here, which may lose some precision, but I guess it should work well in most cases.

Test code:

U = uniform_lazy_tensor_2d(65536, 65536, 'cuda')
mean = U.sum(0).sum() / 65536**2
variance = ((U - mean)**2).sum(0).sum() / 65536**2
print(f"mean: {mean}, var: {variance}")  # mean should equal 1/2 and var should equal 1/12

Output:

mean: 0.49997401237487793, var: 0.08322032541036606
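
For the original use case in this issue, the helper can then feed a symbolic Gumbel top-K selection. The snippet below is only a sketch: it assumes the unnormalized probabilities are themselves available symbolically (a Gaussian kernel on node features x is used purely as a stand-in), drops the batch dimension, and negates the scores so that argKmin returns the K largest perturbed log-probabilities:

import torch
from pykeops.torch import LazyTensor

N, D, K = 65536, 3, 16
x = torch.randn(N, D, device="cuda")

x_i = LazyTensor(x[:, None, :])              # (N, 1, D) symbolic i-variable
x_j = LazyTensor(x[None, :, :])              # (1, N, D) symbolic j-variable
P = (-((x_i - x_j) ** 2).sum(-1)).exp()      # stand-in for the symbolic (N, N) unnormalized probabilities

Q = uniform_lazy_tensor_2d(N, N, "cuda")     # symbolic uniform noise from the helper above
temp = P.log() - (-(Q.log())).log()          # log(p) + Gumbel noise, still symbolic
indices = (-temp).argKmin(K, dim=1)          # dense (N, K) LongTensor of sampled neighbour indices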