JonasDeSchouwer opened this issue 6 months ago
I have the same problem. I added a normal-distribution generator, `c_random`, to `keopscore/utils/code_gen_utils.py`, but I am not experienced enough with C++ development to take this any further.
```python
from ctypes import CDLL
import math

libc = CDLL("libc.so.6")


class c_random(c_variable):
    # Subclass of c_variable (the class in code_gen_utils.py that represents
    # a C++ variable by its name and type); here the "name" is a normal
    # sample baked in as a numeric literal.
    def __new__(cls, dtype, list_string_id=None):
        if isinstance(list_string_id, list):
            return [c_random(dtype) for _ in list_string_id]
        else:
            return super(c_variable, cls).__new__(cls)

    def __init__(self, dtype, string_id=None):
        # Two uniform samples in (0, 1]; the +1 keeps log() away from 0.
        u1, u2 = (libc.rand() % 100 + 1) / 100, (libc.rand() % 100) / 100
        self.dtype = dtype if dtype != "int" else "float"
        # Box-Muller transform: one N(0, 1) sample from the two uniforms,
        # stored as the literal "name" of the C++ variable.
        self.id = str(math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2))
```
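For what it's worth, each instantiation bakes a single sample into the generated code as a constant, which I suspect is why this cannot work as a real random generator (a quick check, assuming the class above):

```python
v = c_random("float")
print(v.dtype, v.id)  # e.g. "float -0.398...": the value is fixed at
                      # code-generation time, not redrawn per kernel call
```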
I ran into the same problem when I wanted to sample 2D Gumbel noise with large dimensions (N, M), where N = M = 65536. I found a simple workaround for those who want to generate a symbolic random uniform array:
```python
import torch
from pykeops.torch import LazyTensor


def uniform_lazy_tensor_2d(dim0, dim1, device) -> LazyTensor:
    """Initialize a 2D lazy tensor with uniform random values."""
    # Two cheap dense 1D seeds: one indexed over rows (i), one over columns (j).
    rand_x = LazyTensor(torch.rand((dim0, 1, 1), device=device))
    rand_y = LazyTensor(torch.rand((1, dim1, 1), device=device))
    # Shader-style hash: a high-frequency sine scrambled by a large constant.
    rand_xy = (rand_x * 12.9898 + rand_y * 78.233).sin() * 43758.5453123
    # Emulate fract(x) = x - floor(x), since LazyTensor has no fract().
    rand_xy_floor = (rand_xy - 0.5).round()
    rand_xy_fract = rand_xy - rand_xy_floor
    rand_xy_clamp = rand_xy_fract.clamp(0, 1)
    return rand_xy_clamp
```
It composes two random 1D dense arrays with a hash-like computation commonly used in GPU shaders. I mimic the behavior of `fract()` using `round()` here, which may lose some precision, but I expect it should work well in most cases.
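Concretely, `x - (x - 0.5).round()` reproduces `fract(x) = x - floor(x)` (up to round-half-to-even behaviour at exact half-integers); a quick dense sanity check with torch:

```python
import torch

x = torch.tensor([1.25, 3.75, -0.25])
fract = x - (x - 0.5).round()
print(fract)  # tensor([0.2500, 0.7500, 0.7500]), matching GLSL fract()
```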
Test code:

```python
U = uniform_lazy_tensor_2d(65536, 65536, "cuda")
mean = U.sum(0).sum() / 65536**2
variance = ((U - mean) ** 2).sum(0).sum() / 65536**2
# For Uniform(0, 1): mean should equal 1/2 and var should equal 1/12 ~= 0.0833.
print(f"mean: {mean}, var: {variance}")
```
Output:

```
mean: 0.49997401237487793, var: 0.08322032541036606
```
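To turn these uniform samples into the Gumbel noise I mentioned, an elementwise transform is enough. A sketch on top of `uniform_lazy_tensor_2d` (the `eps` rescaling is my addition, keeping both logs away from 0):

```python
eps = 1e-6
U = uniform_lazy_tensor_2d(65536, 65536, "cuda")
U_safe = U * (1 - 2 * eps) + eps  # squeeze samples into (eps, 1 - eps)
G = -(-U_safe.log()).log()        # Gumbel(0, 1) noise, still symbolic
```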
Hi, it would be very useful to have large random matrices that can be stored symbolically, e.g. with a LazyTensor equivalent of `torch.rand()`.

My use case: I am working with a variant of the $K$-nearest-neighbours algorithm. I have $N$ nodes, and for each of them I want to sample $K$ neighbours (without replacement). For node $i$, the probability of selecting each neighbour $j$ is given by an unnormalized probability vector $\vec{p_i}$. These vectors are stored as the rows of a symbolic matrix $P = \left[ \vec{p_1} \ \cdots \ \vec{p_N} \right]^T$.
To sample without replacement, I want to use the Gumbel top-$k$ trick, as in this paper. In short, this works as follows: for each $i$, sample a vector $\vec{q} \in [0,1]^N$ uniformly, then take the indices of the $K$ largest elements of $\log(\vec{p_i}) - \log(-\log(\vec{q}))$ (the term $-\log(-\log(\vec{q}))$ is standard Gumbel noise).
How I would do this in the dense case:
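For instance (a minimal sketch; the sizes and the dense stand-in for $P$ are placeholders):

```python
import torch

N, K = 10_000, 16

# Dense stand-in: rows of P are unnormalized neighbour probabilities.
log_p = torch.rand(N, N).log()
# Uniform samples in (0, 1); this (N, N) tensor is what I would like symbolic.
q = torch.rand(N, N).clamp(1e-9, 1 - 1e-9)
gumbel = -(-q.log()).log()  # Gumbel(0, 1) noise
# Gumbel top-k: the K largest perturbed log-probabilities per row give
# K samples without replacement from each p_i.
neighbours = (log_p + gumbel).topk(K, dim=1).indices  # shape (N, K)
```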
Note, however, that $N$ is potentially huge (~10K-10M), so naturally I want to use symbolic matrices for both $P$ and $Q$.
How I would like to do this with symbolic matrices:
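Roughly (`LazyTensor.rand` is hypothetical, it is the feature I am requesting; `argKmin` is the existing KeOps reduction for the $K$ smallest values, and `P` is assumed to already be a symbolic LazyTensor):

```python
from pykeops.torch import LazyTensor

# Hypothetical API: a symbolic (N, N) matrix of uniform samples,
# never materialized in memory. This does not exist today.
Q = LazyTensor.rand(N, N)
scores = P.log() - (-(Q.log())).log()     # log(p_ij) + Gumbel noise, symbolic
# argKmin returns the K smallest per row, so negate to take the K largest:
neighbours = (-scores).argKmin(K, dim=1)  # shape (N, K)
```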
I am quite new to this repository, so it is possible that I am overlooking a feature / workaround. If that is the case, some pointers would be appreciated!