Open JonasDeSchouwer opened 4 months ago
I have the same problem. I added a normal distribution generator, c_random, to keopscore/utils/code_gen_utils.py, but I am not experienced enough with C++ development to take it further, and I do not know how to solve this problem.
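    from ctypes import CDLL
    import math

    # Added inside keopscore/utils/code_gen_utils.py, where c_variable is defined.
    libc = CDLL("libc.so.6")  # glibc on Linux, used here only for rand()

    class c_random(c_variable):
        # Represents a C++ value whose id is a standard normal sample.
        # The sample is drawn in Python (Box-Muller transform) when the object
        # is created, so the generated C++ code receives a fixed literal.
        def __new__(cls, dtype, list_string_id=None):
            if isinstance(list_string_id, list):
                return [c_random(dtype) for _ in list_string_id]
            else:
                return super(c_variable, cls).__new__(cls)

        def __init__(self, dtype, string_id=None):
            # Two uniform draws; u1 is shifted into (0, 1] so log(u1) is finite.
            u2 = (libc.rand() % 100) / 100
            u1 = (libc.rand() % 100 + 1) / 100
            self.dtype = dtype if dtype != "int" else "float"
            self.id = str(math.sqrt(-2 * math.log(u1)) * math.cos(2 * math.pi * u2))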
Hi, it would be very useful to have large random matrices that can be stored symbolically, e.g. with a LazyTensor equivalent of torch.rand().
My use case: I am working with a variant of the K-nearest neighbours algorithm. I have $N$ nodes, for each of which I want to sample $K$ neighbours (without replacement). For node $i$, the probabilities of selecting each neighbour $j$ are given by an unnormalized probability vector $\vec{p_i}^T$. These vectors are stored in a symbolic matrix $P = \left[ \vec{p_1} \ \cdots \ \vec{p_N} \right] ^T$.
To sample without replacement, I want to use the Gumbel top-k trick, as in this paper. In short, it works as follows: for each $i$, sample a vector $\vec{q}\in[0,1]^N$ uniformly. Then take the indices of the $K$ largest elements of $\log(\vec{p_i}) -\log(-\log(\vec{q}))$ (equivalently, the $K$ smallest elements of its negation).
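To make the trick concrete, here is a minimal sketch for a single node, using only the Python standard library. The helper name `gumbel_top_k` and the example weights are hypothetical, chosen for illustration; this is not KeOps or PyTorch code. It perturbs each log-weight with standard Gumbel noise $-\log(-\log(q))$ and keeps the $K$ largest perturbed scores.

```python
import heapq
import math
import random


def gumbel_top_k(p, k, rng=random):
    """Sample k distinct indices, weighted by the unnormalized vector p.

    Hypothetical helper for illustration; not part of KeOps or PyTorch.
    """
    scores = []
    for j, p_j in enumerate(p):
        if p_j <= 0.0:
            continue  # zero-weight neighbours can never be selected
        q = rng.random()
        while q == 0.0:  # random() lies in [0, 1); avoid log(0)
            q = rng.random()
        gumbel = -math.log(-math.log(q))  # standard Gumbel noise
        scores.append((math.log(p_j) + gumbel, j))
    # The k largest perturbed log-weights form the sample.
    return [j for _, j in heapq.nlargest(k, scores)]


rng = random.Random(0)
p_i = [0.5, 2.0, 0.0, 1.0, 4.0]  # unnormalized weights for one node
neighbours = gumbel_top_k(p_i, k=3, rng=rng)
```

Running this once per node costs $O(N)$ random draws per row, so the full problem needs an $N \times N$ matrix of uniform samples, which is the part that would have to be symbolic.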
How I would do this in the dense case:
Note, however, that $N$ is potentially huge (~10K-10M), so naturally I want to use symbolic matrices for both $P$ and $Q$.
How I would like to do this with symbolic matrices:
I am quite new to this repository, so it is possible that I am overlooking a feature / workaround. If that is the case, some pointers would be appreciated!