josevalim opened 5 months ago
Probably not much (I still need to see), though I am worried about the memory cost of the current implementations. LargeVis dimensionality reduction uses a large `num_neighbors` (150 in the paper), and `RandomProjectionForest.predict/2` takes $O(QTK)$ memory for $Q$ queries, $T$ trees, and $K$ neighbors. In any case, leave it to me.
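To make the $O(QTK)$ concern concrete, here is a back-of-the-envelope estimate. This is a sketch with assumed, illustrative numbers (query count, tree count, index width), not measurements of the actual implementation:

```python
# Rough memory estimate for an O(Q * T * K) candidate buffer, as in
# RandomProjectionForest.predict/2. All concrete numbers below are
# illustrative assumptions, not measured values.

def predict_memory_bytes(q, t, k, bytes_per_index=8):
    """O(QTK) candidate-buffer size, assuming one index per entry."""
    return q * t * k * bytes_per_index

# e.g. 100_000 queries, 30 trees, 150 neighbors (LargeVis-scale K):
gib = predict_memory_bytes(100_000, 30, 150) / 1024**3
print(f"{gib:.1f} GiB")  # roughly 3.4 GiB for the candidate indices alone
```

With `num_neighbors` at LargeVis scale, the candidate buffer alone reaches gigabytes before any distances are computed, which is where the memory worry comes from.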
By the way, thanks for the shout-out on the blog. :)
Apparently, this is much harder than I thought. One of the issues is sampling the negative edges of the k-NN graph, i.e. edges that are not in the graph. To do this, the authors allocate and iterate over a rather large table. I am not sure about this approach, and I don't think I can come up with something efficiently implementable using Nx.
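For reference, the table-based negative sampling works roughly like this NumPy sketch. It is hypothetical and simplified: `build_neg_table` is my name, the degree exponent 0.75 is the word2vec-style scheme LargeVis adopts, and a tiny table size is used here instead of the 1e8 mentioned below:

```python
import numpy as np

# Hypothetical sketch of table-based negative sampling: node i appears in
# the table proportionally to deg(i)**0.75, and negatives are drawn by
# uniform indexing into the table. The table size (1e8 in the discussion
# above) is what dominates memory; 1e6 here is only for illustration.

def build_neg_table(degrees, table_size=1_000_000):
    weights = degrees.astype(np.float64) ** 0.75
    probs = weights / weights.sum()
    # Rounding means the table may be slightly off table_size; fine for a sketch.
    counts = np.round(probs * table_size).astype(np.int64)
    return np.repeat(np.arange(len(degrees)), counts)

rng = np.random.default_rng(0)
table = build_neg_table(np.array([10, 3, 3, 1, 1]))
negatives = table[rng.integers(0, len(table), size=5)]  # 5 negative samples
```

The allocate-then-index pattern is trivial in an imperative setting but awkward to express with Nx tensor operations, which is the difficulty being described.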
Why do you think it would be hard to implement? The part that seems complex is the head traversal, but if we can assume a max depth (not sure if that's the case), it should be parallelizable.
It's not really about head traversal; it's more about allocating a table of size 1e8. I'm still checking if this can be avoided.
The problem above might be avoidable, but there is another one that might not be. Namely, the method involves symmetrizing the k-NN graph, and this requires computing the reverse k-NN graph, which I don't think I can implement efficiently within Nx.
The symmetrization seems doable with a while loop over $i, j$?
Even if it is possible, it would be expensive, as we would need to loop over $n \times k$ edges. A bigger issue is how to store the weights. $p_{j|i}$ is an $n \times k$ matrix (or a sparse $n \times n$ matrix), since we take the $k$ nearest neighbors of each of the $n$ points; every node in the $k$-NN graph has out-degree $k$. $p_{i|j}$ requires reversing the $k$-NN graph, where the degree of a node can vary from $0$ (the point is not among the $k$ nearest neighbors of any other point) to $n - 1$ (the point is among the $k$ nearest neighbors of every other point). I cannot think of a way to represent these weights with a tensor.
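To illustrate the irregularity outside of Nx, here is a SciPy sparse-matrix sketch (purely for intuition; the toy neighbor lists and weights are made up, and none of this maps directly to dense Nx tensors):

```python
import numpy as np
from scipy.sparse import csr_matrix

# P stores p_{j|i}: exactly k nonzeros per row (fixed out-degree k).
# Its transpose stores p_{i|j}: between 0 and n-1 nonzeros per row,
# which is the irregular reverse k-NN graph described above.

n, k = 5, 2
rows = np.repeat(np.arange(n), k)
cols = np.array([1, 2, 0, 2, 0, 1, 0, 1, 0, 1])  # toy k-NN indices
vals = np.full(n * k, 1.0 / k)                   # toy conditional weights
P = csr_matrix((vals, (rows, cols)), shape=(n, n))

# Symmetrized weights, t-SNE/LargeVis style: p_ij = (p_{j|i} + p_{i|j}) / (2n).
P_sym = (P + P.T) / (2 * n)

in_degrees = np.diff(P.T.tocsr().indptr)  # reverse-graph degree per node
print(in_degrees)  # varies per node, unlike the fixed out-degree k
```

The result stays sparse but has a ragged structure (here the in-degrees range from 4 down to 0), which is exactly what resists a fixed-shape tensor representation.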
We already have the building blocks for ANN. @krstopro, how much work do you believe it is to also have it for manifold/dimensionality reduction?