I've switched the ANN search to the HNSW library, which is faster.
I've also fixed a rare bug where a data point could disappear from the output. This happens when none of the point's nearest neighbors shares a nearest neighbor with it, and no point that has it as a nearest neighbor shares one with it either. It happened once in a dataset of 3,000,000+ points, but I remember having encountered that bug before.
It would be good to at least merge the bug fix! It corresponds to the following code snippet from the phenograph.R file:
links <- links[links[, 1] > 0, , drop = FALSE]
## Fix for a data point going missing. This happens when all of its own Jaccard
## coefficients are 0 and, wherever it appears as another point's nearest
## neighbor, the corresponding Jaccard coefficient is also 0.
u <- unique(c(links[, 1], links[, 2]))
u <- setdiff(seq_len(nrow(data)), u) ## Data points that have no link
## Give each such point a self-link of weight 1 so it stays in the output
links <- rbind(links, matrix(c(u, u, rep(1, length(u))), ncol = 3))
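For review, here is a minimal sketch of what the fix does on toy data. The 5-point `data` matrix and the hand-written `links` matrix (columns: from, to, weight) are made up for illustration and are not taken from the package; the variable name `missing_pts` is also mine.

```r
## Toy stand-ins (assumptions): 5 data points and a small links matrix
## with columns from, to, weight. Point 4 appears in no link.
data <- matrix(0, nrow = 5, ncol = 2)
links <- rbind(c(1, 2, 0.50),
               c(2, 3, 0.25),
               c(3, 5, 0.50),
               c(0, 0, 0))       # unfilled row, dropped by the filter below
links <- links[links[, 1] > 0, , drop = FALSE]

## Points that appear in at least one link, as source or target
linked <- unique(c(links[, 1], links[, 2]))
## Points that appear in no link at all
missing_pts <- setdiff(seq_len(nrow(data)), linked)
## Add a self-link of weight 1 for each missing point
links <- rbind(links,
               matrix(c(missing_pts, missing_pts, rep(1, length(missing_pts))),
                      ncol = 3))

print(missing_pts)
print(links)
```

In this toy case point 4 is the only one without a link, so it gets the self-link (4, 4, 1) and is no longer lost downstream.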
Thanks!