JinmiaoChenLab / Rphenograph

Rphenograph: R implementation of the PhenoGraph algorithm

Faster ANN and bug fix #8

Open ebecht opened 5 years ago

ebecht commented 5 years ago

Hi

I've switched the ANN search to the HNSW library, which is faster.

I've also fixed a rare bug where a data point could disappear from the output. This happens if none of its nearest neighbors shares a nearest neighbor with it, and no point that has it as a nearest neighbor shares a nearest neighbor with it either. It happened once in a dataset of 3,000,000+ points, but I remember having encountered that bug before.

It would be good to at least merge the bug fix! It corresponds to the following code snippet from the phenograph.R file:

links <- links[links[,1]>0, ]

## Fix for data points going missing. This happens when all of a point's Jaccard coefficients are 0 and, wherever it appears as another point's nearest neighbor, the corresponding Jaccard coefficient is also 0.
u <- unique(c(links[,1], links[,2]))  ## Data points that have at least one link
u <- setdiff(seq_len(nrow(data)), u)  ## Data points that have no link
links <- rbind(links, matrix(ncol=3, byrow=FALSE, data=c(u, u, rep(1, length(u)))))  ## Add a self-link of weight 1 for each of them
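
To show the effect of the fix in isolation, here is a minimal sketch on a toy links matrix. It assumes, as in the snippet above, that links has three columns (source index, target index, Jaccard coefficient) and that unused rows are zero-filled. The toy data and the names n_points, linked, missing and self_links are made up for illustration and are not taken from phenograph.R.

```r
# Toy edge list: columns are (from, to, Jaccard coefficient).
# Hypothetical data: 5 points, but point 4 never appears in a non-zero row.
n_points <- 5
links <- matrix(c(1, 2, 0.5,
                  2, 3, 0.25,
                  3, 5, 0.5,
                  0, 0, 0),   # zero-filled row, dropped by the filter below
                ncol = 3, byrow = TRUE)

# Same filter as in the snippet: keep only rows with a valid source index.
# drop = FALSE keeps the result a matrix even if a single row remains.
links <- links[links[, 1] > 0, , drop = FALSE]

# Points that appear in no remaining link would be lost from the graph.
linked  <- unique(c(links[, 1], links[, 2]))
missing <- setdiff(seq_len(n_points), linked)

# Add one self-link of weight 1 per missing point so that it stays in the graph.
self_links <- matrix(c(missing, missing, rep(1, length(missing))), ncol = 3)
links <- rbind(links, self_links)

print(missing)
print(links)
```

In this toy case point 4 is the only one without a link, so it receives a single self-link and every index from 1 to n_points is present in the final edge list. If no point is missing, self_links has zero rows and rbind leaves links unchanged.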

Thanks!