cpanse / uvpd

Ultra HRMS in combination with UVPD fragmentation for enhanced structural identification of organic micropollutants
https://doi.org/10.3390/molecules25184189

determine number of clusters #5

Open cpanse opened 6 years ago

cpanse commented 6 years ago

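library(parallel)  # provides mclapply()

# Load the data set; it is expected to provide the fingerprint list fps.pubchem.
load("~/__checkouts/R/UVPD/data/susdat.RData")

# Keep only the entries that are valid fingerprint objects.
idx.corrupt.pubchem <- which(sapply(fps.pubchem, class) != "fingerprint")
idx.correct.pubchem <- which(sapply(fps.pubchem, class) == "fingerprint")
fps.pubchem <- fps.pubchem[idx.correct.pubchem]

# Pairwise Tanimoto similarity, converted to a distance (1 - similarity).
fp.pubchem.tanimoto <- fingerprint::fp.sim.matrix(fps.pubchem, method = 'tanimoto')
fp.dist.pubchem <- as.dist(1 - fp.pubchem.tanimoto)

op <- par(mfrow = c(1, 1))
hist(fp.dist.pubchem)

# Replace undefined (NA) distances by 2, a value outside the Tanimoto distance range [0, 1].
fp.dist.pubchem[is.na(fp.dist.pubchem)] <- 2

# Run k-means for k = 5, ..., 25 and keep the per-cluster within-cluster sums of squares.
# kmeans() coerces the dist object to a full matrix, so each compound is
# clustered on its row of distances to all other compounds.
km.pubchem.withinss <- mclapply(5:25, function(i){kmeans(fp.dist.pubchem, i)$withinss})

save(km.pubchem.withinss, file = "km.pubchem.withinss.RData")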
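
One possible way to read a number of clusters off the saved result is an elbow plot of the total within-cluster sum of squares against k. The sketch below is only an illustration under stated assumptions: it uses random binary vectors (`toy.fps`, hypothetical data) instead of the PubChem fingerprints, base R's `dist(method = "binary")` as a stand-in for 1 - Tanimoto, `lapply` instead of `mclapply`, and a smaller range of k than the 5:25 used above. All `toy.*` names are made up for this example.

```r
# Hypothetical stand-in data: 100 random binary "fingerprints" of 32 bits each.
set.seed(1)
toy.fps <- matrix(rbinom(100 * 32, size = 1, prob = 0.3), nrow = 100)

# For binary vectors, dist(method = "binary") is the Jaccard distance,
# which equals 1 - Tanimoto similarity.
toy.dist <- dist(toy.fps, method = "binary")

# Same call pattern as above: kmeans() on the dist object, keeping $withinss.
# nstart = 5 repeats each fit from several random starts (an added assumption).
ks <- 2:12
toy.withinss <- lapply(ks, function(i) kmeans(toy.dist, i, nstart = 5)$withinss)

# Each list element holds one value per cluster; sum them to get one total per k.
toy.tot <- sapply(toy.withinss, sum)

# Elbow plot: look for the k after which the curve flattens.
plot(ks, toy.tot, type = "b",
     xlab = "number of clusters k",
     ylab = "total within-cluster sum of squares")
```

Applied to the real data, the same `sapply(km.pubchem.withinss, sum)` step would give one total per k in 5:25. Since k-means starts from random centers, setting a seed or using `nstart` makes the curve easier to reproduce.
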
cpanse commented 6 years ago

[Screenshot, 2018-04-18 11:39:45]