AlineTalhouk / diceR

Diverse Cluster Ensemble in R
https://alinetalhouk.github.io/diceR/

Using diceR for consensus when clustering methods have already been performed. #126

Closed by bw4sz 6 years ago

bw4sz commented 6 years ago

I was recommended to check out this package after posting on SO. I'm wondering whether you believe this is the right tool for my problem. It might make a useful case study; I'm happy to provide a .Rmd if needed.

Short Explanation with toy reproducible code

Consider a set of points with indices 1 to 10. We are interested in assigning each point to exactly one cluster. Multiple clustering methods have already been applied to the data. The results are below.

algorithm1 <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9, 10))
algorithm2 <- list(c(1, 2, 3), c(4, 6), c(5, 7, 8, 9, 10))
algorithm3 <- list(c(1, 2, 3), c(4, 6), c(5, 7, 8), c(9, 10))

[Figure: rplot]

How can I use your consensus clustering methods to perform majority-rule voting (or any of the other nicely implemented voting schemes) in this package? The goal of the package, understandably, seems to be to take raw data, compute distance metrics, run clustering algorithms, and then ensemble the results. Can a user perform just the last step? My clustering methods are not the kind that can be implemented here (I can explain in more detail if needed). The actual use case is 3D lidar point cloud clustering for tree segmentation. The point cloud contains millions of points.

dchiu911 commented 6 years ago

You can directly use the consensus functions (e.g. majority_voting, k_modes). Based on the figure, you can arrange the cluster assignments into the following structure and then apply the ensembles:

library(diceR)
library(tidyverse)

algorithm1 <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9, 10))
algorithm2 <- list(c(1, 2, 3), c(4, 6), c(5, 7, 8, 9, 10))
algorithm3 <- list(c(1, 2, 3), c(4, 6), c(5, 7, 8), c(9, 10))

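# Point indices shared by all three algorithms
pts <- seq_len(10)

# Build a 10 x 3 matrix: one row per point, one column per algorithm,
# each entry being the index of the cluster that contains the point
res <- list(algorithm1, algorithm2, algorithm3) %>%
  set_names(paste0("alg", seq_along(.))) %>%
  map_df(function(a) {
    map_int(pts, function(i)
      which(map_lgl(a, ~ i %in% .)))
  }) %>%
  as.matrix()

# Labels have not been aligned across algorithms, so is.relabelled = FALSE
majority_voting(res, is.relabelled = FALSE)
k_modes(res, is.relabelled = FALSE)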
LCE(as.matrix(res), k = 4)
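
For a point cloud with millions of points, the per-point lookup above may become slow. Below is a minimal base-R sketch of the same reshaping that assigns all labels of one algorithm in a single vectorized step. It assumes every point belongs to exactly one cluster per algorithm; `clusters_to_labels` and `res_base` are hypothetical names introduced here for illustration, not part of diceR.

```r
# Hypothetical helper (not a diceR function): convert one algorithm's
# list of index vectors into a label vector with one entry per point.
clusters_to_labels <- function(clusters, n_points) {
  labels <- rep(NA_integer_, n_points)
  # Repeat each cluster id once per member, then assign all at once
  labels[unlist(clusters)] <- rep(seq_along(clusters), lengths(clusters))
  labels
}

algorithm1 <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9, 10))
algorithm2 <- list(c(1, 2, 3), c(4, 6), c(5, 7, 8, 9, 10))
algorithm3 <- list(c(1, 2, 3), c(4, 6), c(5, 7, 8), c(9, 10))

# One column per algorithm, one row per point
res_base <- sapply(
  list(alg1 = algorithm1, alg2 = algorithm2, alg3 = algorithm3),
  clusters_to_labels,
  n_points = 10
)

# Points left unassigned by an algorithm would show up as NA
stopifnot(!anyNA(res_base))
```

Under these assumptions, `res_base` has the same shape as `res` (points in rows, algorithms in columns) and should be usable in its place in the consensus calls above.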

bw4sz commented 6 years ago

Thank you! I think this will be a helpful example for others.