AlineTalhouk / diceR

Diverse Cluster Ensemble in R
https://alinetalhouk.github.io/diceR/
Other

Options for HDBSCAN #120

Closed: dchiu911 closed this issue 7 years ago

dchiu911 commented 7 years ago

Here are my thoughts on incorporating HDBSCAN, @Dustin21:

  1. Do not include it with the rest of the algorithms for consensus consideration.
  2. Use it as a metric for reporting "outliership" and the number of clusters for each run of the algorithm.

I've already made a few changes:

What do you think?

Dustin21 commented 7 years ago

That's a great idea. I completely forgot about HDBSCAN. It also provides a soft clustering, correct? Also, using it for outliership is exactly what I had in mind. What is your reasoning behind the default of 5?

Dustin21 commented 7 years ago

@dchiu911 I think there may be a way to choose k with HDBSCAN during the hierarchical step. I will look into this.

dchiu911 commented 7 years ago

I changed the default from 5 to 2 because it resulted in fewer noise points.

dchiu911 commented 7 years ago

@Dustin21, I've made an initial implementation of HDBSCAN; the code is here.

Whenever HDBSCAN is called, its clustering is kept separate from the rest of the consensus. Instead, for each replication, we report the number of clusters and the proportion of outliers.
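To make the per-replication summary concrete, here is a minimal Python sketch (diceR itself is written in R, so this is illustrative only). It assumes each replication's HDBSCAN output is a vector of integer cluster labels in which noise points are labelled 0, the convention used by R's `dbscan::hdbscan()`. The function names `summarize_replication` and `summarize_hdbscan_runs` are hypothetical and are not part of diceR.

```python
from typing import Dict, List, Sequence

# Assumption: noise/outlier points carry label 0, as in R's dbscan::hdbscan().
NOISE_LABEL = 0


def summarize_replication(labels: Sequence[int], noise_label: int = NOISE_LABEL) -> Dict[str, float]:
    """Return the number of clusters and the proportion of outliers for one run."""
    if len(labels) == 0:
        raise ValueError("labels must be non-empty")
    # Distinct non-noise labels are the clusters HDBSCAN found in this run.
    clusters = {lab for lab in labels if lab != noise_label}
    # Points labelled as noise are counted as outliers.
    n_noise = sum(1 for lab in labels if lab == noise_label)
    return {"n_clusters": len(clusters), "prop_outliers": n_noise / len(labels)}


def summarize_hdbscan_runs(runs: List[Sequence[int]], noise_label: int = NOISE_LABEL) -> List[Dict[str, float]]:
    """Summarize every replication separately, without merging into a consensus."""
    return [summarize_replication(r, noise_label) for r in runs]


if __name__ == "__main__":
    # Hypothetical label vectors for two replications of six points each.
    runs = [
        [1, 1, 2, 2, 0, 3],
        [1, 1, 1, 0, 0, 2],
    ]
    for i, s in enumerate(summarize_hdbscan_runs(runs), start=1):
        print(f"replication {i}: {s['n_clusters']} clusters, {s['prop_outliers']:.2f} outliers")
```

Passing `noise_label=-1` adapts the sketch to scikit-learn's convention, where HDBSCAN marks noise points with -1.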

Dustin21 commented 7 years ago

It looks great! I'll give it a test run on Genesis next week with the parameter defaults.

dchiu911 commented 7 years ago

Barring major errors, I'm going to close this. Reminder: the dbscan package has

SystemRequirements: C++11

so I'm keeping it in the dev version of diceR.