Closed dchiu911 closed 7 years ago
That's a great idea. I completely forgot about HDBSCAN. It also provides a soft clustering, correct? Also, using it for outlier detection is exactly what I had in mind. What is your reason behind the default of 5?
@dchiu911 I think there may be a way to choose `k` with HDBSCAN during the hierarchical step. I will look into this.
I changed the `minPts` default from 5 to 2 because it resulted in fewer noise points.
@Dustin21 so I made an initial implementation for HDBSCAN, code is here
Whenever it is called, the clustering is separated from the rest of the consensus. Instead, we report, for each replication, the number of clusters and the proportion of outliers.
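To make the per-replication report concrete, here is a minimal sketch of that summary step. It is not the diceR implementation: the function names and the example labels are hypothetical, and it assumes each replication yields a vector of cluster labels in which 0 marks a noise point (the convention used by the R `dbscan` package).

```python
from typing import Dict, List, Sequence

NOISE = 0  # assumption: noise points carry label 0


def summarize_replication(labels: Sequence[int]) -> Dict[str, float]:
    """Return the cluster count and outlier proportion for one replication."""
    if not labels:
        raise ValueError("labels must not be empty")
    # Distinct non-noise labels are the clusters found in this replication.
    clusters = {lab for lab in labels if lab != NOISE}
    n_noise = sum(1 for lab in labels if lab == NOISE)
    return {
        "num_clusters": len(clusters),
        "prop_outliers": n_noise / len(labels),
    }


def summarize_all(replications: Sequence[Sequence[int]]) -> List[Dict[str, float]]:
    """Summarize every replication independently, in order."""
    return [summarize_replication(labels) for labels in replications]


# Hypothetical labels from three replications (not real results).
reps = [
    [1, 1, 2, 2, 0, 0, 3, 3],
    [1, 1, 1, 2, 2, 2, 2, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
summary = summarize_all(reps)
for i, row in enumerate(summary, start=1):
    print(i, row["num_clusters"], row["prop_outliers"])
```

Keeping the summary separate from the clustering call mirrors the point above: the HDBSCAN labels never enter the consensus step, only these two numbers per replication are reported.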
It looks great! I'll give it a test run on Genesis next week with the parameter defaults.
Barring major errors, going to close. Reminder: `dbscan` package has `SystemRequirements: C++11`, hence keeping it in dev version of diceR.
Here are my thoughts on incorporation @Dustin21:
I've already made a few changes:
- `eps` values, so we only need to specify a single parameter, `minPts`
- `minPts` default: 5 --> 2

What do you think?