JuliaStats / Clustering.jl

A Julia package for data clustering

add HDBSCAN? #139

Open currymj opened 5 years ago

currymj commented 5 years ago

DBSCAN is already included. Its successor, HDBSCAN, has a famously good Python package and is fairly popular.

Since DBSCAN and several hierarchical clustering algorithms are already in this package, some of that code could probably be reused. There's a good explanation here of all the pieces of the algorithm.

I wish I were submitting a PR instead of just a feature request issue, but I still think a pure Julia implementation would be good to have.

Also, if anybody Googling for a Julia HDBSCAN implementation stumbles on this issue, you can just use PyCall.jl to call the hdbscan Python package. It works fine; just remember to transpose your data matrix, because the Python package expects one sample per row, while Clustering.jl expects one sample per column.
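For anyone who wants to try that route, here is a minimal sketch of the PyCall approach. It assumes the Python `hdbscan` package is installed in the Python environment that PyCall is configured to use; the toy data and the `min_cluster_size = 5` setting are made up for illustration.

```julia
using PyCall

# Assumption: `hdbscan` is installed in the Python environment PyCall uses.
hdbscan = pyimport("hdbscan")

# Toy data in the Julia convention: 2 features × 100 samples (one sample per column).
X = hcat(randn(2, 50), randn(2, 50) .+ 8.0)

# The Python package wants samples × features, so transpose first.
# permutedims returns a plain Matrix rather than a lazy Adjoint wrapper.
Xrows = permutedims(X)

clusterer = hdbscan.HDBSCAN(min_cluster_size = 5)
labels = clusterer.fit_predict(Xrows)   # one label per sample; -1 marks noise
```

Note that the labels come straight from Python, so cluster ids start at 0 and noise points are labeled -1, unlike the 1-based assignments used elsewhere in Julia.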

baggepinnen commented 5 years ago

I created a minimum-effort wrapper here https://github.com/baggepinnen/HDBSCAN.jl

babaq commented 4 years ago

A Julia version is always the best, but thanks for the wrapper, @baggepinnen.

chelate commented 7 months ago

@MommaWatasu has coded this in pure Julia here: https://github.com/MommaWatasu/HorseML.jl/blob/master/src/Clustering/HDBSCAN.jl

Data points are rows instead of columns.

It looks simple and clean. I can't believe how much has been written for HorseML. I don't know how it fits with the rest of the Clustering.jl API, but I will try it out on my dataset now. I wonder how he would feel about the code being reused in Clustering.jl.

MommaWatasu commented 7 months ago

I wrote the code only for learning, and I haven't maintained it for a long time (since no one uses it). I created a PR that contains my code from HorseML.jl. I hope my code is useful.