christopherjenness / DBCV

Python implementation of Density-Based Clustering Validation
MIT License

Question about the Core Distance of an Object formula #10

Closed avouros closed 5 years ago

avouros commented 5 years ago

Thank you very much for providing the code for the DBCV index.

I noticed that the _core_dist function sets the number of neighbours (n_neighbors) to the dimensionality of the dataset, np.shape(neighbors)[1] (line 57 of DBCV.py). Shouldn't this be np.shape(neighbors)[0]?

Also, based on the formula of Moulavi et al. (Definition 1, Equation 3.1), shouldn't line 62 of your code be core_dist = (numerator / (n_neighbors - 1)) ** (-1/n_features)?
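To make the two suggested changes concrete, here is a minimal standalone sketch of the all-points core distance as described above. It is not the repository's code: the function name core_dist, the argument name cluster_points, and the use of Euclidean distance are illustrative assumptions. It also assumes cluster_points includes the query point itself, so that the number of other objects is n_neighbors - 1, and it drops zero distances to exclude the point (which would also drop exact duplicates).

```python
import numpy as np


def core_dist(point, cluster_points):
    """Sketch of the all-points core distance of `point` within its cluster.

    Assumptions (not taken from the repository):
      - cluster_points has shape (n_neighbors, n_features) and contains
        `point` itself as one of its rows;
      - distances are Euclidean.
    """
    cluster_points = np.atleast_2d(cluster_points)
    # Rows are objects, columns are features, so the object count is shape[0].
    n_neighbors, n_features = cluster_points.shape

    distances = np.linalg.norm(cluster_points - point, axis=1)
    # Drop zero distances so the point itself is excluded from the sum.
    distances = distances[distances != 0]

    numerator = ((1.0 / distances) ** n_features).sum()
    # Average over the n_neighbors - 1 other objects, then raise to -1/d.
    return (numerator / (n_neighbors - 1)) ** (-1.0 / n_features)


# Sanity check: if every other object is at distance 3, the result is about 3.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
print(core_dist(points[0], points))
```

With shape[1] in place of shape[0], the denominator would depend on the number of features rather than the number of objects, so the two versions only agree when those counts happen to match.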

joaopastor commented 5 years ago

I agree with you, @avouros. I modified this to use a precomputed distance matrix, and your tip helped a lot.

onofricamila commented 5 years ago

If np.shape(neighbors)[0] is used instead of np.shape(neighbors)[1] (as it should be), the resulting index is always low (hardly ever positive), even when evaluating good clustering results such as the one obtained by running hdbscan on the noisy moons dataset (provided by the author).

Does anyone know why?

christopherjenness commented 5 years ago

Fixed: https://github.com/christopherjenness/DBCV/commit/d874292eafc88b6f62f1294667c24ed3d780bf9c

pancodia commented 1 year ago

@onofricamila I have made the same observation. Do you have any insight into this? I am wondering whether something is wrong.