gagolews / genieclust

Genie: Fast and Robust Hierarchical Clustering with Noise Point Detection - in Python and R
https://genieclust.gagolewski.com

new affinities for exact=True (for both R and Python) #9

Open gagolews opened 4 years ago

gagolews commented 4 years ago

Port the affinities from https://github.com/gagolews/genie/blob/master/src/hclust2_distance.cpp

gagolews commented 4 years ago

Add pytest tests for the new affinities.

gagolews commented 4 years ago

Add support for other scipy.spatial distances when computing an exact MST, in particular, the weighted Euclidean metric.
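For reference, a minimal sketch of what an exact MST under the weighted Euclidean metric computes. It is not genieclust code: the helper name `weighted_euclidean_mst` is hypothetical, and SciPy's `minimum_spanning_tree` stands in for the package's own MST routine. It relies on the identity sqrt(sum_i w_i (x_i - y_i)^2) = ||sqrt(w) * x - sqrt(w) * y||, so rescaling the columns by sqrt(w) reduces the problem to the plain Euclidean case.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree


def weighted_euclidean_mst(X, w):
    """Exact MST w.r.t. d(x, y) = sqrt(sum_i w_i * (x_i - y_i)**2).

    Hypothetical helper for illustration only; O(n^2) memory.
    """
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    if w.shape != (X.shape[1],) or np.any(w < 0):
        raise ValueError("w must hold one non-negative weight per column")
    # Scaling column i by sqrt(w_i) turns the weighted metric into the
    # ordinary Euclidean one.
    Xs = X * np.sqrt(w)
    D = squareform(pdist(Xs, metric="euclidean"))
    # Caveat: for dense input, csgraph treats zero entries as missing edges,
    # so duplicated points would need separate handling.
    T = minimum_spanning_tree(D).tocoo()
    return np.column_stack((T.row, T.col)), T.data


X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]])
w = np.array([4.0, 1.0])
edges, weights = weighted_euclidean_mst(X, w)
```

The same column-scaling trick should work with any routine that already handles the Euclidean metric, which may be the cheapest way to support weights.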

gagolews commented 4 years ago

from R's dist:

 ‘canberra’:

          sum(|x_i - y_i| / (|x_i| + |y_i|)).  Terms with zero
          numerator and denominator are omitted from the sum and
          treated as if the values were missing.

          This is intended for non-negative values (e.g., counts), in
          which case the denominator can be written in various
          equivalent ways; Originally, R used x_i + y_i, then from 1998
          to 2017, |x_i + y_i|, and then the correct |x_i| + |y_i|.

     ‘maximum’: Maximum distance between two components of x and y
          (supremum norm)
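
A rough NumPy sketch of the two definitions quoted above, e.g. as reference implementations for the tests. The function names and the `rescale` flag are assumptions for illustration, not part of genieclust or R. With `rescale=True`, the sum is scaled up by the ratio of all terms to kept terms, which is my reading of "treated as if the values were missing"; with `rescale=False`, 0/0 terms are simply dropped.

```python
import numpy as np


def canberra(x, y, rescale=False):
    """sum(|x_i - y_i| / (|x_i| + |y_i|)), omitting 0/0 terms."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    num = np.abs(x - y)
    den = np.abs(x) + np.abs(y)
    keep = den > 0  # den == 0 implies x_i == y_i == 0, i.e., a 0/0 term
    if not keep.any():
        return 0.0  # assumption: two all-zero vectors are at distance 0
    total = float(np.sum(num[keep] / den[keep]))
    if rescale:
        # assumption: scale up proportionally to the number of terms used
        total *= x.size / int(keep.sum())
    return total


def maximum(x, y):
    """Supremum-norm (Chebyshev) distance: max_i |x_i - y_i|."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.max(np.abs(x - y)))


d_plain = canberra([1, 0, 3], [3, 0, 1])
d_scaled = canberra([1, 0, 3], [3, 0, 1], rescale=True)
d_max = maximum([1, 0, 3], [3, 0, -1])
```

Which of the two conventions to follow for 0/0 terms is a design decision for this issue; the flag only makes the difference explicit.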
gagolews commented 4 years ago

see also the distances supported by nmslib