dkirkby closed this issue 4 years ago
The EMD (earth mover's distance) measure is now implemented via the method="EMD" option to zotbin.group.groupbins, using exp(-W1) as the similarity measure. In minimal testing, all 3 measures give comparable results, at least with the normalizing flow preprocessor. The weighted measure is somewhat slower, however. Closing.
The redshift vector [1,0,0,0,0] is more "similar" to [0,1,0,0,0] than to [0,0,0,0,1], but both pairs have the same (zero) cosine similarity and the same (1/sqrt(2)) weighted similarity.
This issue is to implement a new measure that better corresponds to the intuition above.
One idea is to use the W1 ("earth mover's") metric, which can be calculated efficiently for discrete probabilities, e.g. similarity = exp(-W1 / c) or exp(-(W1 / c)**2) for some scale c? (W1 is always >= 0.)
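
A minimal sketch of the W1 idea above, assuming both vectors are histograms over the same unit-spaced redshift bins. The function names `w1_distance` and `emd_similarity` and the scale `c` are illustrative only and are not taken from zotbin. For 1D distributions on a common grid, W1 reduces to the summed absolute difference of the cumulative distributions:

```python
import numpy as np

def w1_distance(p, q):
    """W1 distance between two histograms on the same unit-spaced bins.

    Hypothetical helper for illustration, not the zotbin implementation.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Normalize so both inputs are discrete probability distributions.
    p = p / p.sum()
    q = q / q.sum()
    # In 1D, W1 is the sum of absolute differences between the two CDFs.
    return np.abs(np.cumsum(p - q)).sum()

def emd_similarity(p, q, c=1.0):
    """Map W1 >= 0 to a similarity in (0, 1]; c is an assumed scale parameter."""
    return np.exp(-w1_distance(p, q) / c)

a = [1, 0, 0, 0, 0]
b = [0, 1, 0, 0, 0]
d = [0, 0, 0, 0, 1]

# Cosine similarity cannot tell the two pairs apart: both dot products vanish.
print(np.dot(a, b), np.dot(a, d))
# W1 grows with the number of bins the probability has to move.
print(w1_distance(a, b), w1_distance(a, d))
print(emd_similarity(a, b), emd_similarity(a, d))
```

With this definition the neighbouring-bin pair has W1 = 1 and the far pair has W1 = 4, so exp(-W1 / c) ranks [0,1,0,0,0] as more similar to [1,0,0,0,0] than [0,0,0,0,1] is, matching the intuition in this issue.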