dkirkby / zotbin

Sandbox for DESC tomo challenge

Implement a new similarity measure #1

Closed dkirkby closed 4 years ago

dkirkby commented 4 years ago

The redshift vector [1,0,0,0,0] is more "similar" to [0,1,0,0,0] than to [0,0,0,0,1], but both pairs have the same cosine similarity (zero) and the same weighted similarity (1/sqrt(2)).
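To make the cosine part of this concrete, here is a minimal sketch in plain Python. The helper name `cosine_similarity` and the variable names are only illustrative and are not taken from zotbin. The weighted similarity is not reproduced here because its definition is not given in this issue.

```python
import math

def cosine_similarity(p, q):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)

ref = [1, 0, 0, 0, 0]
near = [0, 1, 0, 0, 0]   # neighbouring redshift bin
far = [0, 0, 0, 0, 1]    # most distant redshift bin

sim_near = cosine_similarity(ref, near)
sim_far = cosine_similarity(ref, far)
print(sim_near, sim_far)  # both are zero: the vectors have no overlapping bins
```

Any pair of vectors with disjoint support is orthogonal, so the cosine measure cannot tell adjacent bins from distant ones.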

This issue is to implement a new measure that better corresponds to the intuition above.

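One idea is to use the W1 ("earth mover's") metric, which can be calculated efficiently for discrete probabilities, and to convert it to a similarity with some scale c, e.g. similarity = exp(-W1 / c) or exp(-(W1 / c)**2)? Since W1 is always >= 0, both forms lie in (0, 1] and equal 1 only when W1 = 0.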
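A rough sketch of this idea, assuming both vectors are histograms over the same unit-spaced redshift bins, in which case W1 reduces to the sum of absolute differences between the two cumulative distributions. The names `w1_distance` and `emd_similarity` and the default `c=1.0` are hypothetical and are not the zotbin API.

```python
import math
from itertools import accumulate

def w1_distance(p, q):
    """W1 between two discrete distributions on the same unit-spaced bins.

    Assumption: bin spacing is 1, so W1 is the summed absolute difference
    of the two cumulative distributions. Inputs are normalized first.
    """
    p_tot, q_tot = sum(p), sum(q)
    cdf_p = accumulate(a / p_tot for a in p)
    cdf_q = accumulate(b / q_tot for b in q)
    return sum(abs(a - b) for a, b in zip(cdf_p, cdf_q))

def emd_similarity(p, q, c=1.0, squared=False):
    """Map W1 >= 0 to a similarity in (0, 1]; c is an assumed scale."""
    x = w1_distance(p, q) / c
    return math.exp(-x * x) if squared else math.exp(-x)

ref = [1, 0, 0, 0, 0]
near = [0, 1, 0, 0, 0]
far = [0, 0, 0, 0, 1]

print(w1_distance(ref, near), w1_distance(ref, far))
print(emd_similarity(ref, near), emd_similarity(ref, far))
```

With these inputs the adjacent pair is one bin apart and the distant pair four bins apart, so the similarity now ranks them the way the intuition above suggests. The cumulative-sum form costs time linear in the number of bins, which is what makes W1 cheap in one dimension.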

dkirkby commented 4 years ago

The EMD is now implemented via the method="EDM" option to zotbin.group.groupbins, using exp(-W1) as the similarity measure. After some minimal testing, all 3 measures (cosine, weighted and EMD) give comparable results, at least with the normalizing flow preprocessor. The weighted measure is somewhat slower, however. Closing.