Make different similarity metrics comparable

pierretoussing commented 4 years ago

One reason why most people use only Pearson's Correlation and Mutual Information is that most similarity metrics are not comparable with each other:

1. Different similarity metrics have different value ranges.
2. Some similarity metrics are "inverted" (e.g. Transfer Entropy: a smaller value means more similar).
3. Different similarity metrics have differently shaped value distributions (e.g. Mutual Information: many small values and a few large ones).

Climate scientists would use comparable similarity metrics. We need some invariants like "Higher value → Higher similarity" to keep them comparable.

Another issue in making them comparable is keeping the information carried by the sign (e.g. Pearson's Correlation). Losing the sign discards information about the phase.

pierretoussing commented 4 years ago

A first approach for (1.) and (3.) is to bin the values into percentile bins (smallest 10%, next 10%, ..., largest 10%). This brings all the value ranges to [0.1, 1] and eliminates the differences in shape.
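A minimal sketch of the percentile binning described above, assuming NumPy and that each value is replaced by the upper level of its bin (0.1, 0.2, ..., 1.0). The function name `bin_into_percentiles` and the parameter `n_bins` are made up for illustration and are not taken from the repository.

```python
import numpy as np


def bin_into_percentiles(values, n_bins=10):
    """Replace each value by the level of its percentile bin (0.1, ..., 1.0).

    Hypothetical helper for illustration, not part of the repository.
    """
    values = np.asarray(values, dtype=float)
    # Interior bin edges: the 10th, 20th, ..., 90th percentile for n_bins=10.
    quantiles = np.linspace(0, 100, n_bins + 1)[1:-1]
    edges = np.percentile(values, quantiles)
    # Index of the bin each value falls into, from 0 to n_bins - 1.
    bin_index = np.searchsorted(edges, values, side="right")
    return (bin_index + 1) / n_bins
```

Because only the order of the values matters, a strongly skewed metric and a uniformly spread one end up with the same set of levels, which is what makes them comparable here.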

One downside is that the bins have different widths and the information within each bin is lost. Another approach to try would be Histogram Equalization.
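For comparison, here is a rough sketch of what an equalization step could look like. It assumes that "Histogram Equalization" here means mapping each value through the empirical CDF of all values, which is only one possible reading. The name `equalize` is hypothetical.

```python
import numpy as np


def equalize(values):
    """Map each value to the fraction of samples that are <= it (empirical CDF).

    Hypothetical helper for illustration; the output lies in (0, 1].
    """
    values = np.asarray(values, dtype=float)
    # Sort the flattened data once so that ranks can be looked up quickly.
    sorted_values = np.sort(values, axis=None)
    # Number of samples less than or equal to each value.
    ranks = np.searchsorted(sorted_values, values, side="right")
    return ranks / values.size
```

Unlike the ten fixed bins, this keeps the ordering of values that would otherwise share a bin, while still flattening the shape of the distribution. Tied values receive the same output.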

pierretoussing commented 4 years ago

A first approach for (2.) is simply to take the negative ("- similarity metric") of metrics that are inverted. This ensures that a bigger value means bigger similarity.
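A small sketch of the negation step, assuming the set of inverted metrics is maintained by hand. The names `INVERTED_METRICS` and `orient` are hypothetical, and the entry `"transfer_entropy"` only mirrors the example given in this issue.

```python
import numpy as np

# Hand-maintained set of metrics where a smaller value means more similar.
# Hypothetical registry; the single entry mirrors the example from this issue.
INVERTED_METRICS = {"transfer_entropy"}


def orient(values, metric_name):
    """Return the values so that a higher value always means higher similarity."""
    values = np.asarray(values, dtype=float)
    if metric_name in INVERTED_METRICS:
        # Negation reverses the ordering without changing the spacing.
        return -values
    return values
```

Negation changes the value range (for example from [0, 5] to [-5, 0]), so it would have to run before a range-normalizing step such as binning.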

pierretoussing commented 4 years ago

[ ] Create an automatic preprocessor that recognizes if a similarity metric should be inverted or not (based on known time series patterns)
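One possible way to approach this task, as a rough sketch only: evaluate the metric on a pair of time series known to be similar and on a pair known to be dissimilar, and flag the metric as inverted if the similar pair scores lower. The function name `should_invert`, the choice of a noisy sine wave against white noise, and the noise level are all assumptions, not an agreed design.

```python
import numpy as np


def should_invert(metric, n_samples=500, seed=0):
    """Guess whether `metric(a, b)` is inverted (smaller = more similar).

    Hypothetical helper: compares the score of a known-similar pair with
    the score of a known-dissimilar pair.
    """
    rng = np.random.default_rng(seed)
    t = np.linspace(0, 8 * np.pi, n_samples)
    reference = np.sin(t)
    # Known-similar pattern: the reference plus a little noise.
    similar = reference + 0.05 * rng.standard_normal(n_samples)
    # Known-dissimilar pattern: independent white noise.
    dissimilar = rng.standard_normal(n_samples)
    return bool(metric(reference, similar) < metric(reference, dissimilar))
```

For example, `should_invert(lambda a, b: np.linalg.norm(a - b))` flags a Euclidean distance as inverted, while `should_invert(lambda a, b: np.corrcoef(a, b)[0, 1])` does not flag Pearson's Correlation. A single pair of test patterns is a weak basis for the decision, so a real preprocessor would probably need several.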

Climate-Data-Science / Climate-Similarity-Metrics

Make different similarity metrics comparable #18