Find similarity metrics that could be useful for climate data

Climate-Data-Science / Climate-Similarity-Metrics

Which similarity metrics are the most helpful to understand climate


Find similarity metrics that could be useful for climate data #5

Closed pawelbielski closed 4 years ago

pawelbielski commented 4 years ago

Apart from Pearson's correlation and Mutual Information, there exist other similarity metrics that could be useful in our case. You can take the Nature paper that you already read as a starting point. The authors present a list of potentially useful similarity metrics there.

You can use Google Scholar to find more similarity metrics.

Steps:

[x] Perform a 1st pass over possibly many papers (you may start from the Nature paper).
[x] Perform 2nd pass for the most relevant papers.
[x] Check if a Python implementation is available.
[x] Check whether the Python libraries that implement a given similarity metric also provide other similarity metrics in the same package.
[x] Write a summary of all useful similarity metrics that we could use for this project, together with the information about available code implementation.
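
For reference, the two baseline metrics named at the top of this comment can be sketched in plain Python. This is only an illustration, not code from this repository: the function names, the equal-width histogram estimator for mutual information, and the default of 8 bins are assumptions chosen to keep the example short.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def _bin_indices(values, bins):
    """Assign each value to one of `bins` equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant input
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=8):
    """Histogram-based mutual information estimate, in nats (assumed estimator)."""
    bx, by = _bin_indices(x, bins), _bin_indices(y, bins)
    n = len(x)
    count_x, count_y = Counter(bx), Counter(by)
    count_xy = Counter(zip(bx, by))
    mi = 0.0
    for (i, j), c in count_xy.items():
        # p(i, j) * log(p(i, j) / (p(i) * p(j))), written with raw counts
        mi += (c / n) * math.log(c * n / (count_x[i] * count_y[j]))
    return mi


# A purely nonlinear relationship: y = x^2 on a grid symmetric around zero.
x = [i / 50 - 1 for i in range(101)]
y = [v * v for v in x]
print(pearson(x, y))             # close to 0: no linear relationship
print(mutual_information(x, y))  # clearly positive: y depends on x
```

The toy example shows why metrics beyond Pearson's correlation are worth collecting: the linear coefficient misses the dependency entirely, while the mutual information estimate picks it up. The histogram estimator is sensitive to the number of bins, so its values should only be compared under the same binning.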

pawelbielski commented 4 years ago

You could also take a look at Monte Carlo Dependency Estimation, recently created by Edouard Fouche from our chair. It can also detect nonlinear dependencies between time series, and it addresses some of the shortcomings of Mutual Information.

pawelbielski commented 4 years ago

Great work so far, @pierretoussing! Keep reading and summarizing the papers that you find relevant. A good starting point would be the papers that you marked as "References I would like to follow" in your previous summaries. You can also use the Google Scholar approach described above. At some point, after our meeting with the climate scientists, we will discuss ideas from the Data Science / Computer Science / IPD Boehm Chair perspective. Your summaries will help find the right story for your thesis before you present the proposal.

pierretoussing commented 4 years ago

@pawelbielski I went through the literature and may have found another interesting approach: converting the time series to symbol sequences using SAX (Symbolic Aggregate approXimation) and then applying a sequence similarity metric such as Levenshtein distance, Hamming distance, longest common subsequence, etc.
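
A rough sketch of what this could look like, using only the Python standard library. The function names, the default alphabet size of 4, and the handling of series lengths are assumptions made for illustration. The SAX step follows the usual recipe of z-normalization, piecewise aggregate approximation (PAA), and discretization at standard-normal quantiles.

```python
import bisect
import statistics
import string


def sax(series, n_segments, alphabet_size=4):
    """Convert a numeric series to a SAX word of length `n_segments`.

    Assumes len(series) >= n_segments; if the length is not a multiple of
    n_segments, the trailing values are ignored (a simplification).
    """
    mean = statistics.fmean(series)
    std = statistics.pstdev(series) or 1.0  # constant series: avoid dividing by 0
    z = [(v - mean) / std for v in series]

    # PAA: mean of each equal-length segment of the z-normalized series.
    seg_len = len(z) // n_segments
    paa = [statistics.fmean(z[k * seg_len:(k + 1) * seg_len])
           for k in range(n_segments)]

    # Breakpoints that split the standard normal into equiprobable regions.
    normal = statistics.NormalDist()
    breakpoints = [normal.inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]
    return "".join(string.ascii_lowercase[bisect.bisect_right(breakpoints, v)]
                   for v in paa)


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs words of equal length")
    return sum(c1 != c2 for c1, c2 in zip(a, b))


def levenshtein(a, b):
    """Edit distance (insertions, deletions, substitutions) between two words."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


rising = sax(list(range(16)), n_segments=4)
falling = sax(list(range(15, -1, -1)), n_segments=4)
print(rising, falling, hamming(rising, falling), levenshtein(rising, falling))
```

Hamming distance only works when both series are reduced to words of the same length, while Levenshtein distance also tolerates words of different lengths, which might matter if the series are segmented differently.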

pawelbielski commented 4 years ago

@pierretoussing Symbolic similarity metrics are indeed worth trying out (but not as the first priority).

To find more meaningful metrics, you can try the snowballing approach to finding related papers.

pawelbielski commented 4 years ago

In the "Related Work" section of the PhD thesis of our former colleague from the Institute, you can find some interesting state-of-the-art dependency/similarity metrics.

pierretoussing commented 4 years ago

This issue will be continued in #17