Climate-Data-Science / Climate-Similarity-Metrics

Which similarity metrics are the most helpful to understand climate

Find level of agreement between similarity measures #20

Open pierretoussing opened 4 years ago

pierretoussing commented 4 years ago

Plot on a map the regions where the different similarity measures agree that there is a dependency, and the regions where they disagree.

pierretoussing commented 4 years ago

@pawelbielski For the agreement, I thought of a scoring function that outputs how confident we are, given a set of similarity measures, that there is a dependency. It could work like this: the function takes a list of similarity measures (say we pass it 4), computes each measure's similarity to the reference series, and then combines the 4 resulting maps into one using a scoring function. One example of a scoring function would be to count, at each point, how many similarity measures have a value in the top 5%, and then output the percentage of measures that indicate a dependency. If all 4 similarity measures have a high value at a point, it would output 100%; if only one of the 4 does, it would output 25%. Once we have done this for every point, we can plot the results on a map. Of course, this scoring criterion would be modular, so we could test different options.
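To make the idea concrete, here is a minimal sketch (not the Notebook code). It assumes each similarity map is already computed as a 2-D NumPy array on the same grid without NaNs, and that "top 5%" is taken per measure over all grid points. The name `agreement_score` and the parameter `top_fraction` are made up for illustration.

```python
import numpy as np

def agreement_score(similarity_maps, top_fraction=0.05):
    """Fraction of measures whose value at each point lies in that measure's top `top_fraction`.

    Hypothetical helper: `similarity_maps` is a list of 2-D arrays of equal shape.
    """
    # Stack to shape (n_measures, n_lat, n_lon).
    maps = np.stack([np.asarray(m, dtype=float) for m in similarity_maps])
    # Assumption: one threshold per measure, the (1 - top_fraction) quantile of its own map.
    thresholds = np.quantile(maps, 1.0 - top_fraction, axis=(1, 2), keepdims=True)
    # A measure "votes" for a dependency where its value reaches its threshold.
    votes = maps >= thresholds
    # Mean of the boolean votes = share of measures that agree (0.25, 0.5, ... for 4 measures).
    return votes.mean(axis=0)
```

Multiplying the result by 100 gives the percentage described above, and swapping the thresholding step for another rule is what would make the criterion modular.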

What do you think of this idea?

pawelbielski commented 4 years ago

@pierretoussing I like your idea. It is definitely worth trying to plot such maps and present them to climate scientists. The question, however, is how to define the upper 5% threshold. One solution could be to parameterize it, or even to set it interactively. Also, with this approach we are focusing on regions where there is a dependency (according to high values of the measures).

You can also think of a more generic approach with no parameters at all: for example take the average value and use the entropy or variance as the measure of agreement (we have discussed it before).
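A rough sketch of the variance variant, assuming the similarity maps are 2-D NumPy arrays on the same grid and already on a comparable scale (otherwise the mean and variance are not meaningful). The function name `mean_and_spread` is hypothetical; the entropy variant is left out because it needs an extra choice of how to turn the values into a distribution.

```python
import numpy as np

def mean_and_spread(similarity_maps):
    """Return (mean map, variance map) across measures at every grid point.

    Hypothetical helper: low variance means the measures agree at that point,
    the mean says what they agree on.
    """
    # Stack to shape (n_measures, n_lat, n_lon).
    maps = np.stack([np.asarray(m, dtype=float) for m in similarity_maps])
    return maps.mean(axis=0), maps.var(axis=0)
```

No threshold is needed here: the two output maps can be plotted side by side, one for the combined value and one for the (dis)agreement.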

pierretoussing commented 4 years ago

I already implemented my idea in Notebook 12, so you can have a look at the details. I am not sure I clearly expressed what I meant.

I will of course look at the other possibilities we talked about.

pierretoussing commented 4 years ago

As the current approach for plotting the agreeableness is not very clear, we decided to plot it in another way. Implement the following functions:

pierretoussing commented 4 years ago

Peter was interested in the masks that were produced during the agreeableness computation. So it would be nice to make it possible to extract these masks and then apply them to other variables.

Another point was that, for the value maps, the function always uses the mean. Give the user the option to use other functions such as median, min, ...
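One possible shape for both requests, as a sketch only. It assumes the maps are 2-D NumPy arrays on a common grid and that a "mask" is a boolean map of the points where the agreement reaches some level. All names here (`combine_maps`, `agreement_mask`, `apply_mask`, `min_agreement`) are invented for illustration and are not the existing functions.

```python
import numpy as np

def combine_maps(similarity_maps, aggregate=np.mean):
    """Combine several maps into one value map with a user-chosen function.

    `aggregate` can be any callable that accepts `axis=0`, e.g. np.mean, np.median, np.min.
    """
    maps = np.stack([np.asarray(m, dtype=float) for m in similarity_maps])
    return aggregate(maps, axis=0)

def agreement_mask(agreement, min_agreement=1.0):
    """Boolean mask of the points where the agreement is at least `min_agreement`."""
    return np.asarray(agreement) >= min_agreement

def apply_mask(variable, mask):
    """Keep `variable` where `mask` is True and set it to NaN elsewhere."""
    return np.where(mask, np.asarray(variable, dtype=float), np.nan)
```

Returning the mask as its own array means it can be saved and reused on any other variable defined on the same grid.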

pierretoussing commented 4 years ago

@pawelbielski I found a bug: absolute Pearson's was also scaled using the binning-to-quantiles function. But I think we should keep it that way when computing the agreeableness. When comparing similarity measures, it makes sense to leave Pearson's untouched, but when combining different similarity measures (or computing their agreement), they should have the same value distribution, so that the mean and std are representative.
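To illustrate why the scaling matters, here is a small sketch of what a binning-to-quantiles step could look like. This is an assumption about the general technique, not the project's actual function: each value is replaced by the index of its quantile bin, rescaled to [0, 1]. The name `bin_to_quantiles` and the default `n_bins=10` are hypothetical.

```python
import numpy as np

def bin_to_quantiles(values, n_bins=10):
    """Replace each value by its quantile-bin index, rescaled to [0, 1].

    Hypothetical helper: after this step every measure has (roughly) the same
    value distribution, whatever its original range was.
    """
    if n_bins < 2:
        raise ValueError("n_bins must be at least 2")
    values = np.asarray(values, dtype=float)
    # Inner bin edges: the 1/n_bins, 2/n_bins, ... quantiles of the map itself.
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    # Bin index in 0 .. n_bins - 1 for every grid point.
    bins = np.digitize(values, edges)
    return bins / (n_bins - 1)
```

Under this sketch, absolute Pearson's (already in [0, 1]) and a measure with a very different range end up on the same scale, which is the property needed before taking the mean and std across measures.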

pawelbielski commented 4 years ago

@pierretoussing that is great information. I see you have already updated the presentation slides: great!