Find level of agreement between similarity measures

pierretoussing commented 4 years ago

Plot regions on a map where different similarity measures agree that there is a dependency, and where they disagree.

[x] Create a function that takes a list of similarity measures and the data, and plots a matrix of maps showing where the similarity measures agree.
[x] Define different interpretations of "agreement", for example the entropy between values, the standard deviation, ..., to quantify how sure the model is about the dependencies.
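Both interpretations can be computed per grid cell once the maps of all measures are stacked into one array. The sketch below assumes the maps are already scaled to a common range of [0, 1] and stacked with shape (n_measures, n_lat, n_lon); the function names and the equal-width binning used for the entropy are illustrative assumptions, not code from the repository. For both functions, lower values mean stronger agreement.

```python
import numpy as np

def agreement_std(maps):
    """Standard deviation across measures for each grid cell.

    maps: array of shape (n_measures, n_lat, n_lon), values assumed in [0, 1].
    """
    return np.std(maps, axis=0)

def agreement_entropy(maps, n_bins=10):
    """Shannon entropy (bits) of the measures' values per grid cell.

    Values are binned into n_bins equal-width bins on [0, 1] (an assumption);
    0 means all measures fall into the same bin, i.e. full agreement.
    """
    n_measures = maps.shape[0]
    # Bin index per value; clip so that a value of exactly 1.0 lands in the last bin
    bins = np.clip((maps * n_bins).astype(int), 0, n_bins - 1)
    # Number of measures per bin and cell -> shape (n_bins, n_lat, n_lon)
    counts = np.stack([(bins == b).sum(axis=0) for b in range(n_bins)])
    probs = counts / n_measures
    # Treat 0 * log(0) as 0 and silence the warnings from evaluating log2(0)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(probs > 0, probs * np.log2(probs), 0.0)
    return -terms.sum(axis=0)

# Three measures: they agree on the first cell and disagree on the second
maps = np.array([[[0.9, 0.1]], [[0.95, 0.9]], [[0.92, 0.5]]])
print(agreement_std(maps))
print(agreement_entropy(maps))
```

The standard deviation is sensitive to how far apart the values are, while the entropy only counts how the values spread over the bins, so the two can rank cells differently.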

pierretoussing commented 4 years ago

@pawelbielski For the agreement, I thought of something like a scoring function that outputs how sure we are, given a certain number of similarity measures, that there is a dependency. It could work like this: the function takes a list of similarity measures (for example, 4 measures), computes the similarity to the reference series with each of them, and then combines the 4 maps into one using a scoring function. One example of a scoring function would be "count for how many similarity measures the similarity value lies in the top 5 percent"; it would then output the percentage of similarity measures that indicate a dependency. If a point has a high value for all 4 similarity measures, the output is 100%; if only one of the 4 has a high value, the output is 25%. After doing this for every point, we can plot the results on a map. Of course, the scoring criterion would be modular, so we could test different possibilities.
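A minimal sketch of this counting score, assuming the similarity maps are stacked into a NumPy array of shape (n_measures, n_lat, n_lon). The name `dependency_score` and the `quantile` parameter are hypothetical; the cutoff is computed separately for each measure, so `quantile=0.95` means "this measure's top 5 percent of grid points". This is an illustration, not the implementation from Notebook 12.

```python
import numpy as np

def dependency_score(maps, quantile=0.95):
    """Fraction of similarity measures that place each grid cell in their top values.

    maps: array of shape (n_measures, n_lat, n_lon), one similarity map per measure.
    quantile: per-measure cutoff; 0.95 corresponds to the top 5 percent.
    Returns an array of shape (n_lat, n_lon) with values in [0, 1].
    """
    n_measures = maps.shape[0]
    flat = maps.reshape(n_measures, -1)
    # One threshold per measure, computed over all grid cells of that measure's map
    thresholds = np.quantile(flat, quantile, axis=1)
    # True where a measure "sees" a dependency, for every measure and cell
    is_high = maps >= thresholds[:, None, None]
    # Share of measures voting for a dependency: 1.0 = all 4, 0.25 = 1 of 4
    return is_high.mean(axis=0)

# Example: three measures rank the last cell highest, a fourth (reversed) ranks the first cell highest
base = np.arange(20, dtype=float).reshape(1, 20)
maps = np.stack([base, base, base, base[:, ::-1]])
score = dependency_score(maps)
# score[0, 19] == 0.75 and score[0, 0] == 0.25
```

If the maps contain missing values (e.g. masked grid cells), `np.nanquantile` would be the safer choice for the thresholds.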

What do you think of this idea?

pawelbielski commented 4 years ago

@pierretoussing I like your idea. It is definitely worth plotting such maps and presenting them to climate scientists. The question, however, is how to define the top 5% threshold. One solution could be to parameterize it, or even to set it interactively. Also, this approach focuses on regions where there is a dependency (according to high values of the measures).

You can also think of a more generic approach with no parameters at all: for example, take the average value and use the entropy or variance as the measure of agreement (as we discussed before).

pierretoussing commented 4 years ago

I have already implemented my idea in Notebook 12, so you can look at the details there. I am not sure I expressed clearly what I meant.

I will of course look at the other possibilities we talked about.

pierretoussing commented 4 years ago

As the current approach for plotting the agreeableness is not very clear, we decided to plot it in another way. Implement the following functions:

[x] Plot maps of where the measures agree that the values are high
1. Compute similarity with each measure
2. Combine the maps into one map containing the mean of the values
3. Combine the maps from step 1 into one map containing the agreement of the values, using an agreeableness measure (e.g. std, entropy, ...)
4. Filter the maps using thresholds (one threshold for the mean values and one threshold for the agreement) and plot the remaining points in one color (e.g. blue)
5. Repeat step 4 with different thresholds
6. Plot all combinations of thresholds in a matrix
[x] Plot maps of where the measures agree that the values are low (same process as above)
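The steps above can be sketched as follows, assuming stacked maps of shape (n_measures, n_lat, n_lon) on a common scale and the standard deviation as the agreeableness measure. `agreement_masks` is a hypothetical name; it returns one boolean mask per threshold combination, and `high=False` gives the variant for low values. Plotting is left out: each mask could be drawn as one panel of a grid with `len(value_thresholds)` rows and `len(std_thresholds)` columns, for example with matplotlib subplots.

```python
import numpy as np
from itertools import product

def agreement_masks(maps, value_thresholds, std_thresholds, high=True):
    """Masks of grid cells where the measures agree that values are high (or low).

    maps: array (n_measures, n_lat, n_lon) of similarity maps on a common scale.
    value_thresholds: thresholds applied to the mean map.
    std_thresholds: maximum standard deviation allowed (the agreement filter).
    high: if True keep cells with mean >= threshold, otherwise mean <= threshold.
    Returns {(value_threshold, std_threshold): boolean mask of shape (n_lat, n_lon)}.
    """
    mean_map = maps.mean(axis=0)  # step 2: mean of the values
    std_map = maps.std(axis=0)    # step 3: agreement as standard deviation
    masks = {}
    # steps 4-6: every threshold combination yields one panel of the matrix
    for v, s in product(value_thresholds, std_thresholds):
        value_ok = mean_map >= v if high else mean_map <= v
        masks[(v, s)] = value_ok & (std_map <= s)
    return masks

# Two measures: agree high on cell 0, disagree on cell 1, agree low on cell 2
maps = np.array([[[0.9, 0.9, 0.1]], [[0.9, 0.3, 0.1]]])
high_masks = agreement_masks(maps, [0.5, 0.8], [0.1, 0.5])
low_masks = agreement_masks(maps, [0.2], [0.1], high=False)
```

Returning masks instead of drawing them directly keeps the computation reusable, e.g. for applying the same masks to other variables.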

pierretoussing commented 4 years ago

Peter was interested in the masks produced during the agreeableness computation, so it would be nice to provide a way to extract these masks and apply them to other variables.

[x] Write a function for applying a mask on a map
[x] Modify compute_agreement_defined_with so that it returns the mask as an array
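A sketch of a mask-application function, assuming the mask is a boolean array over the (lat, lon) grid and that cells outside the mask should become NaN (both assumptions; `apply_mask` is a hypothetical name). Because of NumPy broadcasting, the same 2-D mask also works for a variable with leading axes such as time.

```python
import numpy as np

def apply_mask(data, mask):
    """Keep only the grid cells selected by `mask`; all other cells become NaN.

    data: array whose last two axes are (lat, lon), e.g. (time, lat, lon).
    mask: boolean array of shape (lat, lon), e.g. from the agreement computation.
    """
    data = np.asarray(data, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if mask.shape != data.shape[-2:]:
        raise ValueError(f"mask shape {mask.shape} does not match grid {data.shape[-2:]}")
    # Broadcasting applies the 2-D mask to every leading slice (e.g. every time step)
    return np.where(mask, data, np.nan)

# Example: two time steps of a 2x2 variable, keeping only the diagonal cells
data = np.arange(8).reshape(2, 2, 2)
mask = np.array([[True, False], [False, True]])
masked = apply_mask(data, mask)
```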

Another point was that the value maps always use the mean. Give the user the option to use other functions, such as the median, min, ...

[x] Modify the agreement functions so that custom value-summary functions can be used
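One way to support custom summary functions is to accept any callable that takes an array and an `axis` keyword, as NumPy's reductions do. In this sketch (`summarize_maps` is a hypothetical name), `np.mean` stays the default, and `functools.partial` can fix extra arguments, for example a quantile.

```python
import numpy as np
from functools import partial

def summarize_maps(maps, summary=np.mean):
    """Combine stacked per-measure maps (n_measures, n_lat, n_lon) into one value map.

    summary: any callable accepting an array and an `axis` keyword,
    e.g. np.mean (default), np.median, np.min or np.max.
    """
    return summary(maps, axis=0)

# Example: the 25th percentile across measures instead of the mean
lower_quartile = partial(np.quantile, q=0.25)

maps = np.array([[[1.0, 4.0]], [[2.0, 0.0]], [[6.0, 2.0]]])
median_map = summarize_maps(maps, np.median)
quartile_map = summarize_maps(maps, lower_quartile)
```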

pierretoussing commented 4 years ago

@pawelbielski I found a bug: absolute Pearson's correlation was also being scaled with the binning-to-quantiles function. However, I think we should keep this scaling when computing the agreeableness. When comparing similarity measures, it makes sense to leave Pearson's untouched, but when combining different similarity measures (or computing their agreement), they should have the same value distribution, so that the mean and std are representative.
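To illustrate why the scaling matters, here is a sketch of a quantile-binning transform. It is an assumption about what a binning-to-quantiles function does, not the repository's function: each value is replaced by the index of its quantile bin, scaled to [0, 1], so every measure ends up with a roughly uniform distribution regardless of its original scale. `bin_to_quantiles` and `n_bins` are hypothetical names.

```python
import numpy as np

def bin_to_quantiles(values, n_bins=10):
    """Replace each value by the index of its quantile bin, scaled to [0, 1].

    After this transform every measure has (approximately) the same uniform
    distribution, so the mean and std across measures are not dominated by
    one measure's scale.
    """
    values = np.asarray(values, dtype=float)
    # Inner quantile edges, e.g. the 10%, 20%, ..., 90% quantiles for n_bins=10
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    # np.digitize gives 0 below the first edge and n_bins - 1 at or above the last one
    bins = np.digitize(values, edges)
    return bins / (n_bins - 1)

# Two "measures" with very different scales but the same ranking of grid points
a = np.arange(100)
b = a ** 3
scaled_a = bin_to_quantiles(a)
scaled_b = bin_to_quantiles(b)
```

Because only the ranks matter, `scaled_a` and `scaled_b` coincide even though the raw values differ by orders of magnitude; unscaled, `b` would dominate any mean or std computed across the two.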

pawelbielski commented 4 years ago

@pierretoussing That is very useful information. I see you have already updated the presentation slides: great!

Climate-Data-Science / Climate-Similarity-Metrics, issue #20