Change example correlation diagnostic so it uses a more meaningful statistic

bouweandela commented 6 years ago

So we do not forget the discussion in #596, this code https://github.com/ESMValGroup/ESMValTool/blob/60a89f7828025c599615bcb5932b1917a40fb333/esmvaltool/diag_scripts/examples/correlate.py#L48

should probably be updated so that it uses either scipy.stats.mstats.ks_twosamp or scipy.stats.ks_2samp, or alternatively scipy.stats.anderson_ksamp, as some people seem a bit critical of the KS test.
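As a rough illustration of the two candidate tests, here is a minimal sketch that runs both on synthetic data. The helper name `compare_distributions`, the synthetic samples and the 5% level are assumptions made for this example only; none of this is taken from `correlate.py`.

```python
import warnings

import numpy as np
from scipy import stats


def compare_distributions(sample_a, sample_b):
    """Run a two-sample KS test and a k-sample Anderson-Darling test.

    Hypothetical helper for illustration; not part of ESMValTool.
    """
    ks_result = stats.ks_2samp(sample_a, sample_b)
    with warnings.catch_warnings():
        # anderson_ksamp warns when its p-value falls outside the
        # tabulated range; silence that for this sketch.
        warnings.simplefilter("ignore")
        ad_result = stats.anderson_ksamp([sample_a, sample_b])
    return {
        "ks_statistic": float(ks_result[0]),
        "ks_pvalue": float(ks_result[1]),
        "ad_statistic": float(ad_result.statistic),
        # critical_values are tabulated for the 25%, 10%, 5%, 2.5%, 1%,
        # 0.5% and 0.1% levels, so index 2 is the 5% critical value.
        "ad_critical_5pct": float(ad_result.critical_values[2]),
    }


# Synthetic stand-ins for two datasets: same spread, shifted mean.
rng = np.random.default_rng(0)
sample_a = rng.normal(loc=0.0, scale=1.0, size=500)
sample_b = rng.normal(loc=0.5, scale=1.0, size=500)

result = compare_distributions(sample_a, sample_b)
for name, value in result.items():
    print(f"{name}: {value:.4g}")
```

Both tests compare the empirical distributions of the two samples; the Anderson-Darling statistic gives more weight to the tails, which is one of the usual arguments for preferring it over KS.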

valeriupredoi commented 6 years ago

Assigned myself to this one. My intention is to start looking into developing a serious statistical module for ESMValTool, and this is a good starting point.

RCHG commented 6 years ago

Regarding this issue/enhancement, here is some information that I hope will be useful.

It might be interesting to check the R-Forge libraries, for instance those related to the Wilcox robust statistics functions (https://rdrr.io/rforge/WRS/man/) or those in robustbase (https://rdrr.io/rforge/robustbase/man/). Some of them are already implemented in scipy, but not all. It is useful to keep the rpy2 package in mind for reuse or double-checking.
In general, the Pearson cross-correlation is not robust, and it makes assumptions about the joint distribution similar to those of linear regression. However, there are slight improvements that could at least address the sensitivity to outliers, such as the percentage bend correlation coefficient (https://link.springer.com/article/10.1007/BF02294395) or the Winsorized correlation (which relies only on the trimmed mean and trimmed variance).
As for the k-sample methods such as those mentioned above (Anderson-Darling, Kruskal-Wallis, etc.), the ksamples package has information, but using it requires some knowledge of rank-based tests. Other possibilities are rank correlation measures.
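To make the outlier point concrete, here is a small sketch that compares the Pearson correlation with a Winsorized correlation and two rank correlations on synthetic data containing a single gross outlier. The helper `winsorized_correlation`, the 10% Winsorizing limits and the data are assumptions for illustration. The Winsorized correlation is computed here simply as the Pearson correlation of the Winsorized samples, and the percentage bend correlation is not shown because scipy does not provide it.

```python
import numpy as np
from scipy import stats
from scipy.stats import mstats


def winsorized_correlation(x, y, limits=(0.1, 0.1)):
    """Pearson correlation of the Winsorized samples.

    Hypothetical helper; `limits` gives the fraction of values clipped
    at the lower and upper end of each sample (10% here, an arbitrary
    choice for this sketch).
    """
    x_w = np.asarray(mstats.winsorize(np.asarray(x), limits=limits))
    y_w = np.asarray(mstats.winsorize(np.asarray(y), limits=limits))
    return float(stats.pearsonr(x_w, y_w)[0])


# Synthetic, strongly correlated data with one gross outlier in y.
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = x + 0.3 * rng.normal(size=100)
y[np.argmax(x)] = -50.0

pearson_r = float(stats.pearsonr(x, y)[0])
winsorized_r = winsorized_correlation(x, y)
spearman_r = float(stats.spearmanr(x, y)[0])
kendall_tau = float(stats.kendalltau(x, y)[0])

print(f"Pearson:    {pearson_r:.3f}")
print(f"Winsorized: {winsorized_r:.3f}")
print(f"Spearman:   {spearman_r:.3f}")
print(f"Kendall:    {kendall_tau:.3f}")
```

With this kind of data, the single outlier is enough to pull the Pearson coefficient far below the robust and rank-based estimates, which stay close to the correlation of the uncontaminated points.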

bouweandela commented 6 years ago

We will also support R diagnostics in the near future, see https://github.com/ESMValGroup/ESMValTool/pull/631, so no need to use rpy2.

bouweandela commented 5 months ago

Feel free to re-open if anyone has plans to do this.

ESMValGroup/ESMValTool #625