SCECcode / pycsep

Tools to help earthquake forecast model developers build and evaluate their forecasts
https://docs.cseptesting.org
BSD 3-Clause "New" or "Revised" License

EvaluationResult of T-test (and W-test) do not contain the original percentile/alpha value #262

Open pabloitu opened 2 months ago

pabloitu commented 2 months ago

The t-test requires an alpha value to create a confidence interval (e.g., 5%): https://github.com/SCECcode/pycsep/blob/5f84ea97101de0439deb1e3f5c383874c7bb3801/csep/core/poisson_evaluations.py#L14-L15 From this, the information-gain bounds and the type-2 error are returned inside an EvaluationResult. However, the alpha value itself is then forgotten, which forces the EvaluationResult plotting to recall the original value of alpha with which the t-test was carried out: https://github.com/SCECcode/pycsep/blob/5f84ea97101de0439deb1e3f5c383874c7bb3801/csep/utils/plots.py#L1718
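For reference, a minimal sketch of how alpha enters the information-gain bounds, with placeholder numbers rather than pycsep's actual code:

```python
# Sketch only: placeholder values; pycsep's real computation lives in
# poisson_evaluations.py and may differ in detail.
import numpy as np
from scipy import stats

alpha = 0.05              # type-1 error rate passed to the t-test
information_gain = 0.12   # placeholder: mean per-event information gain
sample_std = 0.40         # placeholder: sample standard deviation of the IG
n_events = 150            # placeholder: number of observed events
dof = n_events - 1        # degrees of freedom

# Two-sided critical value of the t-distribution for the given alpha
t_critical = stats.t.ppf(1 - alpha / 2.0, df=dof)
ig_lower = information_gain - t_critical * sample_std / np.sqrt(n_events)
ig_upper = information_gain + t_critical * sample_std / np.sqrt(n_events)
# (ig_lower, ig_upper) end up in the EvaluationResult, but alpha itself does not.
```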

I am not sure whether to create a new alpha attribute on the resulting EvaluationResult https://github.com/SCECcode/pycsep/blob/5f84ea97101de0439deb1e3f5c383874c7bb3801/csep/core/poisson_evaluations.py#L46-L54

or to redefine the attributes of the t-test. For instance, shouldn't result.quantile, rather than result.test_distribution, actually contain the information-gain lower and upper bounds?

Also, the W-test confidence interval is calculated inside the plotting functions rather than in the evaluation function itself.

pabloitu commented 2 months ago

Addressed in #263, commit 7306329a1b698e9aca8d6d91589e3d49ff525914, where an extra value was added to EvaluationResult().quantile that stores the type-1-error alpha value. Now the alpha value can be written into the plotting legend to explain what the symbols/colors in the t-test plot mean.

Currently, the t-test EvaluationResult() is defined as:
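roughly the following (a sketch with placeholder values; the exact tuple layout is inferred from the discussion above, not copied from the source):

```python
from csep.models import EvaluationResult  # assumed import path

# Placeholder values standing in for the actual t-test outputs
ig_lower, ig_upper = 0.05, 0.19
information_gain = 0.12
t_statistic, t_critical, alpha = 2.1, 1.97, 0.05

result = EvaluationResult()
result.name = 'Paired T-Test'
result.test_distribution = (ig_lower, ig_upper)     # information-gain CI bounds
result.observed_statistic = information_gain
result.quantile = (t_statistic, t_critical, alpha)  # alpha appended by #263
```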

but the values don't feel quite in place. The dof value is also lost, which would require some crazy acrobatics (or re-running the entire test) if a different confidence interval is desired. This differs from the consistency tests, where the confidence interval is defined at the plot level.

I wonder if the attributes of the resulting EvaluationResult should be re-defined for the t-test as:

- test_distribution: the actual t-distribution, given by the 3 parameters of the location-scale distribution, e.g. (meanIG, stdIG, dof).
- observed_statistic: 0, since we are testing whether the log-scores are substantially different, i.e., IG = 0.
- quantile: the % mass of test_distribution below 0.
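A minimal sketch of this proposal (all values are placeholders and the tuple layout is only a suggestion):

```python
from scipy import stats
from csep.models import EvaluationResult  # assumed import path

mean_ig, std_ig, dof = 0.12, 0.03, 149  # placeholder location-scale t parameters

result = EvaluationResult()
result.test_distribution = ('t', mean_ig, std_ig, dof)  # full dist, dof preserved
result.observed_statistic = 0.0                         # H0: information gain = 0
# Quantile: mass of the t-distribution below 0; compare directly to alpha
result.quantile = stats.t.cdf(0.0, df=dof, loc=mean_ig, scale=std_ig)
```

With the full distribution stored, any confidence interval could be recovered after the fact via stats.t.ppf, without re-running the test.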

In this way, the comparison-test results would be analogous to the consistency tests: a test_distribution, similar to the Poisson/NegBinom distributions, and a quantile value that can be immediately checked against a confidence level (whether it falls below or above it).

Ideas? Or should we keep it as is? @mherrmann3 @wsavran @bayonato89 @Serra314