Detection of significant evolution of an indicator

JanJereczek commented 1 year ago

evolution_metrics.jl:

indicator_evolution() computes indicator and its evolution for original timeseries and surrogates
chapter "Trend metrics" has been initialized with ridge_regression. Still to come: chapter with metrics for change-point detetction

significance.jl:

measure_significance() compares results from original timeseries and surrogates.
chapter "Percentile metrics" has been initialized with various measures of the significance defined wrt to percentiles.

JanJereczek commented 1 year ago

Thanks a lot for your review George! I am already noticing some benefits of our work for my own coding habits :) Most of what you say is crystal clear apart from two things:

Bullet point 2: Do you mean wrapping the following lines into a function?

wv_surr_indicator = WindowViewer(s, wv_indicator_width, wv_indicator_stride)
surr_x_indicator = map(indicator, wv_surr_indicator)
wv_surr_evolution = WindowViewer(surr_x_indicator, wv_evolution_width, wv_evolution_stride)
surr_evolution_ts = map(evolution_metric, wv_surr_evolution)

Bullet point 3: Do you mean generating the surrogates before the loop and basically saving them? I had thought of providing a vector of indicators (instead of just one as it is the case right now) and make the inner-most loop go over them. Maybe even provide a vector of evolution metrics if we don't want to use the same for all indicators. This way I wouldn't need to store the surrogates. Something like:

# Just an example of how these vectors would look like
indicators = [var, ar1]
evolution_metrics = [ridge_slope, corr_kendall]

# here some code is missing but we get the point
for i in 1:n_surro
    for k in 1:n_indicators
         s = sgen()
         surr_evolutions[i,:,k] = compute_indicator_evolution(s, indicators[k], evolution_metrics[k])
    end
end

Naming: I had the feeling that indicator_evolution() was a sensible name as we compute the evolution (either trend or change point - we need a generic name to describe both options) of an indicator (the call of map that you mention that gives the time-series but not the evolution metric yet!). I am open for suggestions though!

All the comments regarding code make perfect sense and I will embed them asap :)

Datseris commented 1 year ago

What's the case of using corr_kendall? The first case with the ridge is clear: we are computing the slope of e.g., AR1 coefficient. And we want to see a significant increase of slope. What is the correlation coefficient supposed to yield? And the correlation between what? The meta-analysis descriptor (what you call evolution metrics) are functions of 1 argument: a timeseries. What is the Kendall coefficient computing here?

Datseris commented 1 year ago

Bullet point 3: Do you mean generating the surrogates before the loop and basically saving them? I had thought of providing a vector of indicators (instead of just one as it is the case right now) and make the inner-most loop go over them. Maybe even provide a vector of evolution metrics if we don't want to use the same for all indicators. This way I wouldn't need to store the surrogates. Something like:

Yes, exactly like that.

Datseris commented 1 year ago

Bullet point 2: Do you mean wrapping the following lines into a function?

Well yes. The function doesn't care if the input is a surrogate or not; it does the same thing for both the real signal and the surrogate anyways. So you define a function once and call it twice: for the signal and the surrogate.

JanJereczek commented 1 year ago

In many publications, the kendall-tau correlation is used to measure the trend. It yields 1 for monotonously increasing time-series and -1 for monotonously decreasing one (and any value in-between if there is no monotonous behaviour). It is part of StatsBase.jl (https://juliastats.org/StatsBase.jl/stable/ranking/#StatsBase.corkendall) and for now I just used it as a placeholder. I wanted to illustrate that we might want to use the evolution metric n°1 to measure the trend of indicator A, and evolution metric n°2 to measure the trend of indicator B.

JanJereczek commented 1 year ago

Note that corrkendall(x, y=x) is by default a 1-argument function following that formula:

\tau=\frac{2}{n(n-1)} \sum \operatorname{sgn}\left(x_i-x_j\right) \operatorname{sgn}\left(y_i-y_j\right)

Datseris commented 1 year ago

okj sounds good. The "metaanalysis metric" (which is what you call "evolution metric") can be either a function or a vector of functions of equal length to the vector of indicators.

JanJereczek commented 1 year ago

Mostly done with the modifications. I feel it would be nice to have a struct defining the parameters of the meta-analysis. Something like:

struct MetaAnalysisParameters
    n_surrogates::Int
    surrogate_method::Surrogate
    rng::AbstractRNG
    wv_indicator_width::Int
    wv_indicator_stride::Int
    wv_evolution_width::Int
    wv_evolution_stride::Int
end

That way, we can pass these parameters easier from a function to an other. What do you think?

Datseris commented 1 year ago

Sounds good!

JanJereczek commented 1 year ago

Not completely done yet but not missing much! I will write here when it is ready for review :)

Datseris commented 1 year ago

ok!

JanJereczek commented 1 year ago

Ready for review! I think we should speak about naming in our next meeting. I already feel we are close to having a versatile interface :)

Datseris commented 1 year ago

I'm pushing some commits here to organize things a bit. Two general comments:

Do not declare types unless there is a reason to. Every method you've written is cluttered with ::AbstractVector{T}. Why is it important that x and y need to be the same type?
The docs need some love. At the moment several docstrings are wrong or do not write all arguments correctly.
There are still way more functions than necessary. I'll try to resolve this, have a look at the commits I will be pushing

Datseris commented 1 year ago

Actually, Jan, forget what I said. The code is fine the way it is now. Here is what I propose: make sure tests pass. Then, I merge and we start working on the documentation. The documentation will allow us to see what things are overengineered and which are not.

JanJereczek commented 1 year ago

Hi George,

Thanks for your corrections! Some remarks:

I kept the precomputed version of computing the ridge slope and put a small benchmark experiment for you to check that it is way faster (test/metaanalysis_trend.jl)
We seem to converge on following naming: we compute indicators and study their evolution metrics, which results in a meta-analysis of the indicators.
Regarding the strong typing I require: I agree with you that it is not necessary but I felt like it was not requiring a lot and would potentially improve performance. But I guess we can just be less restrictive in that regard.
I will make a small example on a time-series displaying a transition vs. one time-series without transition. This will be neat for the docs and will make sure the docstrings remain cleaner!

After merge into main I see following things to do next:

A thresholding function shooting an alarm if n_min_indicators or more show a significant meta-analysis.
Implement some more indicators.

What do you think?

Datseris commented 1 year ago

Highest priority now is the documentation, so that you can explain how you envision the package to be used. Then, we follow with your two suggestions.

JuliaDynamics / TransitionsInTimeseries.jl

Detection of significant evolution of an indicator #16