JuliaDynamics / TransitionsInTimeseries.jl

Transition Indicators / Early Warning Signals / Regime Shifts / Change Point Detection
MIT License
18 stars 5 forks source link

Swap loops and improve segmented analysis #57

Closed JanJereczek closed 9 months ago

JanJereczek commented 10 months ago

Hi @Datseris,

This PR swaps the indicator and surrogate loops in significant_transitions(). This prevents us from generating more surrogates than needed, as discussed in #25. The computation time and memory allocation are improved and quantified in examples that you will find in benchmarks/swapped_loops.jl. Spoiler:

@btime significant_transitions($results, $signif)
#=
New code = swaploops-indicators-surrogates.
Old code = main branch (2023/10/30).

New code: 2.103 s (60,345 allocations: 1.12 GiB)
Old code: 2.413 s (80,415 allocations: 2.24 GiB)
=#

This is for the sliding analysis. For the segmented one:

@btime significant_transitions($results, $signif)
#=
New code = swaploops-indicators-surrogates.
Old code = main branch (2023/10/30).

New code: 3.024 s (202,536 allocations: 956.14 MiB)
Old code: 3.881 s (403,763 allocations: 1.87 GiB)
=#

See code for more details!

Swapping the loops makes the previously defined sliding_surrogates_loop!() and sliding_surrogates_loop!() obsolete. To ease the maintenance and readability of significant_transitions(), I introduced sanitycheck_tail() and choose_metrics(), and further dispatched accumulate_pval!() and choose_pval!() (not happy super happy with how I did it, but did not know how to do it better).

Furthermore, I took advantage to fix the suboptimal computation time of the segmented analysis by:

  1. making better preallocations before the loops.
  2. fetching the indices corresponding to the beginning and end of the segments with segment_time() instead of the previously used (and suboptimal) segment(). The indices are passed to SegmentedWindowResults to avoid re-computing these indices in significant_transitions().
  3. precomputing the change metrics before the surrogate significance and storing them in SegmentedWindowResults. Note: this step is a bit tricky in a segmented analysis compared to a sliding one, since a precomputation is needed for each segment.
  4. using views.

Note that, so far, n surrogates are generated for each segment. We were disagreeing a bit about this #25. Let's talk about it live and adapt if needed.

Finally, something for future PRs. I believe that precomputing all metrics (indicator and change) should be done in estimate_indicator_changes(). In fact, this depends on the time series that is passed, which is not done when defining the config.

I hope it's clear!