JuliaDynamics / TransitionsInTimeseries.jl

Transition Indicators / Early Warning Signals / Regime Shifts / Change Point Detection
MIT License

Reducing the memory allocation of the significance testing #54

Closed JanJereczek closed 10 months ago

JanJereczek commented 10 months ago

In sliding_surrogates_loop! and segmented_surrogates_loop!, many allocations were occurring. This has been reduced by introducing three new functions: init_pvals, accumulate_pvals! and choose_pval!. With them, we now always create pval_right and pval_left, which lets us go from 5.178 ms (1954 allocations: 3.03 MiB) down to 4.989 ms (738 allocations: 2.96 MiB); cf. the example appended to benchmark/benchmarks.jl. These new functions also make sliding_surrogates_loop! and segmented_surrogates_loop! easier to maintain.
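For context, a minimal sketch of what these three helpers could look like; the names match the PR, but the bodies below are hypothetical illustrations, not the actual implementation:

```julia
# Hypothetical sketch of the helpers named in this PR; the real
# implementations live in the package source.

# Allocate both one-sided p-value accumulators once, up front.
init_pvals(n_indicators::Int) = (zeros(n_indicators), zeros(n_indicators))

# In place, count how many surrogate values lie above / below the real ones.
function accumulate_pvals!(pval_right, pval_left, indicator_real, indicator_surr)
    @inbounds for i in eachindex(indicator_real)
        pval_right[i] += indicator_surr[i] >= indicator_real[i]
        pval_left[i]  += indicator_surr[i] <= indicator_real[i]
    end
    return nothing
end

# Normalize the counts and keep the tail relevant for the test.
function choose_pval!(pvals, pval_right, pval_left, n_surr, tail::Symbol)
    if tail == :right
        pvals .= pval_right ./ n_surr
    elseif tail == :left
        pvals .= pval_left ./ n_surr
    else # tail == :both
        pvals .= 2 .* min.(pval_right, pval_left) ./ n_surr
    end
    return pvals
end
```

The point is that pval_right and pval_left are allocated once and then only mutated inside the surrogate loop, instead of creating fresh vectors on every iteration.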

Although the number of allocations is reduced, the runtime is not significantly improved. I find it somewhat challenging to see whether we can cut the memory allocation and the run-time even further. @Datseris, your expertise is most welcome on this matter!

codecov-commenter commented 10 months ago

Welcome to Codecov :tada:

Once merged to your default branch, Codecov will compare your coverage reports and display the results in this comment.

Thanks for integrating Codecov - We've got you covered :open_umbrella:

Datseris commented 10 months ago

@JanJereczek, following up on your PR #54, here are some tips. The p-values are by far the least meaningful part of the performance. What you have to do is:

First, separate the code into more functions.

Another point to improve is that our functions are not written in a way that allows for easy optimizations. Normally you would want a setup function that initializes all containers, and then a function that does the computations without initializing anything (which you did a bit here); a generic sketch of this pattern follows.
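Such a split could look like this; the names (setup_buffers, compute!) are hypothetical, not the package's API:

```julia
using Random

# Setup: allocate every container once, outside the hot loop.
function setup_buffers(x::AbstractVector{T}, n_surrogates::Int) where {T}
    surrogate = similar(x)              # reused buffer for each surrogate
    results   = zeros(T, n_surrogates)  # one statistic per surrogate
    return (; surrogate, results)
end

# Compute: only mutate the preallocated buffers, allocate nothing new.
function compute!(buffers, x, rng = Random.default_rng())
    (; surrogate, results) = buffers
    for k in eachindex(results)
        copyto!(surrogate, x)
        shuffle!(rng, surrogate)        # stand-in for a real surrogate method
        results[k] = sum(abs2, surrogate) / length(surrogate)
    end
    return results
end

x = randn(1000)
buffers = setup_buffers(x, 100)
compute!(buffers, x)   # repeated calls reuse the same memory
```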

But more importantly:

We can start by detecting the bottlenecks. Using the profiling functionality of VSCode we can get a flame graph of which parts of the code are the slowest. Then we can benchmark the slow functions in isolation and see what goes wrong. Do we have type instabilities? Do we allocate unnecessary vectors?

Really, this is the most crucial part: do proper profiling and find out which parts of the code take the most time. Calculating the p-values is trivial and takes essentially zero time relative to the whole code, which is why this PR doesn't bring any performance changes; you have optimized 0% of the performance. That's why you first need to invest in finding what needs to be optimized, and only then optimize it.
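A typical workflow for this, sketched with standard tooling on a toy function (slow_mean is a deliberately allocating placeholder, not package code):

```julia
using BenchmarkTools, InteractiveUtils, Profile

# A toy function standing in for the suspect code: x[1:end] copies the
# whole vector, so every call allocates.
slow_mean(x) = sum(x[1:end]) / length(x)

x = randn(10^6)

# 1. Profile: in VSCode, @profview slow_mean(x) (Julia extension) renders a
#    flame graph; outside VSCode, Profile gives a textual view of the same data.
@profile for _ in 1:100; slow_mean(x); end
Profile.print(format = :flat, sortedby = :count)

# 2. Benchmark the suspect in isolation (interpolate globals with $).
@btime slow_mean($x)

# 3. Hunt type instabilities: Any entries in the output are suspects.
@code_warntype slow_mean(x)
```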

JanJereczek commented 10 months ago

Hi George,

The function init_pvals() has been removed and the PR should be ready to merge.

I agree with you that performance work needs profiling, and that is what I did. By doing so, I noticed that too many memory allocations were occurring in the computation of the p-value, and this PR aims to fix exactly that, as an atomic change.

Besides this, I will dig further into where we can cut run-time. I guess some performance issues will be fixed by solving the surrogate bug mentioned in #25.

Datseris commented 10 months ago

Ok, next time we have a call you can show me the profiling info you have gathered!