Closed: aulemahal closed this pull request 4 years ago.
@tlogan2000 With firstruncheck, I am currently computing the growing season length of a full generic scenario; memory is stable at 10-12 GB (9-10% of doris) and I estimate a computation time of 25-30 min.
Great. 25 minutes seems a bit long? What is the calc time for a 'normal' indicator? In any case, at least we have a version that is memory stable.
I would like to know how you obtained the memory-consumption graphs. Did you scrape the output of top, or is there something that does this? I have looked for something like that before, but without success so far...
@sbiner I use memory_profiler (https://pypi.org/project/memory-profiler/). Pretty cool! I launch my script for each "experiment" with:

mprof run -C bench_gsl.py exp

The -C is there to track all threads and child processes ("children"), which is necessary with dask. It can also produce figures directly, but I preferred to write my own plotting code in the script.
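For anyone who wants to do the same from inside a script rather than from mprof's output files, here is a minimal sketch using memory_profiler's Python API with matplotlib. It only illustrates the approach described above and is not the actual code in bench_gsl.py; `run_experiment` is a hypothetical stand-in for the benchmark.

```python
from memory_profiler import memory_usage
import matplotlib.pyplot as plt

def run_experiment():
    # Hypothetical placeholder for the actual benchmark, e.g. computing
    # growing season length over the full scenario.
    pass

# Sample total memory every 0.5 s while the experiment runs;
# include_children=True plays the role of mprof's -C flag.
mem = memory_usage((run_experiment, (), {}), interval=0.5, include_children=True)

plt.plot([0.5 * i for i in range(len(mem))], mem)
plt.xlabel("time (s)")
plt.ylabel("memory (MiB)")
plt.savefig("memory_profile.png")
```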
Thanks!
@tlogan2000 According to my Portraits Climatiques update, 25 minutes seems normal for a two-variable indicator (tas is computed from tasmin and tasmax).
The current computation of "growing season length" in xclim uses enormous amounts of memory and usually fails on large datasets. I tested another method to compute it; the results are good, though less impressive than the last two benchmarks done this way.
Two methods: the current xclim implementation, and one based on xc.run_length.first_run calls. For the second case, I tested many different versions to try to pinpoint what was responsible for the memory consumption. The best one is exp_firstruncheck.
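For context, the primitive behind the run_length approach is "find where the first run of N consecutive True values starts along time". Below is a minimal pure-xarray sketch of that idea; it is illustrative only, and it is neither this PR's code nor the actual xc.run_length.first_run, which also handles dask arrays.

```python
import numpy as np
import pandas as pd
import xarray as xr

def first_run_index(mask: xr.DataArray, window: int, dim: str = "time") -> xr.DataArray:
    """Index where the first run of `window` consecutive True values starts;
    NaN where no such run exists. Illustrative sketch only."""
    # A rolling sum reaching `window` marks positions where a full run ends.
    ends = mask.rolling({dim: window}).sum() >= window
    # argmax returns the first True along `dim`; shift back to the run's start.
    idx = ends.argmax(dim=dim) - (window - 1)
    # Points with no run at all would get argmax == 0, so mask them out.
    return idx.where(ends.any(dim=dim))

# Example with synthetic data: first day with 6 consecutive days above 5 degC.
time = pd.date_range("2000-01-01", periods=365)
tas = xr.DataArray(270 + 15 * np.sin(2 * np.pi * (np.arange(365) - 90) / 365),
                   dims="time", coords={"time": time})
start = first_run_index(tas > 278.15, window=6)
```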
Graphs:
1) Small chunks (50x50) and many years (99).
2) Large chunks (200x200) and fewer years (50).
The conclusion is that the default version can, with small tweaks, be sped up and made to use less memory. However, the method with first_run, while slower, consumes a lot less memory and does so more stably. I still have to test with data whose chunks are smaller than a year. More to come.
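"Chunks smaller than a year" refers to dask chunks along the time dimension that cover less than one year, so an annual indicator has to gather data from several chunks. A hypothetical example (the file name and chunk sizes are illustrative):

```python
import xarray as xr

# 200x200 spatial chunks, but only 120 days per chunk along time:
# each year then spans several chunks, which stresses the rechunking done
# by annually resampled indicators such as growing season length.
ds = xr.open_dataset("tas_scenario.nc",
                     chunks={"time": 120, "lat": 200, "lon": 200})
```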