Ouranosinc / xscen

A climate change scenario-building analysis framework.
https://xscen.readthedocs.io/
Apache License 2.0

Config features and better `unstack_dates` #144

Closed aulemahal closed 1 year ago

aulemahal commented 1 year ago

What kind of change does this PR introduce?

Rewrite of unstack_dates

Instead of using xarray's unstack, this version performs a reshape operation. This is much more efficient with dask, but it required heavy changes.
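
To illustrate the idea (this is not the code in this PR): when the time axis is regular, with a fixed number of steps per year, splitting it into (year, season) is a plain reshape rather than an index-based unstack. Below is a minimal NumPy sketch, assuming 4 seasons per year and a series whose length is a whole number of years; `split_seasons` is a hypothetical helper name, not an xscen function.

```python
import numpy as np


def split_seasons(data, n_seasons=4):
    """Reshape a trailing time axis into (year, season).

    Hypothetical helper for illustration only; assumes the time axis is
    regular and its length is a multiple of n_seasons.
    """
    n_time = data.shape[-1]
    if n_time % n_seasons:
        raise ValueError("time axis length must be a multiple of n_seasons")
    # Row-major reshape keeps consecutive time steps together within a year.
    return data.reshape(*data.shape[:-1], n_time // n_seasons, n_seasons)


# Toy data: 2 grid points, 3 years of 4 seasons each (12 time steps).
data = np.arange(24).reshape(2, 12)
unstacked = split_seasons(data)
print(unstacked.shape)  # (2, 3, 4)
```

The reshape only works because every year contributes the same number of steps, which is why a regular time axis matters for this approach.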

Does this PR introduce a breaking change?

unstack_dates has lost some flexibility: the data's time axis must pass xr.infer_freq. flox is added as a dependency. extract_dataset no longer prints a progress bar for single datasets.
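
As a rough way to check in advance whether a time axis satisfies the xr.infer_freq requirement, one can call it directly on the time index. The sketch below uses toy dates (assumptions, not data from this PR) and shows that a regular seasonal axis yields a frequency string while an axis with a missing step yields None.

```python
import pandas as pd
import xarray as xr

# Regular seasonal time axis (quarters starting in December).
regular = pd.date_range("2000-12-01", periods=8, freq="QS-DEC")
# Dropping one step makes the axis irregular.
irregular = regular.delete(3)

freq_regular = xr.infer_freq(regular)
freq_irregular = xr.infer_freq(irregular)
print(freq_regular)    # QS-DEC
print(freq_irregular)  # None
```

How unstack_dates reacts to a None result is not shown here; the sketch only demonstrates the check itself.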

Other information:

I struggled to run the notebooks. But I'll try to do it soon, as a test for breaking changes.

aulemahal commented 1 year ago

@juliettelavoie For illustration: I take all QS-DEC indicators of ScenGen and construct an ensemble (dims: (experiment: 2, realization: 11, time: 605, lat: 320, lon: 416)). I then unstack the seasons, compute a spatial mean over 33 regions, and compute percentiles over the realizations.

The "stacked" dataset has 71508 tasks (len(ds.__dask_graph__().keys())).
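
For readers who want to reproduce this kind of measurement, here is a small sketch of counting graph tasks before and after a lazy operation. It uses a toy dask array rather than the ScenGen ensemble, and the exact counts depend on the dask version, so none are quoted.

```python
import dask.array as da

# Small chunked array standing in for a lazily loaded dataset variable.
x = da.ones((8, 4), chunks=(2, 4))
n_before = len(x.__dask_graph__().keys())

# Each lazy operation adds tasks to the graph; nothing is computed yet.
y = (x + 1).mean(axis=0)
n_after = len(y.__dask_graph__().keys())

print(n_before, n_after, n_after - n_before)
```

The difference between the two counts is the number of tasks the operation added, which is how the "added tasks" figures in this comparison can be read.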

Before: unstack_seasons took 1470 ms and added 118 560 tasks. The final compute took 12 min 19s.

After (this branch): unstack_seasons took 381 ms (0.26x) and added 24 730 tasks (0.21x). The final compute took 5 min 9s (0.42x).
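
The ratios can be recomputed from the raw timings and task counts quoted above; a short sketch, rounding to two decimals:

```python
def to_seconds(minutes, seconds):
    """Convert a (minutes, seconds) duration to seconds."""
    return 60 * minutes + seconds


# After / before, using the figures reported in this comment.
ratios = {
    "unstack time": 381 / 1470,
    "added tasks": 24_730 / 118_560,
    "final compute": to_seconds(5, 9) / to_seconds(12, 19),
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```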

All that said, I realize the code is more opaque... I can try to make it clearer, but I felt the performance boost was worth it.

aulemahal commented 1 year ago

The latest pushes fix my fix, and all notebooks should now run. As for produce_horizon, its use of unstack_dates should not be affected.