Closed aulemahal closed 1 year ago
@juliettelavoie For illustration: I take all QS-DEC indicators of ScenGen and construct an ensemble (dims: (experiment: 2, realization: 11, time: 605, lat: 320, lon: 416)). I then unstack the seasons and perform a spatial mean over 33 regions + percentiles over realizations.
The "stacked" dataset has 71508 tasks (len(ds.__dask_graph__().keys())).
Before: unstack_seasons took 1470 ms and added 118 560 tasks. The final compute took 12 min 19s.
After (this branch): unstack_seasons took 381 ms (0.25x) and added 24 730 tasks (0.21x). The final compute took 5 min 9s (0.42x).
All that said, I realize the code is more opaque... I can try to make it clearer, but I felt the performance boost was worth it.
Latest pushes fix my fix and all notebooks should run. As for produce_horizon, its use of unstack_dates should not be affected.
Pull Request Checklist:
number) and pull request (:pull:number) has been added
What kind of change does this PR introduce?
New CONFIG.set method. Each time an item of CONFIG is accessed, a copy is returned, in order to avoid errors. This adds back the possibility to edit the configuration programmatically.
load_config now accepts key=value entries. This is meant to make it easier to parse from the CLI.
In extract_dataset, the changed line is supposed to be equivalent to the old one, given how those variables are set above, but it also avoids a bug when a var_name is not given in variables_and_freq.
to_dataset_dict only prints a progress bar if there is more than one dataset. This avoids useless output in trivial cases.
Rewrite of unstack_dates
Instead of using xarray's unstack, I try here to perform a reshape operation. This is much more efficient with dask, but it did necessitate heavy changes.
The function no longer works when xr.infer_freq(ds) fails. I assumed this is almost never the case for the use we make of this function.
The seasons argument is now a dictionary with the month number as keys, not a "%m-%d" date. The function currently only works for coarser-than-month frequencies, so this simply made the code easier to write. Also, even if we expand to accept daily freqs, the seasons arg wouldn't have any meaning I think (the new dim would be dayofyear anyway, no?).
To use reshape on dask arrays, the chunks are realigned using flox beforehand, ensuring the chunk boundaries fall on yearly timesteps.
Does this PR introduce a breaking change?
unstack_dates has lost some flexibility: data must pass xr.infer_freq.
Adds flox as a dependency.
No more progress bar printed by extract_dataset for single datasets.
Other information:
I struggled to run the notebooks. But I'll try to do it soon, as a test for breaking changes.
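As a rough illustration of the configuration changes listed above (items returned as copies, an explicit set method, and key=value entries), here is a minimal sketch. It is not the implementation in this PR: the `Config` class, the dotted-key convention in `set`, the `apply_overrides` helper, and the use of JSON to interpret values are all assumptions made for the example.

```python
import json
from copy import deepcopy


class Config(dict):
    """Hypothetical stand-in for CONFIG: reads return copies, writes use set()."""

    def __getitem__(self, key):
        # Return a copy so that editing the result never alters the stored config.
        return deepcopy(super().__getitem__(key))

    def set(self, key, value):
        """Set a possibly nested entry from a dotted key such as 'a.b' (assumed syntax)."""
        *parents, last = key.split(".")
        node = self
        for part in parents:
            # dict.setdefault bypasses the copying __getitem__ above and
            # returns the stored mapping itself, so the write reaches the config.
            node = dict.setdefault(node, part, {})
        node[last] = value


def apply_overrides(config, entries):
    """Apply 'key=value' strings, for example ones collected from a command line."""
    for entry in entries:
        key, sep, raw = entry.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {entry!r}")
        try:
            value = json.loads(raw)  # numbers, booleans, lists, quoted strings
        except json.JSONDecodeError:
            value = raw  # anything else is kept as a plain string
        config.set(key, value)


CONFIG = Config({"extract": {"periods": [1991, 2020]}})

# Editing what was read only touches the copy.
periods = CONFIG["extract"]["periods"]
periods.append(2050)

# Programmatic edits go through set(), here driven by key=value entries.
apply_overrides(CONFIG, ["extract.periods=[1981, 2010]", "project.name=demo"])
```

With this design, accidental in-place edits of a returned item are harmless, while intentional edits remain possible through a single, explicit entry point.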