Closed · juliettelavoie closed this pull request 8 months ago
- Now, I am wondering if it would be more xscen-like to also wrap the xclim function instead of only prepping the data. This would allow adding the attribute `processing_level=partition`, to be able to put the result in a catalog.
  - Workflow 1: `xs.get_partition_input` -> `xc.ensembles.lafferty_sriver` -> `fg.partition`
  - Workflow 2: `xs.partition` (which gets the input and wraps xclim) -> maybe save to disk and catalog -> `fg.partition` or other plots (maybe maps of one component)
  - EDIT: I went with workflow 1.

Other information:
- I wasn't sure if this should be in `extract.py` (because we extract from a cat) or in `ensembles.py` (to match the xclim module).
Only prepping the data is fine. It's what is done in `build_reduction_data`: https://github.com/Ouranosinc/xscen/blob/acb0f737058cb63f8d040a1fab8d34c042c6ea8b/xscen/reduce.py#L17

However, I think that it would be better to have the input be `datasets: Union[dict, list[xr.Dataset]]`, rather than a DataCatalog. It can be dangerous to do the `search` call within the function and never show the results; it's better if it's done beforehand. Since we might want to call this on raw data, I'd keep your other options (subset, regrid), though.
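For illustration, the pattern suggested here (do the `search` outside the function, inspect the hits, then pass explicit datasets to the builder) could be sketched like this. The `search` and `build_partition_data` helpers and the toy catalog layout below are illustrative stand-ins, not the actual xscen/intake-esm API:

```python
def search(catalog, **criteria):
    """Toy search: keep catalog entries whose metadata match all criteria."""
    return {
        key: entry
        for key, entry in catalog.items()
        if all(entry["meta"].get(field) == value for field, value in criteria.items())
    }


def build_partition_data(datasets):
    """Toy builder: receives explicit datasets and never searches on its own."""
    return sorted(datasets)


catalog = {
    "CanESM5_ssp585": {"meta": {"source": "CanESM5", "experiment": "ssp585"}},
    "CanESM5_ssp126": {"meta": {"source": "CanESM5", "experiment": "ssp126"}},
    "MIROC6_ssp585": {"meta": {"source": "MIROC6", "experiment": "ssp585"}},
}

# The search happens in caller code, so its results can be inspected
# before anything is built from them.
hits = search(catalog, experiment="ssp585")
assert set(hits) == {"CanESM5_ssp585", "MIROC6_ssp585"}

result = build_partition_data(hits)
```

The point of the split is that a surprising or empty `search` result is caught by the caller, instead of silently producing a wrong partition inside the function.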
@RondeauG Interesting. It might even be worth renaming this function `build_partition_data` to keep a similar style.
@RondeauG How do you feel about passing a subcat that was already searched? Or would you really prefer a dict of datasets?

EDIT: It is not obvious to me how to combine the datasets for an arbitrary number of `partition_dim`, like with `to_dataset(concat_on=partition_dim, create_ensemble_on=ensemble_on_list, ...`

EDIT 2: OK, I figured it out, I think.
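For what it's worth, the "arbitrary number of `partition_dim`" combination can be sketched in plain numpy as stacking a dict keyed by coordinate tuples into one axis per dimension. This only illustrates the shape of the problem, not the `to_dataset`-based solution eventually used; all names here are made up:

```python
import numpy as np


def combine(datasets, dims):
    """Stack 1-D arrays keyed by tuples of labels (one label per dim in
    `dims`) into a single array with one axis per partition dimension,
    plus a trailing time axis. Missing members are left as NaN."""
    coords = {d: sorted({key[i] for key in datasets}) for i, d in enumerate(dims)}
    n_time = len(next(iter(datasets.values())))
    shape = [len(coords[d]) for d in dims] + [n_time]
    out = np.full(shape, np.nan)
    for key, data in datasets.items():
        idx = tuple(coords[d].index(label) for d, label in zip(dims, key))
        out[idx] = data
    return out, coords


data = {
    ("CanESM5", "ssp126"): np.arange(3.0),
    ("CanESM5", "ssp585"): np.arange(3.0) + 1,
    ("MIROC6", "ssp126"): np.arange(3.0) + 2,
}
arr, coords = combine(data, ("model", "scenario"))
# arr has shape (2 models, 2 scenarios, 3 times); the missing
# (MIROC6, ssp585) member stays NaN.
```

Because `dims` is a tuple of any length, the same loop handles two, three, or more partition dimensions without special-casing.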
I don't understand what's going on. In commit 2ece1d7, everything passed. I removed inconsequential lines and the RTD build keeps failing. (I can run the notebook with no problem on our servers.) I have tried to rebuild the RTD a few times. In these different builds (without changing anything), the error is not always the same: either the dataset doesn't load (HDF5 error), or it loads an empty dataset and the function fails when it tries to subset (`'Dataset' object has no attribute 'lon'`).
```
HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: H5A.c line 1891 in H5Aiterate2(): invalid location identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5VLint.c line 1741 in H5VL_vol_object(): invalid identifier type to function
    major: Invalid arguments to routine
    minor: Inappropriate type
ERROR:traitlets:Kernel died while waiting for execute reply.
HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: H5A.c line 1891 in H5Aiterate2(): invalid location identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5VLint.c line 1741 in H5VL_vol_object(): invalid identifier type to function
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: H5A.c line 1891 in H5Aiterate2(): invalid location identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5VLint.c line 1741 in H5VL_vol_object(): invalid identifier type to function
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: H5D.c line 767 in H5Dget_create_plist(): invalid dataset identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: H5D.c line 481 in H5Dclose(): can't decrement count on dataset ID
    major: Dataset
    minor: Unable to decrement reference count
  #001: H5Iint.c line 1227 in H5I_dec_app_ref_always_close(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #002: H5Iint.c line 1197 in H5I__dec_app_ref_always_close(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #003: H5Iint.c line 946 in H5I_remove(): can't remove ID node
    major: Object ID
    minor: Can't delete message
  #004: H5Iint.c line 895 in H5I__remove_common(): can't remove ID node from hash table
    major: Object ID
    minor: Can't delete message
  #005: H5Iint.c line 1077 in H5I__dec_app_ref(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #006: H5Iint.c line 981 in H5I__dec_ref(): can't locate ID
    major: Object ID
    minor: Unable to find ID information (already closed?)
```
LGTM! One thing that is not currently supported (but it's maybe not an issue) is regional models. If you want to delve into that, you could look at:

The idea is to look at `driving_model` instead of `source`, and to fill that entry with `source` in the case of global models.
I don't think this is necessary for now. I will do another PR if that changes.
Pull Request Checklist:

- …number) and pull request (:pull:`number`) has been added.

What kind of change does this PR introduce?

- Adds `xs.ensembles.get_partition_input` to prepare data from a catalog to be passed to the xclim partitioning functions (https://xclim.readthedocs.io/en/stable/api.html#uncertainty-partitioning), with the attribute `processing_level=partition` to be able to put the result in a catalog.
  - Workflow 1: `xs.get_partition_input` -> `xc.ensembles.lafferty_sriver` -> `fg.partition`
  - Workflow 2: `xs.partition` (which gets the input and wraps xclim) -> maybe save to disk and catalog -> `fg.partition` or other plots (maybe maps of one component)
  - EDIT: I went with workflow 1.

Does this PR introduce a breaking change?

No.

Other information:

- I wasn't sure if this should be in `extract.py` (because we extract from a cat) or in `ensembles.py` (to match the xclim module).