Ouranosinc / xscen

A climate change scenario-building analysis framework.
https://xscen.readthedocs.io/
Apache License 2.0

add get_partition_input #289

Closed juliettelavoie closed 8 months ago

juliettelavoie commented 10 months ago

Pull Request Checklist:

What kind of change does this PR introduce?

Does this PR introduce a breaking change?

no

Other information:

review-notebook-app[bot] commented 9 months ago

Check out this pull request on ReviewNB to see visual diffs and provide feedback on the Jupyter Notebooks.

RondeauG commented 9 months ago
  • Now, I am wondering if it would be more "xscen" to also wrap the xclim function instead of only prepping the data. This would allow adding an attribute processing_level="partition", so the result could be put in a catalog.
    Workflow 1: xs.get_partition_input -> xc.ensembles.lafferty_sriver -> fg.partition
    Workflow 2: xs.partition (which gets the input and wraps xclim) -> maybe save to disk and catalog -> fg.partition or other plots (maybe maps of one component)
    EDIT: I went with workflow 1.

Other information:

  • I wasn't sure if this should be in extract.py (because we extract from a catalog) or in ensembles.py (to match the xclim module).

Only prepping the data is fine. It's what is done in build_reduction_data: https://github.com/Ouranosinc/xscen/blob/acb0f737058cb63f8d040a1fab8d34c042c6ea8b/xscen/reduce.py#L17

However, I think that it would be better to have the input be datasets: Union[dict, list[xr.Dataset]] rather than a DataCatalog. It can be dangerous to do the search call within the function and never show the results; it's better if the search is done beforehand.

Since we might want to call this on raw data, I'd keep your other options (subset, regrid), though.

juliettelavoie commented 9 months ago

@RondeauG Interesting. It might even be worth renaming this function to build_partition_data to keep a similar style.

juliettelavoie commented 9 months ago

@RondeauG How do you feel about passing a subcat that has already been searched? Or would you really prefer a dict of datasets?

EDIT: It is not obvious to me how to combine the datasets for an arbitrary number of partition_dim, like with to_dataset(concat_on=partition_dim, create_ensemble_on=ensemble_on_list). EDIT 2: OK, I figured it out, I think.
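One way to combine datasets along an arbitrary number of partition dimensions is to group and concatenate recursively. This is only a toy sketch of the idea, not the actual xscen/xclim implementation; it assumes each dataset's attrs carry the coordinate label for every partition dimension:

```python
import numpy as np
import xarray as xr


def stack_partitions(datasets, partition_dims):
    """Toy sketch: recursively concatenate datasets along each partition
    dimension, reading the coordinate label for each dataset from its attrs."""
    dim = partition_dims[0]
    groups = {}
    for ds in datasets:
        groups.setdefault(ds.attrs[dim], []).append(ds)
    pieces = []
    for label in sorted(groups):
        group = groups[label]
        # Recurse on the remaining dimensions, then tag this group with its label.
        piece = (stack_partitions(group, partition_dims[1:])
                 if len(partition_dims) > 1 else group[0])
        pieces.append(piece.expand_dims({dim: [label]}))
    return xr.concat(pieces, dim=dim)


# Four toy members: 2 models x 2 scenarios, each with a 3-step time series.
members = [
    xr.Dataset({"tas": ("time", np.full(3, float(i)))},
               attrs={"model": m, "scenario": s})
    for i, (m, s) in enumerate([("m1", "s1"), ("m1", "s2"),
                                ("m2", "s1"), ("m2", "s2")])
]
out = stack_partitions(members, ["model", "scenario"])
print(out.sizes)  # model and scenario both become dimensions of length 2
```

The recursion means the same function handles one, two, or more partition dimensions without special-casing.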

juliettelavoie commented 9 months ago

I don't understand what's going on. In commit 2ece1d7, everything passed. I removed inconsequential lines and the RTD build keeps failing (I can run the notebook without problems on our servers). I have tried to rebuild the RTD a few times. Across these different builds (without changing anything), the error is not always the same: either the dataset doesn't load (with an HDF5 error), or it loads an empty dataset and fails when the function tries to subset ('Dataset' object has no attribute 'lon').

HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: H5A.c line 1891 in H5Aiterate2(): invalid location identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5VLint.c line 1741 in H5VL_vol_object(): invalid identifier type to function
    major: Invalid arguments to routine
    minor: Inappropriate type
ERROR:traitlets:Kernel died while waiting for execute reply.
HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: H5A.c line 1891 in H5Aiterate2(): invalid location identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5VLint.c line 1741 in H5VL_vol_object(): invalid identifier type to function
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: H5A.c line 1891 in H5Aiterate2(): invalid location identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5VLint.c line 1741 in H5VL_vol_object(): invalid identifier type to function
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: H5D.c line 767 in H5Dget_create_plist(): invalid dataset identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: H5D.c line 481 in H5Dclose(): can't decrement count on dataset ID
    major: Dataset
    minor: Unable to decrement reference count
  #001: H5Iint.c line 1227 in H5I_dec_app_ref_always_close(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #002: H5Iint.c line 1197 in H5I__dec_app_ref_always_close(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #003: H5Iint.c line 946 in H5I_remove(): can't remove ID node
    major: Object ID
    minor: Can't delete message
  #004: H5Iint.c line 895 in H5I__remove_common(): can't remove ID node from hash table
    major: Object ID
    minor: Can't delete message
  #005: H5Iint.c line 1077 in H5I__dec_app_ref(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #006: H5Iint.c line 981 in H5I__dec_ref(): can't locate ID
    major: Object ID
    minor: Unable to find ID information (already closed?)
juliettelavoie commented 8 months ago

LGTM! One thing that is not currently supported (though it may not be an issue) is regional models. If you want to delve into that, you could look at:

https://github.com/Ouranosinc/xscen/blob/ce54df8fa0a7bbe3a68572f153db65a6d74d6a57/xscen/ensembles.py#L374

The idea is to look at driving_model instead of source, and to fill that entry with source in the case of global models.
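That logic could look something like the sketch below. The helper name fill_driving_model is hypothetical; it only illustrates copying source into driving_model when the latter is absent (i.e. for global models):

```python
def fill_driving_model(attrs: dict) -> dict:
    """Hypothetical sketch: partition on `driving_model` rather than `source`.

    Regional models already carry a `driving_model` entry; for global models
    that entry is missing (or empty), so copy `source` into it.
    """
    out = dict(attrs)
    if not out.get("driving_model"):
        out["driving_model"] = out["source"]
    return out


# A global model gets its own name as driving model ...
print(fill_driving_model({"source": "CanESM5"}))
# ... while a regional model keeps its driver.
print(fill_driving_model({"source": "CRCM5", "driving_model": "CanESM5"}))
```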

I don't think this is necessary for now. I will do another PR if that changes.