Ouranosinc / xscen

A climate change scenario-building analysis framework.
https://xscen.readthedocs.io/
Apache License 2.0

add get_partition_input #289

Closed juliettelavoie closed 8 months ago

juliettelavoie commented 10 months ago

Pull Request Checklist:

What kind of change does this PR introduce?

Does this PR introduce a breaking change?

no

Other information:

review-notebook-app[bot] commented 9 months ago

Check out this pull request on ReviewNB to see visual diffs and provide feedback on the Jupyter Notebooks.

RondeauG commented 9 months ago
  • Now, I am wondering if it would be more "xscen" to also wrap the xclim function instead of only prepping the data. This would allow adding an attribute processing_level="partition", so the result could be put in a catalog.
    Workflow 1: xs.get_partition_input -> xc.ensembles.lafferty_sriver -> fg.partition
    Workflow 2: xs.partition (which gets the input and wraps xclim) -> maybe save to disk and catalog -> fg.partition or other plots (maybe maps of one component)
    EDIT: I went with workflow 1.

Other information:

  • I wasn't sure if this should be in extract.py (because we extract from a catalog) or in ensembles.py (to match the xclim module).

Only prepping the data is fine. It's what is done in build_reduction_data: https://github.com/Ouranosinc/xscen/blob/acb0f737058cb63f8d040a1fab8d34c042c6ea8b/xscen/reduce.py#L17

However, I think that it would be better to have the input be datasets: Union[dict, list[xr.Dataset]] rather than a DataCatalog. It can be dangerous to do the search call within the function and never show the results; it's better if the search is done beforehand.

Since we might want to call this on raw data, I'd keep your other options (subset, regrid), though.

juliettelavoie commented 9 months ago

@RondeauG Interesting. It might even be worth renaming this function to build_partition_data to keep a similar style.

juliettelavoie commented 9 months ago

@RondeauG How do you feel about passing a subcat that has already been searched? Or would you really prefer a dict of datasets?

EDIT: It is not obvious to me how to combine the datasets for an arbitrary number of partition_dim, like with to_dataset(concat_on=partition_dim, create_ensemble_on=ensemble_on_list). EDIT 2: OK, I figured it out, I think.
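One way to combine datasets along an arbitrary number of partition dimensions is to group and concatenate recursively. This is only a toy sketch of the idea, not the actual xscen/xclim implementation; it assumes each dataset's attrs carry the coordinate label for every partition dimension:

```python
import numpy as np
import xarray as xr


def stack_partitions(datasets, partition_dims):
    """Toy sketch: recursively concatenate datasets along each partition
    dimension, reading the coordinate label for each dataset from its attrs."""
    dim = partition_dims[0]
    groups = {}
    for ds in datasets:
        groups.setdefault(ds.attrs[dim], []).append(ds)
    pieces = []
    for label in sorted(groups):
        group = groups[label]
        # Recurse on the remaining dimensions, then tag this group with its label.
        piece = (stack_partitions(group, partition_dims[1:])
                 if len(partition_dims) > 1 else group[0])
        pieces.append(piece.expand_dims({dim: [label]}))
    return xr.concat(pieces, dim=dim)


# Four toy members: 2 models x 2 scenarios, each with a 3-step time series.
members = [
    xr.Dataset({"tas": ("time", np.full(3, float(i)))},
               attrs={"model": m, "scenario": s})
    for i, (m, s) in enumerate([("m1", "s1"), ("m1", "s2"),
                                ("m2", "s1"), ("m2", "s2")])
]
out = stack_partitions(members, ["model", "scenario"])
print(out.sizes)  # model and scenario both become dimensions of length 2
```

The recursion means the same function handles one, two, or more partition dimensions without special-casing.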

juliettelavoie commented 9 months ago

I don't understand what's going on. In commit 2ece1d7, everything passed. I removed inconsequential lines and the RTD build keeps failing (I can run the notebook without problems on our servers). I have tried to rebuild the RTD a few times. Across these different builds (without changing anything), the error is not always the same: either the dataset doesn't load (with an HDF5 error), or it loads an empty dataset and fails when the function tries to subset ('Dataset' object has no attribute 'lon').

HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: H5A.c line 1891 in H5Aiterate2(): invalid location identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5VLint.c line 1741 in H5VL_vol_object(): invalid identifier type to function
    major: Invalid arguments to routine
    minor: Inappropriate type
ERROR:traitlets:Kernel died while waiting for execute reply.
HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: H5A.c line 1891 in H5Aiterate2(): invalid location identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5VLint.c line 1741 in H5VL_vol_object(): invalid identifier type to function
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: H5A.c line 1891 in H5Aiterate2(): invalid location identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
  #001: H5VLint.c line 1741 in H5VL_vol_object(): invalid identifier type to function
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: H5D.c line 767 in H5Dget_create_plist(): invalid dataset identifier
    major: Invalid arguments to routine
    minor: Inappropriate type
HDF5-DIAG: Error detected in HDF5 (1.14.3) thread 0:
  #000: H5D.c line 481 in H5Dclose(): can't decrement count on dataset ID
    major: Dataset
    minor: Unable to decrement reference count
  #001: H5Iint.c line 1227 in H5I_dec_app_ref_always_close(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #002: H5Iint.c line 1197 in H5I__dec_app_ref_always_close(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #003: H5Iint.c line 946 in H5I_remove(): can't remove ID node
    major: Object ID
    minor: Can't delete message
  #004: H5Iint.c line 895 in H5I__remove_common(): can't remove ID node from hash table
    major: Object ID
    minor: Can't delete message
  #005: H5Iint.c line 1077 in H5I__dec_app_ref(): can't decrement ID ref count
    major: Object ID
    minor: Unable to decrement reference count
  #006: H5Iint.c line 981 in H5I__dec_ref(): can't locate ID
    major: Object ID
    minor: Unable to find ID information (already closed?)
juliettelavoie commented 8 months ago

LGTM! One thing that is not currently supported (though it may not be an issue) is regional models. If you want to delve into that, you could look at:

https://github.com/Ouranosinc/xscen/blob/ce54df8fa0a7bbe3a68572f153db65a6d74d6a57/xscen/ensembles.py#L374

The idea is to look at driving_model instead of source, and to fill that entry with source in the case of global models.
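That logic could look something like the sketch below. The helper name fill_driving_model is hypothetical; it only illustrates copying source into driving_model when the latter is absent (i.e. for global models):

```python
def fill_driving_model(attrs: dict) -> dict:
    """Hypothetical sketch: partition on `driving_model` rather than `source`.

    Regional models already carry a `driving_model` entry; for global models
    that entry is missing (or empty), so copy `source` into it.
    """
    out = dict(attrs)
    if not out.get("driving_model"):
        out["driving_model"] = out["source"]
    return out


# A global model gets its own name as driving model ...
print(fill_driving_model({"source": "CanESM5"}))
# ... while a regional model keeps its driver.
print(fill_driving_model({"source": "CRCM5", "driving_model": "CanESM5"}))
```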

I don't think this is necessary for now. I will do another PR if that changes.