Ouranosinc / xscen

A climate change scenario-building analysis framework.
https://xscen.readthedocs.io/
Apache License 2.0

Matching `historical` and `sspXYZ` using a `df` with incomplete columns makes the historical period disappear #286

Closed: coxipi closed this issue 10 months ago

coxipi commented 10 months ago

Setup Information

Description

I parse a local directory to create a DataFrame (`df`). The following fields are left unspecified:

`activity`
`bias_adjust_institution`
`bias_adjust_project`

This causes the historical period to disappear when I then build a catalog that matches historical and future simulations.

Steps To Reproduce

```python
import xscen as xs
from xscen.catutils import parse_directory

df = parse_directory(
    directories=["/home/eridup1/tank/etiages/CMIP6_ornl_gov/"],
    patterns=[
        "{variable}_{?}_{source}_{experiment}_{member}_{?grid}_{DATES}.nc"
    ],
    homogenous_info={
        "mip_era": "CMIP6",
        "type": "simulation",
        "institution": "our",
        "processing_level": "extracted",
        "xrfreq": "MS",
        "frequency": "mon",
        "domain": "global",
        # I need to fill some or all of these fields, else historical datasets just disappear with match_hist_and_fut
        "activity": ".",
        "bias_adjust_institution": ".",
        "bias_adjust_project": ".",
    },
    read_from_file=["variable", "date_start", "date_end"],
)
subcat = xs.ProjectCatalog.from_df(df)
ds_dict = xs.search_data_catalogs(
    subcat,
    variables_and_freqs={"rsds": "MS", "rsus": "MS"},
    match_hist_and_fut=True,
)
```

The datasets in `ds_dict` include the historical period only when I fill in the fields specified above. Otherwise, the historical period is dropped.

If I remove `match_hist_and_fut=True`, the historical datasets are kept, but as separate entries. I checked, and both the `sspXYZ` and `historical` entries have `[None, None, None]` for the three fields above, so the problem does not seem to come from mismatched fields. Rather, it is the presence of `None` instead of an arbitrary string that makes the difference.

Additional context

No response

Contribution

juliettelavoie commented 10 months ago

The problem is the missing activity (https://github.com/Ouranosinc/xscen/blob/26387231ddce8ccfacd059d193f0c3de932f9ea8/xscen/extract.py#L1240). The code assumes that you are passing an official catalog with all the right columns (with `historical` under the `CMIP` activity and `sspXYZ` under `ScenarioMIP`).

Maybe we could throw a warning if necessary columns are not filled?
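A minimal sketch of such a check, assuming it runs on the catalog's DataFrame before matching. The function name `warn_if_missing` and the column list are hypothetical choices for illustration, not part of xscen's API; only `pandas` and the standard `warnings` module are used.

```python
import warnings

import pandas as pd

# Hypothetical list of columns needed for hist/fut matching, based on the
# fields discussed in this issue rather than on xscen's actual requirements.
REQUIRED_FOR_MATCHING = ("activity", "experiment")


def warn_if_missing(df: pd.DataFrame, columns=REQUIRED_FOR_MATCHING) -> list:
    """Warn about columns that are absent or contain missing values.

    Returns the names of the problematic columns (empty list if none).
    """
    problems = [c for c in columns if c not in df.columns or df[c].isna().any()]
    if problems:
        warnings.warn(
            f"Columns {problems} are missing or contain NaN/None; "
            "historical and future simulations may not be matched.",
            UserWarning,
            stacklevel=2,
        )
    return problems
```

Returning the column names as well as warning keeps the check easy to test and lets a caller decide whether to stop or continue.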

coxipi commented 10 months ago

I see. I had the impression that the `experiment` column would be used for this. Yes, I think a warning would be good, because otherwise this is a silent failure. Then again, if my use case is just not the intended way to work with these tools, feel free to ignore this issue.

juliettelavoie commented 10 months ago

Both columns are used (`activity` and `experiment`). I can open a PR to add a warning. I don't think passing your own catalog is the "wrong" way to use this, even if it is not the typical case. That said, in general, I would encourage you to fill in as many columns as you can when creating your own catalog: https://xscen.readthedocs.io/en/latest/columns.html
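As a stopgap when building your own catalog, the missing `activity` values could be filled in from `experiment` before calling `xs.ProjectCatalog.from_df`. This sketch assumes the CMIP6 convention described in this thread (`historical` under `CMIP`, `sspXYZ` under `ScenarioMIP`). `fill_activity` and `guess_activity` are hypothetical helpers, not xscen functions; existing non-missing values are kept as they are.

```python
import pandas as pd


def guess_activity(experiment):
    """Guess the CMIP6 activity from the experiment name (assumed convention)."""
    if experiment == "historical":
        return "CMIP"
    if isinstance(experiment, str) and experiment.startswith("ssp"):
        return "ScenarioMIP"
    return None  # unknown experiment: leave the activity missing


def fill_activity(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df where missing 'activity' values are guessed from 'experiment'."""
    out = df.copy()
    if "activity" not in out.columns:
        out["activity"] = None
    guessed = out["experiment"].map(guess_activity)
    # Only fill gaps; values that are already set (e.g. "HighResMIP") are preserved.
    out["activity"] = out["activity"].astype(object).fillna(guessed)
    return out
```

Experiments outside that convention are deliberately left missing rather than guessed, so a check like the one proposed above would still flag them.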

aulemahal commented 10 months ago

I can't remember why we explicitly skip `np.NaN` as the activity. I guess HighResMIP is skipped because such an SSP wouldn't be compatible with a CMIP historical simulation, but why skip NaN?
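To make the symptom concrete, here is a toy model in pandas. It is not xscen's implementation and rests on two assumptions: entries whose activity is missing (or `HighResMIP`) are skipped during matching, and historical entries are dropped from the result because they are expected to be merged into their future counterparts. Under those assumptions, a missing `activity` loses the historical period entirely, while any non-missing placeholder such as `"."` restores it, which is consistent with what was reported above. `toy_match_hist_and_fut` is a hypothetical name, and matching on `source` and `member` only is a simplification.

```python
import numpy as np
import pandas as pd

# Activities skipped in this toy model (assumption based on the comment above).
SKIPPED_ACTIVITIES = {"HighResMIP"}


def toy_match_hist_and_fut(df: pd.DataFrame) -> pd.DataFrame:
    """Toy illustration only: extend future rows back to their historical start.

    Future rows with a missing or skipped activity are left unmatched, and all
    historical rows are dropped from the output in every case.
    """
    hist = df[df["experiment"] == "historical"]
    fut = df[df["experiment"] != "historical"]
    out = []
    for _, row in fut.iterrows():
        if pd.isna(row["activity"]) or row["activity"] in SKIPPED_ACTIVITIES:
            out.append(row)  # unmatched: keeps only the future period
            continue
        match = hist[(hist["source"] == row["source"]) & (hist["member"] == row["member"])]
        row = row.copy()
        if not match.empty:
            row["date_start"] = match["date_start"].min()  # prepend historical period
        out.append(row)
    return pd.DataFrame(out).reset_index(drop=True)


df = pd.DataFrame({
    "source": ["ModelA", "ModelA"],
    "member": ["r1i1p1f1", "r1i1p1f1"],
    "experiment": ["historical", "ssp245"],
    "activity": [np.nan, np.nan],
    "date_start": ["1850-01", "2015-01"],
})
print(toy_match_hist_and_fut(df)[["experiment", "date_start"]].to_dict("records"))
# [{'experiment': 'ssp245', 'date_start': '2015-01'}]  (historical period lost)

df_filled = df.assign(activity=".")
print(toy_match_hist_and_fut(df_filled)[["experiment", "date_start"]].to_dict("records"))
# [{'experiment': 'ssp245', 'date_start': '1850-01'}]  (historical period kept)
```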

juliettelavoie commented 10 months ago

I don't remember either...