Sample CarbonPlan dataset query fails

Prukutu commented 1 month ago

Hi there!

I am trying to run the sample notebook to access the CMIP6 downscaled datasets found here: https://github.com/carbonplan/cmip6-downscaling/blob/main/notebooks/accessing_data_example.ipynb

Creating the catalog instance seems to work, and subsetting the data with these options:

cat_subset = cat.search(
    method="GARD-MV",
    source_id="MRI-ESM2-0",
    experiment_id="ssp245",
    variable_id="tasmax",
)

returns the right number of ensemble members (one for each time frequency). However, when I run

dsets = cat_subset.to_dataset_dict()

I get a series of warnings that culminate in an error:

--> The keys in the returned dictionary of datasets are constructed as follows: 'activity_id.institution_id.source_id.experiment_id.timescale.method'

/home/lortizur/.local/lib/python3.10/site-packages/intake_esm/merge_util.py:270: RuntimeWarning: Failed to open Zarr store with consolidated metadata, falling back to try reading non-consolidated metadata. This is typically much slower for opening a dataset. To silence this warning, consider:
1. Consolidating metadata in this existing store with zarr.consolidate_metadata().
2. Explicitly setting consolidated=False, to avoid trying to read consolidate metadata, or
3. Explicitly setting consolidated=True, to raise an error in this case instead of falling back to try reading non-consolidated metadata.
  ds = xr.open_zarr(path, **zarr_kwargs)

The error message I get after the warning above is in the attached log: carbonplan_error.log
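One way to narrow down a warning like the one above is to check whether the store exposes consolidated metadata at all. The sketch below is only an illustration, not part of the notebook: it assumes the store uses the Zarr v2 layout, where consolidated metadata lives in a `.zmetadata` object at the store root, and that the endpoint answers anonymous HTTP HEAD requests. The function names are hypothetical.

```python
from urllib.request import Request, urlopen


def consolidated_metadata_url(store_url: str) -> str:
    """Return the URL where a Zarr v2 store would keep consolidated metadata."""
    # Assumption: Zarr v2 layout, consolidated metadata stored under ".zmetadata".
    return store_url.rstrip("/") + "/.zmetadata"


def has_consolidated_metadata(store_url: str, timeout: float = 10.0) -> bool:
    """Return True if the .zmetadata object answers an HTTP HEAD request with 200."""
    request = Request(consolidated_metadata_url(store_url), method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and timeouts are all OSError subclasses;
        # any of them is treated here as "metadata not reachable".
        return False
```

If the probe returns False, the store may be missing, may lack consolidated metadata, or may simply reject HEAD requests, so the result is a hint rather than a diagnosis.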

I'm not sure if the issue is on my end or if there are issues with the storage, but I thought I'd bring it up here in case someone has a solution!

Thanks

norlandrhagen commented 1 month ago

Hey @Prukutu, thanks for raising the issue. I believe it was an issue on our side. It should be fixed now.

Prukutu commented 1 month ago

Thanks for the update! The query from the sample notebook does seem to work now. However, I am running into similar errors when attempting to work with other queries. For example:

cat_subset = cat.search(
    variable_id="tasmax",
    experiment_id="ssp585",
    timescale="day",
)
ds_dict = cat_subset.to_dataset_dict(zarr_kwargs={"consolidated": True})

This gives me a FileNotFoundError for the file at root: https://rice1.osn.mghpcc.org/carbonplan/cp-cmip/version1/rechunked_data/DeepSD/ScenarioMIP.MRI.MRI-ESM2-0.ssp585.r1i1p1f1.day.DeepSD.tasmax.zarr

Note: I added the 'consolidated' kwarg here because without it the call fails and the output says that I should add it.
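Because a broad query like this one matches many stores, and a single missing store fails the whole call, it can help to open the stores one at a time and record which ones raise. The following is a minimal sketch of that idea under stated assumptions: `try_open_all` is a hypothetical helper, not an intake-esm function, and the caller supplies both the list of store URLs and the function that opens one store.

```python
from typing import Callable, Dict, Iterable, Tuple


def try_open_all(
    paths: Iterable[str],
    opener: Callable[[str], object],
) -> Tuple[Dict[str, object], Dict[str, Exception]]:
    """Open each path separately, collecting successes and failures."""
    opened: Dict[str, object] = {}
    failed: Dict[str, Exception] = {}
    for path in paths:
        try:
            opened[path] = opener(path)
        except Exception as exc:
            # Keep the exception so the failing store can be reported later.
            failed[path] = exc
    return opened, failed
```

With xarray, the opener could be something like `lambda p: xr.open_zarr(p, consolidated=True)`, and the paths could be taken from the catalog's dataframe (`cat_subset.df`), using whichever column holds the store URLs. The keys of `failed` then list exactly which stores are unreachable, which is easier to report than the first error alone.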

norlandrhagen commented 1 month ago

@Prukutu Mind trying one more time? Fingers crossed it's actually updated now.

Prukutu commented 1 month ago

The query (is that the right word?) seems to be working now and I can at least see the DataArray shapes for my variable of interest (tasmax).

Thank you again @norlandrhagen!

norlandrhagen commented 1 month ago

Good to hear! LMK if you run into any other issues.

Sample CarbonPlan dataset query fails #334