cedadev / cmip6-object-store

CMIP6 Object Store Library
BSD 3-Clause "New" or "Revised" License

Update the datasets list to add more CMIP6 datasets to Caringo #42

Open agstephens opened 3 years ago

agstephens commented 3 years ago

@RuthPetrie, I'd like to add some more data to the object-store Zarr holdings for CMIP6.

MattMiz recommended using the top-20 (non-ocean) variables from:

http://esgf-ui.cmcc.it/esgf-dashboard-ui/cmip6.html

If we do that we can download a CSV file, and then plug those into your script for querying CREPP. Does that sound reasonable to you?
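The filtering step could be sketched roughly as below. This is a minimal sketch with hypothetical column names (`variable`, `realm`, `downloads`) — the real CSV exported from the CMCC dashboard may use different headers, so adjust accordingly.

```python
import csv
import io

# Hypothetical sample of a dashboard export; the actual columns from
# esgf-ui.cmcc.it may differ (names here are assumptions).
SAMPLE_CSV = """variable,realm,downloads
tas,atmos,5000
tos,ocean,4500
pr,atmos,4200
thetao,ocean,3900
huss,atmos,3100
"""

def top_non_ocean_variables(csv_text, n=20):
    """Return the n most-downloaded variables whose realm is not 'ocean'."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    non_ocean = [r for r in rows if r["realm"] != "ocean"]
    non_ocean.sort(key=lambda r: int(r["downloads"]), reverse=True)
    return [r["variable"] for r in non_ocean[:n]]

print(top_non_ocean_variables(SAMPLE_CSV, n=3))  # ['tas', 'pr', 'huss']
```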

agstephens commented 3 years ago

We now have an updated list:

https://github.com/cedadev/cmip6-object-store/blob/master/catalogs/cmip6-datasets_2020-10-27.csv

Let's use that @agstephens

RuthPetrie commented 3 years ago

I previously prepared a 200TB dataset list, but it was never used. Did you want that one? I have added it via PR.

agstephens commented 3 years ago

Thanks @RuthPetrie

agstephens commented 3 years ago

@alaniwi: an update has been made to the CSV file that lists the input datasets that we should use when creating the Zarr files.

The total volume is now ~198TB so there should be plenty of conversion work to be done.

Please add this in as the source of the batches:

https://github.com/cedadev/cmip6-object-store/blob/master/catalogs/cmip6-datasets_2020-10-27.csv
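Splitting the catalog into batches could be sketched as below. This assumes the CSV yields one dataset identifier per row; the real batching logic in the repository may differ.

```python
# Minimal sketch: split a list of dataset IDs (e.g. read from the
# catalog CSV) into fixed-size batches for submission.
def make_batches(dataset_ids, batch_size):
    """Yield successive lists of at most batch_size dataset IDs."""
    for start in range(0, len(dataset_ids), batch_size):
        yield dataset_ids[start:start + batch_size]

ids = [f"dataset-{i}" for i in range(10)]
batches = list(make_batches(ids, 4))
print(len(batches))   # 3
print(batches[-1])    # ['dataset-8', 'dataset-9']
```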

You will need to:

alaniwi commented 3 years ago

Issues.

sci3 is not talking to `/badc/cmip6` at all; `ls /badc/cmip6` just hangs indefinitely. What is written below was done on sci6.

I no longer have pickle files indicating that thousands of datasets have already been done; the ones I do have contain far fewer entries.

If I run the first batch, it does indeed attempt to convert the files. It claims that some of these succeed, for example:

Completed write for: CMIP6.DCPP.IPSL.IPSL-CM6A-LR/dcppC-ipv-pos.r1i1p1f1.Amon.huss.gr.v20190110.zarr

and that some of them fail, for example (paths redacted here):

ERROR:/path/to/cmip6-object-store/cmip6_object_store/cmip6_zarr/zarr_writer.py:FAILED TO COMPLETE FOR: CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.esm-piControl-spinup.r1i1p1f2.Amon.va.gr.v20181018
Failed to get Xarray dataset: CMIP6.CMIP.CNRM-CERFACS.CNRM-ESM2-1.esm-piControl-spinup.r1i1p1f2.Amon.va.gr.v20181018:
Traceback (most recent call last):
  File "/path/to/cmip6-object-store/cmip6_object_store/cmip6_zarr/zarr_writer.py", line 57, in convert
    ds = self._get_ds(dataset_id)
  File "/path/to/cmip6-object-store/cmip6_object_store/cmip6_zarr/zarr_writer.py", line 92, in _get_ds
    ds = xr.open_mfdataset(file_pattern, use_cftime=True, combine="by_coords")
  File "/path/to/cmip6-object-store/venv/lib/python3.7/site-packages/xarray/backends/api.py", line 915, in open_mfdataset
    raise OSError("no files to open")
OSError: no files to open
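The `OSError: no files to open` means the file pattern passed to `xr.open_mfdataset` matched nothing on disk. A hedged sketch of a pre-check (the pattern and paths here are illustrative, not the repository's actual code) would be:

```python
import glob
import os
import tempfile

def files_for_pattern(file_pattern):
    """Return the files matching a glob pattern, or [] if none exist.
    An empty match is what triggers xarray's "no files to open" OSError,
    so checking this first gives a clearer diagnostic."""
    return sorted(glob.glob(file_pattern))

# Demonstrate with a throwaway directory (a stand-in for a /badc/cmip6 path).
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "va_Amon_0001.nc"), "w").close()
    assert files_for_pattern(os.path.join(tmp, "*.nc"))        # match found
    assert not files_for_pattern(os.path.join(tmp, "*.zarr"))  # empty match
```

Logging the unmatched pattern before attempting the open would show whether the dataset's directory is missing, misnamed, or simply not mounted on the node.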

Testing a claimed success to see what was written: a search for the container `CMIP6.DCPP.IPSL.IPSL-CM6A-LR` finds one owned by Ag's username. Looking inside it, filtering the objects by `dcppC-ipv-pos.r1i1p1f1.Amon.huss.gr.v20190110.zarr` finds no results. Filtering instead by `dcppC-amv-ExTrop-neg.r10i1p1f1.Amon.ps.gr.v20190110.zarr` (a known existing object seen in the file list) does find that object, so the filter itself is working. It is unclear where the objects that claim to have been written are going.

agstephens commented 3 years ago

This appeared to work:

http://cmip6-zarr-o.s3.jc.rl.ac.uk/CMIP6.DCPP.IPSL.IPSL-CM6A-LR/dcppC-ipv-pos.r1i1p1f1.Amon.huss.gr.v20190110.zarr/.zattrs

agstephens commented 3 years ago

Generic download URL for objects:

http://cmip6-zarr-o.s3.jc.rl.ac.uk/CMIP6.DCPP.IPSL.IPSL-CM6A-LR/dcppC-amv-ExTrop-neg.r10i1p1f1.Amon.clt.gr.v20190110.zarr/clt/0.0.0
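Judging from the example URLs in this thread, the object URL can apparently be derived from a dataset ID: the first four dot-separated components name the bucket and the remainder (plus a `.zarr` suffix) names the store. That split is an inference from these examples, not confirmed library behaviour, but a sketch would be:

```python
BASE_URL = "http://cmip6-zarr-o.s3.jc.rl.ac.uk"

def zarr_store_url(dataset_id):
    """Build the object-store URL for a dataset's Zarr store.

    Assumption (inferred from the URLs above): bucket = first four
    dot-separated components of the dataset ID, store = the rest + '.zarr'.
    """
    parts = dataset_id.split(".")
    bucket = ".".join(parts[:4])
    store = ".".join(parts[4:]) + ".zarr"
    return f"{BASE_URL}/{bucket}/{store}"

url = zarr_store_url(
    "CMIP6.DCPP.IPSL.IPSL-CM6A-LR."
    "dcppC-amv-ExTrop-neg.r10i1p1f1.Amon.clt.gr.v20190110"
)
print(url)
```

Appending a suffix such as `/.zattrs` (for the store metadata) or `/clt/0.0.0` (for a chunk) to the returned URL reproduces the example URLs above, which gives a quick way to spot-check whether a "completed" write actually landed in the store.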

alaniwi commented 3 years ago

Before launching batches on Lotus, I will need to check: