Open AJueling opened 4 years ago
@AJueling, do you mind if I transfer this issue to the https://github.com/NCAR/intake-esm-datastore repo instead? I am planning on commenting once it's there.
Thanks for the quick reply! I don't mind if you move it, of course. (I was not sure where to ask this in the first place.)
@AJueling,
Are you working with time-slices (history files, i.e. one time step per file with many data variables) or time-series (multiple time steps per file with one data variable)?
As @matt-long pointed out in https://github.com/NCAR/intake-esm/issues/112
There is a widespread assumption in intake-esm that there is one variable per file. This precludes using the package with multi-variable files, such as those written directly by CESM.
Unfortunately, this issue of multi-variable files is still unresolved :(
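Until then, one possible workaround outside intake-esm is to concatenate the time-slice datasets directly with xarray. The following is only a minimal sketch under stated assumptions: it uses toy in-memory datasets in place of real history files, and the helper names (`concat_time_slices`, `make_slice`) and variable names are hypothetical, not part of intake-esm.

```python
import numpy as np
import xarray as xr


def concat_time_slices(datasets):
    """Concatenate multi-variable datasets along their shared "time" dimension."""
    return xr.concat(
        datasets,
        dim="time",
        data_vars="minimal",  # only concatenate variables that already have "time"
        coords="minimal",
        compat="override",  # take time-independent variables from the first dataset
    )


def make_slice(month):
    """Build a toy stand-in for one monthly history file (hypothetical contents)."""
    return xr.Dataset(
        {
            "TEMP": (("time", "z"), np.full((1, 3), float(month))),
            "SALT": (("time", "z"), np.full((1, 3), 35.0)),
            "dz": (("z",), np.array([10.0, 20.0, 30.0])),
        },
        coords={"time": [month], "z": [0, 1, 2]},
    )


# Three one-time-step datasets become one dataset with three time steps.
combined = concat_time_slices([make_slice(m) for m in range(3)])
```

For files on disk, `xr.open_mfdataset(paths, combine="nested", concat_dim="time", decode_times=False)` should do the equivalent, assuming the paths are listed in time order and dask is installed.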
How do I concatenate along the time axis?
If you were working with time-series (single data variable per file), the following would address the issue:
Add a `time_range` column in the CSV that specifies the date range in each file.
Add an `aggregation_control` section to your `collection.json`:
{
"esmcat_version": "0.1.0",
"id": "CESM_simulations",
"description": "This is an ESM collection for CESM1 simulations.",
"catalog_file": "simulations.csv",
"attributes": [
{
"column_name": "component",
"vocabulary": ""
},
{
"column_name": "frequency",
"vocabulary": ""
},
{
"column_name": "experiment",
"vocabulary": ""
},
{
"column_name": "variable",
"vocabulary": ""
},
{
"column_name": "time_range",
"vocabulary": ""
}
],
"assets": {
"column_name": "path",
"format": "netcdf"
},
"aggregation_control": {
"variable_column_name": "variable",
"groupby_attrs": [
"component",
"experiment",
"stream"
],
"aggregations": [
{
"type": "union",
"attribute_name": "variable"
},
{
"type": "join_existing",
"attribute_name": "time_range",
"options": {
"dim": "time",
"coords": "minimal",
"compat": "override"
}
}
]
}
}
For reference, take a look at the collection for CESM2 runs (timeseries): https://github.com/NCAR/intake-esm-datastore/blob/master/catalogs/campaign-cesm2-cmip6-timeseries.json.
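For the `time_range` column itself, the values can often be parsed from the file names. The sketch below assumes a time-series naming layout of `<case>.<model>.<stream>.<variable>.<start>-<end>.nc`; that layout, the example paths, and the helper names `catalog_row` and `write_catalog` are assumptions for illustration, so adjust the pattern to your own files.

```python
import csv
import io
import re

# Assumed file name layout: ...<variable>.<start>-<end>.nc
FILENAME_PATTERN = re.compile(
    r"\.(?P<variable>[^.]+)\.(?P<time_range>\d{4,8}-\d{4,8})\.nc$"
)

FIELDNAMES = ["component", "frequency", "experiment", "variable", "time_range", "path"]


def catalog_row(path, component, frequency, experiment):
    """Return one catalog row, with variable and time_range parsed from the path."""
    match = FILENAME_PATTERN.search(path)
    if match is None:
        raise ValueError(f"cannot parse variable/time_range from {path!r}")
    return {
        "component": component,
        "frequency": frequency,
        "experiment": experiment,
        "variable": match["variable"],
        "time_range": match["time_range"],
        "path": path,
    }


def write_catalog(rows, handle):
    """Write rows as CSV, sorted so the files of one variable are in time order."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in sorted(rows, key=lambda r: (r["variable"], r["time_range"])):
        writer.writerow(row)


# Hypothetical paths, for illustration only.
example_paths = [
    "/data/CTRL/ocn/ctrl.pop.h.TEMP.001001-001912.nc",
    "/data/CTRL/ocn/ctrl.pop.h.TEMP.000101-001012.nc",
]
example_rows = [catalog_row(p, "ocn", "monthly", "CTRL") for p in example_paths]
buffer = io.StringIO()
write_catalog(example_rows, buffer)
example_csv = buffer.getvalue()
```

Writing to a real file instead of the in-memory buffer only requires passing `open("simulations.csv", "w", newline="")` as the handle.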
@andersy005 thank you for the reply. I am indeed working with time-slice files that contain many variables, which is the standard output format of CESM as far as I know. It's good to know that this does not work for my use case; I will use a different approach. I suppose we can close this for now, and I will follow @matt-long's issue for any updates.
This issue is likely to be of interest to other users, so let's leave it open (as a reference) until multi-variable files are supported.
@AJueling, just wanted to let you know that we've been working on functionality for building and using catalogs for CESM runs. Recently, @mgrover1 put together a great blog post with details on how to build a catalog from CESM history files: https://ncar.github.io/esds/posts/ecgtools-history-files-example/
We have many different CESM simulations and I would like to create an intake-esm collection of them. The output files are monthly mean netCDF files and contain many variables. I have created a `collection.json` file and a `simulations.csv`. With these I can create a catalogue:
cat = intake.open_esm_datastore('collection.json').search(experiment=['CTRL'])
but when I create a dataset with
dset_dict = cat.to_dataset_dict(cdf_kwargs={'decode_times': False})
it returns a dataset with only a single time coordinate (the result of calling `dset_dict['ocn.monthly.CTRL']`).
How do I concatenate along the time axis?