NCAR / science-at-scale

Planning and Collaborative Space for Science @ Scale Project
Creative Commons Zero v1.0 Universal

Standard column names for catalogs #4

Open sethmcg opened 3 years ago

sethmcg commented 3 years ago

We had a discussion in the monthly S@S tag-up meeting and decided on some standardization of column names in the intake-esm catalogs.

| CESM | NA-CORDEX | STANDARD |
| --- | --- | --- |
| component | N/A | component |
| frequency | frequency | frequency |
| experiment | scenario | * |
| variable | variable | variable |
| variable_long_name | longname | long_name |
| dim_per_tstep | [add] | vertical_levels |
| start | [add] | start_time |
| end | [add] | end_time |
| [add gcm = 'CESM'?] | driver | * |
| N/A | rcm | rcm |
| N/A | grid | grid |
| [add = 'raw'?] | biascorrection | bias_correction |
| [add] | units | units |
| [add] | [add] | standard_name |
| path | path | path |

* NA-CORDEX uses 'driver' and 'scenario' instead of 'gcm' and 'experiment' because there are some simulations whose boundary conditions come from ERA-Interim, which is (technically speaking) a reanalysis, not a GCM. So 'scenario' is a superset of 'experiment' that includes 'era-int' in addition to 'historical', 'rcp85', etc. For LENS, it would probably make more sense to call it 'gcm'.

"vertical_levels" is an integer indicating the number of vertical levels; for a 2-D variables, it's '1'.

The NA-CORDEX 'grid' variable covers both spatial resolution and spatial domain. We'll probably also want to add some information about the spatial domain to the catalog metadata, but that may be a top-level element rather than a column. We'll probably want both a lat-lon bounding box and a human-readable "region" string; the spatial extent of the array is constant, but where there's non-missing data can vary. (E.g., data bias-corrected with Daymet covers North America (land-only), while data bias-corrected with gridMET covers only CONUS.) A region string might also apply to LENS, since atm data is global but ocean and ice data is not.

Note: column ordering doesn't generally matter, but 'path' should come at the end for legibility when the tables are printed out.
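As a concrete illustration, the renames in the table above could be applied to existing catalog DataFrames with pandas. This is only a sketch; the `[add]` columns are omitted because there is nothing to rename, and the one-row catalog contents are invented:

```python
import pandas as pd

# Rename maps taken from the standardization table; [add] columns are
# omitted because they have no existing name to rename from.
CESM_TO_STANDARD = {
    "variable_long_name": "long_name",
    "dim_per_tstep": "vertical_levels",
    "start": "start_time",
    "end": "end_time",
}
NA_CORDEX_TO_STANDARD = {
    "longname": "long_name",
    "biascorrection": "bias_correction",
}

def standardize(df, rename_map):
    """Return a copy of a catalog DataFrame with standardized column names."""
    return df.rename(columns=rename_map)

# Minimal illustration with a one-row NA-CORDEX-style catalog:
cat = pd.DataFrame({"longname": ["Minimum Temperature"], "biascorrection": ["raw"]})
cat = standardize(cat, NA_CORDEX_TO_STANDARD)
```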

jeffdlb commented 3 years ago

Thanks for sending this out, Seth.

Some thoughts below. Not all of these need to be implemented immediately -- or at all if they are bad ideas -- but I just want to put them out to provoke discussion.

(1) long_name is specific to the modeling project. In future it would be good to also have a 'cf_standard_name' column (recognizing that several variables may actually fit under the rubric of a given CF Name).

(2) Should 'rcm' be spelled out as regional_climate_model for greater understandability and consistency with the others? [ignore if #3 adopted]

(3) Perhaps we should think a bit more about how we handle the source-related columns like experiment+gcm for LENS and scenario+rcm+bias_correction for NA-CORDEX.

(4) We might wish to include a 'metadata' column that has the URL of the full metadata record. Intake-ESM allows for basic discovery, but somewhere we should point to the full metadata.

(5) We might wish to define a 'comment' column that has free text as needed.

Regards, Jeff DLB

Jeff de La Beaujardiere, PhD Director, NCAR/CISL Information Systems Division https://staff.ucar.edu/users/jeffdlb https://orcid.org/0000-0002-1001-9210


aaronspring commented 3 years ago

For predictions (numerical weather prediction, seasonal predictions (SubX, S2S), or decadal predictions (DCPP)) an additional dimension is needed; I would name it init (to be defined in the JSON and as a CSV column).

For that purpose, Anderson split dcpp_init_year out of member_id for experiment_id == dcppA-hindcast in CMIP6, which contains DCPP. This allows easy integration with climpred; see https://climpred.readthedocs.io/en/stable/examples/preprocessing/setup_your_own_data.html#intake-esm-for-cmorized-output. Catalog building happens in https://github.com/NCAR/intake-esm-datastore/blob/master/builders/notebooks/glade-cmip6_catalog_builder.ipynb. In DCPP, members are named s{inityear}-{ordinary-member_id_like_r2i1p1f1}, so the two pieces can be separated like this:

df["dcpp_init_year"] = df.member_id.map(lambda x: float(x.split("-")[0][1:] if x.startswith("s") else np.nan))
df["member_id"] = df["member_id"].map(lambda x: x.split("-")[-1] if x.startswith("s") else x)

However, dcpp_init_year is very specific to DCPP. init, meaning the timestamp of initialization, would fit better. Although the member naming also contains an i (e.g., r2i1p1f1), I would rather reserve that i for a particular type of initialization.

PS: Hi all, @jeffdlb referred me here. I am helping @judithberner get SubX into the cloud (https://github.com/pangeo-data/pangeo-datastore/issues/121) and we want to use intake-esm with it.

sethmcg commented 3 years ago

@jeffdlb - I really like the idea of having a "source" column that sums up the dataset-specific elements. I think there's too much variation between data sources to have much hope of being able to come up with a single unifying schema that will cover all the different experiments, but with the concatenated-source approach we can use whatever elements are appropriate for the dataset and still have one column that's appropriate for comparing across datasets.

That approach also allows augmenting the source column further if needed. If at some point we determine that we need to indicate that the dataset is part of Amazon's Open Data, we can just tack on "open-aws" on the end. (In fact, I even wonder if maybe the ideal way to do it would be to just list the columns to be aggregated to generate the source column and construct it on the fly when it's needed.)

Agreed that we should probably discuss it further. How should it be concatenated? My inclination would be to use whitespace to separate elements, but are there any specific tools / formats we need to consider compatibility with?
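The construct-on-the-fly idea could be sketched in pandas as follows. The column names follow the NA-CORDEX scheme; the row values and the whitespace separator are assumptions pending the discussion above:

```python
import pandas as pd

# Hypothetical catalog fragment; column names follow the NA-CORDEX scheme,
# but these particular row values are invented for illustration.
df = pd.DataFrame({
    "bias_correction": ["raw", "mbcn-Daymet"],
    "rcm": ["WRF", "RegCM4"],
    "driver": ["MPI-ESM-LR", "GFDL-ESM2M"],
    "scenario": ["rcp85", "hist"],
})

# The dataset-specific columns to aggregate; each catalog would list its own.
SOURCE_COLUMNS = ["bias_correction", "rcm", "driver", "scenario"]

def make_source(frame, columns=SOURCE_COLUMNS, sep=" "):
    """Concatenate dataset-specific columns into a single 'source' string."""
    return frame[columns].astype(str).agg(sep.join, axis=1)

df["source"] = make_source(df)
```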

bonnland commented 3 years ago

Agreed that we should probably discuss it further. How should it be concatenated? My inclination would be to use whitespace to separate elements, but are there any specific tools / formats we need to consider compatibility with?

Looking at the common python tools for reading/writing CSV files, the default delimiter for fields is the comma. It seems OK for now to assume that we will use this delimiter for fields, so the only other question is what sub-delimiter we want. If we can convince ourselves that a space is outside the alphabet for sub-fields (I can see why it could be), then it seems like a possible choice. Though perhaps a different choice will make the sub-fields more easily readable by people?

sethmcg commented 3 years ago

Brian and I discussed how to structure the zarr catalog for NA-CORDEX and decided it would be best to catalog data granules at the logical level (one record per distinct data element), rather than the physical level (one record per zarr store). So it will have one row for each ensemble member in each zarr store. For concatenated stores, the scenario will be e.g. "hist+rcp85".

I have updated the netcdf catalog for NA-CORDEX. The columns are now:

| column | description |
| --- | --- |
| variable | short CORDEX name (tmin, prec, rsds, etc.) |
| scenario | eval (ERA-Int), hist, rcp26, rcp45, or rcp85 (note rcp85, not rcp8.5) |
| driver | name of global climate model providing boundary conditions (or ERA-Int for reanalysis-driven runs) |
| rcm | name of regional climate model |
| frequency | "day" for everything except static variables, which = "fixed" |
| grid | "NAM-44i" = 0.5 degree resolution, "NAM-22i" = 0.25 degree; common lat-lon grid w/ cell boundaries at integer values |
| bias_correction | "raw" for uncorrected data, "mbcn-Daymet" or "mbcn-gridMET" for bias-corrected |
| long_name | human-readable description of data |
| units | value of units attribute for data; supposed to be standard by variable, but may not be |
| standard_name | CF standard name for data variable |
| vertical_levels | always = 1, since this is all 2-D data, no 3-D |
| member_id | "RCM.GCM"; needed for json aggregation |
| source | concatenation of bias_correction, rcm, driver, & scenario; for comparison with other catalogs (e.g., LENS) |
| path | absolute path to netcdf files on Glade |

Note that standard_name alone is insufficient to distinguish some variables. The variables tas and temp are the same, but in units of K and degC, respectively. Units add information, but not enough; tas (daily average temperature), tasmin (daily minimum temperature) and tasmax (daily maximum temperature) all have the same units and standard_name. For NA-CORDEX, the long_name attribute has been standardized, and provides a human-readable version of the information encompassed by variable, units, and standard_name.
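The ambiguity can be made concrete with a few invented catalog rows (the long_name strings here are illustrative, not quoted from the actual catalog):

```python
import pandas as pd

# Invented rows illustrating why standard_name alone is ambiguous.
cat = pd.DataFrame({
    "variable": ["tas", "tasmin", "tasmax"],
    "standard_name": ["air_temperature"] * 3,
    "units": ["K"] * 3,
    "long_name": [
        "Daily Average Near-Surface Temperature",
        "Daily Minimum Near-Surface Temperature",
        "Daily Maximum Near-Surface Temperature",
    ],
})

# All three rows match on standard_name and units...
ambiguous = cat[(cat.standard_name == "air_temperature") & (cat.units == "K")]
# ...but long_name (or variable) separates them.
```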

I have not yet added start_time or end_time; we need to discuss whether to use nominal bounds or actual bounds (which would require a lot more work).

bonnland commented 3 years ago

I have not yet added start_time or end_time; we need to discuss whether to use nominal bounds or actual bounds (which would require a lot more work).

By nominal bounds, do you mean the start and end time steps reflected in, for example, the NetCDF files for NA-CORDEX? Because this should work fine when it comes to creating Zarr stores. The concatenate/merge step for Zarr creation simply requires aligned calendar axes, and the resulting time axis will be the union of all aligned time steps.
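The union behavior can be illustrated with pandas time axes (the date bounds below are hypothetical, not taken from any NA-CORDEX store):

```python
import pandas as pd

# Nominal daily time axes for two concatenated segments (hypothetical bounds).
hist = pd.date_range("1950-01-01", "2005-12-31", freq="D")
rcp85 = pd.date_range("2006-01-01", "2100-12-31", freq="D")

# The concatenated store's time axis is the union of the aligned axes,
# so nominal start_time/end_time span both segments.
combined = hist.union(rcp85)
start_time, end_time = combined[0], combined[-1]
```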

jeffdlb commented 3 years ago

Thanks for continuing to work on this.

Brian and I discussed how to structure the zarr catalog for NA-CORDEX and decided it would be best to catalog data granules at the logical level (one record per distinct data element), rather than the physical level (one record per zarr store). So it will have one row for each ensemble member in each zarr store.

When I first read this I thought you meant fewer rows, each referencing multiple Zarr stores, but upon re-reading I wonder whether you mean multiple rows pointing to segments of the same Zarr store. Which is it? I have concerns/questions about each approach...

Separate question:

bias_correction | "raw" for uncorrected data, "mbcn-Daymet" or "mbcn-gridMET" for bias-corrected

Does this concept apply to many/most model experiments or only to NA-CORDEX? It does not generally apply to observational data. Something like processing_level = {appropriate dataset-specific terminology} would perhaps be more generic. On the other hand:

driver | name of global climate model
rcm | name of regional climate model
source | concatenation of bias_correction, rcm, driver, & scenario; for comparison with other catalogs (e.g., LENS)

There is some redundancy. We should either use generic source & processing_level, or use a proliferation of specific columns like bias_correction & driver & rcm, but not both.

Path or paths plural?

I have not yet added start_time or end_time; we need to discuss whether to use nominal bounds or actual bounds (which would require a lot more work).

A separate discussion Tuesday about DASH Repository noted that some datasets have disjoint time intervals, which are difficult to accurately represent by a pair of start and end times, so a single time_range with one or more pairs of values would be better.

sethmcg commented 3 years ago

Hmm, some complicated issues that I think are going to be difficult to work out via comment thread. I think we need a real-time discussion. I'll try to schedule a meeting with me, Jeff, Anderson, and Brian. Let me know if there's anyone else I should try to rope in.

andersy005 commented 3 years ago

Let me know if there's anyone else I should try to rope in.

@mnlevy1981's input on this would be very useful since he (1) knows more about CESM data in their different forms, (2) has been trying to standardize columns for the CESM catalogs in https://github.com/NCAR/intake-esm-datastore/issues/64

mnlevy1981 commented 3 years ago

I can't make the meeting this afternoon, but will continue to follow along with this issue ticket and if there's a follow-up meeting I'll try to make it. I think the issue @andersy005 linked (https://github.com/NCAR/intake-esm-datastore/issues/64) has a complete list of the columns we expect to search by, though the initial comment is just the start and the list continues in the responses. We are happy to include other columns that are useful in other projects for the sake of consistency among catalogs.

sethmcg commented 3 years ago

Conclusions from the meeting: the primary purpose of these catalogs is to support dataset-oriented access and cross-dataset inventories. They're not the main first discovery step; users will likely have some idea of what's in the dataset by the time they arrive at the catalogs.

Although we want to avoid unlimited proliferation of columns, we need to allow for non-standard columns that capture important facets specific to the dataset. Standard columns should be validateable.

Updated intake-esm catalog structure for NA-CORDEX:

| std? | column | valid values | description / note |
| --- | --- | --- | --- |
| y | variable | see CORDEX variable table | standard short CORDEX variable name |
| y | long_name | see CORDEX variable table | controlled vocab |
| y | units | see CORDEX variable table | some datasets may be non-compliant |
| y | standard_name | see CORDEX variable table | CF standard name; doesn't fully distinguish variables (e.g., tasmax) |
| y | spatial_domain | CF standardized region name | always = north_america |
| y | grid | NAM-44i, NAM-22i (for NA-CORDEX) | 44i = 0.5 degree resolution, 22i = 0.25 degree; common lat-lon grid w/ cell boundaries at integer values |
| y | vertical_levels | integer > 0 | always = 1, since this is all 2-D data, no 3-D |
| y | frequency | day, fixed, (mon, seas, ann, ymon, yseas, etc.) | zarr stores only have daily & fixed (static) data |
| y | start_time | ISO-8601 datetime | start of time coordinates in data array (nominal; don't worry about ragged ends) |
| y | end_time | ISO-8601 datetime | end of time coordinates in data array (nominal; don't worry about ragged ends) |
| n | model | list of [RCM.GCM] strings | see [NA-CORDEX simulation matrix](https://na-cordex.org/simulation-matrix.html) for full set of pairings |
| n | scenario | eval, hist, rcp26, rcp45, rcp85 | GCM experiment or "eval" for ERA-Int runs; note rcp85, not rcp8.5 |
| n | bias_correction | raw, mbcn-Daymet, mbcn-gridMET | mbcn = method, Daymet/gridMET = obs dataset |
| y | path | [valid path] | absolute path to netcdf files on Glade |
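Since the standard columns are meant to be validateable, a lightweight check could be sketched like this. The vocabularies below are partial and assumed from the table; a real validator would load them from a maintained source:

```python
import pandas as pd

# Partial controlled vocabularies, assumed from the catalog table above.
CONTROLLED = {
    "frequency": {"day", "fixed", "mon", "seas", "ann", "ymon", "yseas"},
    "scenario": {"eval", "hist", "rcp26", "rcp45", "rcp85"},
    "bias_correction": {"raw", "mbcn-Daymet", "mbcn-gridMET"},
}

def validate_catalog(df):
    """Return {column: set of unexpected values} for columns with a vocab."""
    problems = {}
    for col, vocab in CONTROLLED.items():
        if col in df.columns:
            bad = set(df[col].unique()) - vocab
            if bad:
                problems[col] = bad
    return problems

# Example: "rcp8.5" is flagged because the vocabulary requires "rcp85".
problems = validate_catalog(pd.DataFrame({"scenario": ["hist", "rcp8.5"]}))
```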

Other notes:

jeffdlb commented 3 years ago

Thanks, Seth. Comments:

-Jeff

sethmcg commented 3 years ago

I added end_time. Thanks!

Thoughts on grid:

Maybe 'grid' is not the right name, but I think we want a column that captures spatial resolution, and that it should be present in every catalog, so that's why I would argue for making it a standard column.

If it's standard, it does need a controlled vocabulary, which may be hard to define up front, but GCMs have the grid descriptor strings like "T42" and "Tco199" and "C180" (I don't know the proper names for these) and CORDEX defines NAM-44i and EUR-22 and so on with a spec document, so my sense is that we could do it without being completely ad hoc; I think there's conventional nomenclature out there in the community we could leverage.

For obs data, I expect there's some equivalent set of conventions for swath and trajectory data, and for stationary observations we could just give it a value of "point" in the same way we use "frequency = fixed" for static data.

jeffdlb commented 3 years ago

Perhaps grid should be optional, and used only when the elements in the catalog need to be distinguished from each other by the choice of grid?

If used, I would prefer spatial_resolution in units of km or deg, but understand that is not always applicable so a controlled vocab may be needed.

Jeff


judithberner commented 3 years ago

Hi - we started a GitHub conversation about the next steps for zarrifying S2S data. The format is the SubX format, but we have local CESM output that follows the SubX output protocol. One of the issues is whether to use intake-esm. See issue here: https://github.com/bradyrx/climpred_CESM1_S2S/issues/1 I think we should use intake-esm if possible. Tagging: CESM team: @abjaye IRI team: @ikhomyakov, @awrobertson, @aaron-kaplan Climpred team: @aaronspring

jeffdlb commented 3 years ago

Hi Judith-

Yes, please do use intake-esm, including the extra columns that we are trying to standardize. I'm not sure what you mean by "SubX format" -- is that a convention applied to SubX data that is in a particular format such as CF/NetCDF?

-Jeff DLB



judithberner commented 3 years ago

SubX is written as netcdf output. The comment pertained to the fact that the S2S simulations with CESM output the variables required by the SubX protocol, on a grid required by SubX, and with the same ensemble size as that required for SubX. So any scripts we develop for the CESM S2S output should theoretically work just the same for the SubX data sets.

aaron-kaplan commented 3 years ago

Hi folks. At the invitation of @judithberner I just parachuted into this rather long discussion, and I'm not exactly sure what I'm doing here.

I work on the IRI Data Library. We host the SubX data, and we've been talking with Judith about making it available in S3 as Zarr.

This GitHub issue appears to be about an intake-esm catalog, but we were not planning on using intake-esm. We talked with Ryan Abernathey of Pangeo recently and he recommended that we use STAC for our catalog. We have no prior experience with either intake-esm or STAC, but I'm inclined to follow Ryan's advice until/unless we discover a reason not to. If you have reasons, please share them.

sethmcg commented 3 years ago

@jeffdlb The more I think about this issue, the trickier it gets.

I would like to use something straightforward like spatial_resolution, but I'm concerned that it could be seriously misleading. Three example problems that come to mind:

So my thinking was that we can't really get away from saying something about the nature of the grid in addition to its resolution or users will get confused when they run into one of these kinds of issues. And if the values came from a controlled vocabulary (or at least a somewhat standardized source), you might not know what something like "grid = T68" means when you run across it, but at least you can look it up and find out that Gaussian grids exist. But I agree that it would also be really useful to have something that gave you a rough sense of the resolution, even if it's not exact.

What if we required nominal_spatial_resolution in km or deg instead of spatial_resolution and also required or at least strongly recommended a spatial_grid entry coming from a controlled (ish) vocabulary? That seems like it would provide users with that rough sense while also signposting the fact that it's a complicated issue they should be prepared to need to investigate further.
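A string like "0.5 deg" or "25 km" is still machine-comparable if the format is pinned down. A minimal parsing sketch, assuming the "number space unit" format proposed above (the exact format is not yet specified):

```python
import re

def parse_resolution(value):
    """Parse a nominal-spatial-resolution string like '0.5 deg' or '25 km'
    into a (number, unit) pair. The format is an assumption, not a spec."""
    m = re.fullmatch(r"\s*([0-9.]+)\s*(deg|km)\s*", value)
    if m is None:
        raise ValueError(f"unrecognized resolution: {value!r}")
    return float(m.group(1)), m.group(2)
```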


sethmcg commented 3 years ago

Update after discussion on 2021-03-12: include nominal_spatial_resolution as a required column, add grid as an optional column.

| std? | column | valid values | description / note |
| --- | --- | --- | --- |
| y | variable | see CORDEX variable table | standard short CORDEX variable name |
| y | long_name | see CORDEX variable table | controlled vocab |
| y | units | see CORDEX variable table | some datasets may be non-compliant |
| y | standard_name | see CORDEX variable table | CF standard name; doesn't fully distinguish variables (e.g., tasmax) |
| y | spatial_domain | CF standardized region name | always = north_america |
| y | spatial_resolution | numeric with units | e.g. "0.5 deg" or "25 km" (nominal; use typical value for irregular grids) |
| y | vertical_levels | integer > 0 | always = 1, since this is all 2-D data, no 3-D |
| y | frequency | day, fixed, (mon, seas, ann, ymon, yseas, etc.) | zarr stores only have daily & fixed (static) data |
| y | start_time | ISO-8601 datetime | start of time coordinates in data array (nominal; may have ragged ends) |
| y | end_time | ISO-8601 datetime | end of time coordinates in data array (nominal; may have ragged ends) |
| n | grid | NAM-44i, NAM-22i (for NA-CORDEX) | as defined by the CORDEX Archive Specification |
| n | model | list of [RCM.GCM] strings | see [NA-CORDEX simulation matrix](https://na-cordex.org/simulation-matrix.html) for full set of pairings |
| n | scenario | eval, hist, rcp26, rcp45, rcp85 | GCM experiment or "eval" for ERA-Int runs; note rcp85, not rcp8.5 |
| n | bias_correction | raw, mbcn-Daymet, mbcn-gridMET | mbcn = method, Daymet/gridMET = obs dataset |
| y | path | [valid path] | absolute path to netcdf files on Glade or zarr stores on object store |

jeffdlb commented 3 years ago

Can we call it spatial_resolution, and just include the "nominal" aspect in the documentation, as we have done for start/end_time? I'd also suggest modifying the description of those to be "Nominal start [end] of time coordinates in data array (may have ragged ends when multiple simulations are combined)."

sethmcg commented 3 years ago

@jeffdlb Sounds good! Table above has been updated.