NOAA-PMEL / Ferret

The Ferret program from NOAA/PMEL
https://ferret.pmel.noaa.gov/Ferret/
The Unlicense

DSG files: Implement attributes from NCEI templates #1751

Open karlmsmith opened 6 years ago

karlmsmith commented 6 years ago

Reported by @AnsleyManke on 4 Nov 2016 22:04 UTC

Kevin had this email on the NCEI templates for DSG files. I'll write some comments, and then add to this ticket with specific things to implement.


Date: Wed, Oct 12, 2016 at 7:46 AM
Subject: NCEI Example netCDF File Testing Reports

Data Providers,

Have you ever been interested in seeing what a netCDF file would look like if it followed the NCEI/NODC netCDF templates 100%?

NCEI has developed a set of 'gold standard' example netCDF files which precisely follow the best practices as recommended by NCEI (formerly NODC). You can find the example files at the following location:

http://data.nodc.noaa.gov/ncei/example/data/netcdf/

and in the THREDDS catalog:

http://data.nodc.noaa.gov/thredds/catalog/example/catalog.html

Within those directories/catalogs you will see a v1.1 and v2.0 which correspond to the NCEI/NODC version 1.1 and 2.0 templates. The file naming convention describes which featureType and template version number the file is following.


Looking at the template netcdf files:

They use the feature-axis as the ID variable, so the ID is numeric and will not work in LAS. Is this something to work towards allowing? On the other hand, in LAS, meaningful strings for constraining by platform are really useful. At the least, we need a good method to easily rework such files: add a new string ID variable and move the cf_role attribute from the coordinate variable to that string ID variable.
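A minimal sketch of that rework, assuming the file's variables have already been read into a plain dictionary that maps each variable name to its attributes and values. The function name `add_string_id`, the dictionary layout, the `long_name` text, and the `id_` prefix used to build the strings are all hypothetical; a real implementation would go through a netCDF library or Ferret itself.

```python
import copy


def add_string_id(variables, new_name="trajectory_id", prefix="id_"):
    """Return a copy of `variables` with a string ID variable that carries cf_role.

    `variables` maps a variable name to {"attrs": {...}, "values": [...]}.
    This layout is an assumption made for the sketch, not a Ferret structure.
    """
    out = copy.deepcopy(variables)
    if new_name in out:
        raise ValueError(f"variable {new_name!r} already exists")

    # Find the (numeric) variable that currently carries cf_role.
    holders = [name for name, var in out.items() if "cf_role" in var["attrs"]]
    if len(holders) != 1:
        raise ValueError("expected exactly one variable with a cf_role attribute")
    old = out[holders[0]]

    # Move cf_role off the numeric variable and onto the new string variable.
    role = old["attrs"].pop("cf_role")
    out[new_name] = {
        "attrs": {"cf_role": role, "long_name": "feature identifier"},
        # Hypothetical string IDs derived from the numeric ones.
        "values": [f"{prefix}{value}" for value in old["values"]],
    }
    return out


example = {
    "trajectory": {"attrs": {"cf_role": "trajectory_id"}, "values": [1, 2, 3]},
    "temp": {"attrs": {"long_name": "Temperature"}, "values": [10.5, 11.0, 11.2]},
}
reworked = add_string_id(example)
```

In practice the strings would more usefully come from platform metadata (a ship name or call sign) than from the numeric index, since those are what users constrain on in LAS.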

Some are attributes that are also in CF that we've never implemented in Ferret:

The valid-range attributes would have to come from the data provider.

Many more are from the Attribute Convention for Data Discovery: http://wiki.esipfed.org/index.php/Attribute_Convention_for_Data_Discovery

This set of conventions includes global attributes for geospatial and time coverage. We can write these.

There are others that describe the file history, creator, institution, project etc., which we could write with nominal values that could be filled in if available.

The remaining ones are data_min and data_max attributes. I don't see these in the CF or Data Discovery standards, but we could add them to any file we write. (What would ERDDAP do with these? I think it picks up attributes from the first file when it aggregates a set of files in time. If so, data_min and data_max would not be correct for the whole time series.)

The examples also show a way to add instrument, platform, and grid-mapping metadata to the netCDF file. Those are done as single-valued variables with no grid, so really just a set of attributes. This information would be defined by the data providers. Platform metadata also might contain things that one would want to promote to "metadata variables" that could be searched on in ERDDAP and LAS. We should develop a way to add variables like this, with no dimension, and also see what ERDDAP makes of them.

For example, platform and instrument metadata are pointed to from the temp variable:

        ...
        double temp(trajectory, obs) ;
                ...
                temp:long_name = "Temperature" ;
                temp:platform = "platform1" ;
                temp:instrument = "instrument1" ;
                  ...
        char instrument1 ;
                instrument1:long_name = "Seabird SBE 45 MicroTSG Thermosalinograph" ;
                instrument1:ncei_name = "Thermosalinographs" ;
                instrument1:make_model = "SBE-45" ;
                instrument1:serial_number = "1859723" ;
                instrument1:calibration_date = "2016-03-25" ;
                instrument1:accuracy = "" ;
                instrument1:precision = "" ;
        char platform1 ;
                platform1:long_name = "Alexander Von Humboldt" ;
                platform1:ncei_code = "ALEXANDER VON HUMBOLDT" ;
                platform1:ioos_code = "urn:ioos:station:NCEI:AlexanderVonHumboldt" ;
                platform1:call_sign = "DFAW" ;
                platform1:ices_code = "" ;
                platform1:imo_code = "8626886" ;
                platform1:wmo_code = "" ;

There is also a grid-mapping piece of metadata, which we could add if appropriate:

        double crs ;
                crs:grid_mapping_name = "latitude_longitude" ;
                crs:longitude_of_prime_meridian = 0. ;
                crs:semi_major_axis = 6378137. ;
                crs:inverse_flattening = 298.257223563 ;
                crs:epsg_code = "EPSG:4326" ;
                :geospatial_bounds_crs = "EPSG:4326" ;
                :geospatial_bounds_vertical_crs = "EPSG:5829" ;

Migrated-From: http://dunkel.pmel.noaa.gov/trac/ferret/ticket/2479

karlmsmith commented 6 years ago

Comment by @AnsleyManke on 4 Nov 2016 22:49 UTC

Items:

1) Add CF attributes:

   standard_names
   coordinates attributes on data variables
   cell_methods (?)
   valid_min, valid_max, or valid_range

Standard names come from the CF standard names table: http://cfconventions.org/

Coordinates attributes for data variables: The definition in the CF standard is "The value of the coordinates attribute is a blank separated list of the names of auxiliary coordinate variables." In the template example files, the coordinates attribute has the value "time lat lon z" for all data variables in all of the data types. The template files all have variables in all four of these directions, either on the obs axis or the feature-axis. Many files we deal with do not have all of these. For instance, there's no depth data for some of the timeseries or trajectory files, so we should list only those location/time variables that the file does contain.
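A small sketch of that rule, assuming the set of variable names in the file is already known. The function name `coordinates_attribute` is hypothetical, and the candidate names "time lat lon z" are simply the ones used in the template files; a real file may name its coordinate variables differently.

```python
def coordinates_attribute(present_variables, candidates=("time", "lat", "lon", "z")):
    """Build a CF coordinates attribute value from the candidates found in the file.

    The result is a blank-separated list, in candidate order, of only those
    location/time variables that the file actually contains.
    """
    present = set(present_variables)
    return " ".join(name for name in candidates if name in present)


# A file with all four directions, as in the NCEI templates.
full = coordinates_attribute(["time", "lat", "lon", "z", "temp"])

# A trajectory file with no depth variable: z is left out of the attribute.
no_depth = coordinates_attribute(["time", "lat", "lon", "sst"])
```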

The cell_methods attribute has to come from the data provider. It describes how the data are put into cells: were the data averaged, was some other statistic used, or is each value a single reading?

Likewise, the valid_min, valid_max, and valid_range values are determined by the data provider.

2) Add the global attributes for geospatial and time coverage as appropriate. These are defined in the data discovery conventions:

            :geospatial_lat_min = 38.048 ;
            :geospatial_lat_max = 38.048 ;
            :geospatial_lat_units = "degrees_north" ;
            :geospatial_lon_min = -123.458 ;
            :geospatial_lon_max = -123.458 ;
            :geospatial_lon_units = "degrees_east" ;
            :geospatial_vertical_min = 1.5 ;
            :geospatial_vertical_max = 1.5 ;
            :geospatial_vertical_units = "m" ;
            :geospatial_vertical_positive = "down" ;

            :time_coverage_start = "2015-03-25T22:23:38Z" ;
            :time_coverage_end = "2015-03-25T22:25:08Z" ;
            :time_coverage_resolution = "PT10.S" ;
            :time_coverage_duration = "PT1M30S" ;
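As an illustration of how most of these values could be derived from the data being written, here is a rough sketch. It assumes latitudes and longitudes are plain lists of numbers and times are timezone-aware UTC `datetime` objects; the function names `iso_duration` and `coverage_attributes` are hypothetical. The units, vertical, and resolution attributes are left out because they depend on information the data alone may not supply.

```python
from datetime import datetime, timezone


def iso_duration(seconds):
    """Format a whole number of seconds as an ISO 8601 duration such as PT1M30S."""
    seconds = int(seconds)
    days, rem = divmod(seconds, 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    date_part = f"{days}D" if days else ""
    time_part = ""
    if hours:
        time_part += f"{hours}H"
    if minutes:
        time_part += f"{minutes}M"
    # Always emit something, so a zero-length span becomes PT0S.
    if secs or not (date_part or time_part):
        time_part += f"{secs}S"
    return "P" + date_part + ("T" + time_part if time_part else "")


def coverage_attributes(lats, lons, times):
    """Return coverage global attributes computed from the data (times must be UTC)."""
    start, end = min(times), max(times)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "geospatial_lat_min": min(lats),
        "geospatial_lat_max": max(lats),
        "geospatial_lon_min": min(lons),
        "geospatial_lon_max": max(lons),
        "time_coverage_start": start.strftime(fmt),
        "time_coverage_end": end.strftime(fmt),
        "time_coverage_duration": iso_duration((end - start).total_seconds()),
    }


# Values taken from the attribute listing above.
attrs = coverage_attributes(
    lats=[38.048],
    lons=[-123.458],
    times=[
        datetime(2015, 3, 25, 22, 23, 38, tzinfo=timezone.utc),
        datetime(2015, 3, 25, 22, 25, 8, tzinfo=timezone.utc),
    ],
)
```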

3) Look at the rest of the Highly Recommended and Recommended attributes in the data discovery conventions and include all that make sense.

4) Discuss whether to add data_min and data_max for individual files. This seems problematic for sets of files that make up one Trajectory, Timeseries, etc. and which will be aggregated using ERDDAP.
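To make the concern concrete, the sketch below compares the range stored in the first file with the range of the whole aggregation. The per-file numbers and the function name `aggregate_range` are invented for illustration, and whether ERDDAP actually takes attributes from the first file is the open question raised above, not something this code establishes.

```python
def aggregate_range(per_file_ranges):
    """Combine per-file data_min/data_max attributes into a range for the whole set."""
    return {
        "data_min": min(r["data_min"] for r in per_file_ranges),
        "data_max": max(r["data_max"] for r in per_file_ranges),
    }


# Hypothetical attributes from three files that make up one time series.
files = [
    {"data_min": 9.8, "data_max": 12.1},
    {"data_min": 8.4, "data_max": 11.0},
    {"data_min": 10.2, "data_max": 14.7},
]

first_file_only = files[0]             # what a naive aggregation would report
whole_series = aggregate_range(files)  # what would be correct for the series
```

Here the first file's range (9.8 to 12.1) understates the true range of the series (8.4 to 14.7), so any per-file data_min and data_max would need to be recomputed, or dropped, when files are aggregated.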

5) Come up with ways to add metadata variables, either with no dimension or perhaps on a single-point "metadata" axis.