OceanGlidersCommunity / OG-format-user-manual

OceanGliders format and vocabularies

Role of PARAMETER variable, is it really needed? #109

Closed JuangaSocib closed 3 months ago

JuangaSocib commented 2 years ago

moderator: @OceanGlidersCommunity/format-maintainers

Is your feature request related to a problem? Please describe. I'm not convinced of the role of the PARAMETER variable within the format. The format could be simplified by removing this variable, since the same information could be carried as an attribute on each geophysical variable whose value is simply the full URL of the parameter.
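A minimal sketch of what this proposal could look like, using plain dictionaries to stand in for variable attributes. The attribute name `vocabulary` and the example.org URLs are placeholders chosen for illustration; neither is taken from the OG format.

```python
# Each entry maps a variable name to its attributes.
# "vocabulary" is a hypothetical attribute name and the URLs are placeholders.
variables = {
    "TIME": {"standard_name": "time"},
    "TEMP": {
        "standard_name": "sea_water_temperature",
        "vocabulary": "https://example.org/vocab/TEMP",
    },
    "PSAL": {
        "standard_name": "sea_water_practical_salinity",
        "vocabulary": "https://example.org/vocab/PSAL",
    },
}


def parameter_urls(variables, attr="vocabulary"):
    """Map each variable that carries the attribute to its parameter URL."""
    return {name: attrs[attr] for name, attrs in variables.items() if attr in attrs}


print(parameter_urls(variables))
```

With this layout, the list of parameters falls out of the variable attributes, so no separate PARAMETER variable or N_PARAM dimension would be needed to hold it.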

kerfoot commented 1 year ago

I filed this as an issue before realizing it had already been brought up here. Tried to delete my issue, but couldn't figure out how.

I have the same question as @JuangaSocib. I don't believe the PARAMETER variable and the N_PARAM dimension are necessary. Unless there is evidence to support including this variable and dimension, we should remove them to simplify the spec.

vturpin commented 1 year ago

Isn't it a good thing to have the list of the geophysical data stored in a variable? How can you figure out automatically which parameters are available in the data set without the PARAMETER variable?

Just some thoughts to help decide whether or not we get rid of PARAMETER.

kerfoot commented 1 year ago

  1. Not sure what is meant by "figure out automatically which parameters are available"?
  2. Why would the user need to figure this out "automatically"?

As with any typical NetCDF data set, I would think that the user would open and examine the data set to see if it contains parameters of interest.

If it is deemed necessary to figure this out programmatically, there are multiple ways to do this. Here is one way, depending on which CF convention version the file conforms to:

  1. Using CF Conventions >1.6, which deprecates the status_flag modifier and recommends a standard_name of status_flag:

    > import xarray as xr
    > skip_standard_names = ['time', 'latitude', 'longitude', 'depth', 'status_flag']
    > ds = xr.open_dataset('/Users/kerfoot/Downloads/gliders/OceanGliders_DMTT/sp041/sp041_20191205T1757.nc')
    > parameters = [v for v in ds if 'standard_name' in ds[v].attrs.keys() and ds[v].attrs['standard_name'] not in skip_standard_names]
    > print(parameters)
    ['PRES', 'TEMP', 'PSAL', 'CHLA', 'DOXY']

  2. If using CF Conventions <1.7, which recommends use of the status_flag modifier:

    > import xarray as xr
    > skip_standard_names = ['time', 'latitude', 'longitude', 'depth']
    > ds = xr.open_dataset('/Users/kerfoot/Downloads/gliders/OceanGliders_DMTT/sp041/sp041_20191205T1757.nc')
    > parameters = [v for v in ds if 'standard_name' in ds[v].attrs.keys() and ds[v].attrs['standard_name'] not in skip_standard_names]

    parameters can be further filtered to remove QC variables:

    > parameters = [v for v in parameters if not ds[v].attrs['standard_name'].endswith('status_flag')]
    > print(parameters)
    ['PRES', 'TEMP', 'PSAL', 'CHLA', 'DOXY']

However, the CDL example doesn't use either status_flag convention. That will also need to be fixed, but it belongs under another issue.
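The two cases above can also be folded into a single helper. This is only a sketch: it works on a plain mapping of variable name to attributes so that it runs without a data file, and the example variable names and attributes are made up for illustration.

```python
# standard_name values that identify coordinates rather than parameters.
COORDINATE_NAMES = {"time", "latitude", "longitude", "depth"}


def list_parameters(attrs_by_name):
    """Return names of variables that look like geophysical parameters.

    A variable is skipped if it has no standard_name, if its standard_name
    is a coordinate, or if it is a QC flag under either CF style:
    standard_name == "status_flag" or the "<name> status_flag" modifier.
    """
    parameters = []
    for name, attrs in attrs_by_name.items():
        standard_name = attrs.get("standard_name")
        if standard_name is None:
            continue
        if standard_name in COORDINATE_NAMES:
            continue
        if standard_name.endswith("status_flag"):
            continue
        parameters.append(name)
    return parameters


# Made-up attributes covering both QC styles.
example = {
    "TIME": {"standard_name": "time"},
    "LATITUDE": {"standard_name": "latitude"},
    "TEMP": {"standard_name": "sea_water_temperature"},
    "TEMP_QC": {"standard_name": "sea_water_temperature status_flag"},
    "PSAL": {"standard_name": "sea_water_practical_salinity"},
    "PSAL_QC": {"standard_name": "status_flag"},
    "PLATFORM_MODEL": {"long_name": "model of the glider"},
}

print(list_parameters(example))  # ['TEMP', 'PSAL']
```

With an xarray data set opened as `ds`, the same helper could be called as `list_parameters({name: ds[name].attrs for name in ds})`.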

Lastly, while the specification was not explicitly developed for ERDDAP, putting variables on separate, unrelated dimensions in a single file has a consequence when the data are served via ERDDAP. As far as I am aware, ERDDAP cannot serve, within one dataset, variables from one or more NetCDF files that do not share the same dimension(s). See the EDDTableFromNcFiles documentation. So the administrator has to choose between serving the variables on the N_MEASUREMENTS dimension or those on the N_PARAM dimension.

For reference, I took the CDL example, fixed the syntax errors (issue #111), created a NetCDF-4 file and loaded the data set into ERDDAP:

http://slocum-test.marine.rutgers.edu/erddap/tabledap/sp041_20191205T1757_n_measurements.html

You can see that all of the variables using the N_MEASUREMENTS dimension are served, but the PARAMETER variable is not. I did send an inquiry to the ERDDAP Google Group to confirm this and am awaiting a response.
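One way to see ahead of time which variables could be served together is to group them by their dimensions. The sketch below uses a plain mapping of variable name to dimension tuple; the entries are illustrative and are not read from the actual file.

```python
from collections import defaultdict


def group_by_dims(dims_by_name):
    """Group variable names by the tuple of dimensions they are defined on."""
    groups = defaultdict(list)
    for name, dims in dims_by_name.items():
        groups[tuple(dims)].append(name)
    return dict(groups)


# Illustrative layout only: two measurement variables plus PARAMETER.
example = {
    "TEMP": ("N_MEASUREMENTS",),
    "PSAL": ("N_MEASUREMENTS",),
    "PARAMETER": ("N_PARAM",),
}

for dims, names in group_by_dims(example).items():
    print(dims, names)
```

Each group corresponds to a set of variables that share dimensions. With xarray, the input could be built as `{name: ds[name].dims for name in ds.variables}`.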

jenseva commented 1 year ago

Hi John,

I think Bob got back to you about this via the Google group, but yes, in my experience ERDDAP will only serve variables with common dimensions within a single ERDDAP dataset.

There are a few workarounds. One is to create two ERDDAP datasets, one for each set of differently dimensioned data. I've had to resort to this method (for projection variables), but it's not ideal and I don't see it being a good option for this scenario.

In my case the NetCDF files were CF compliant and worked as expected in Panoply and THREDDS. This was only an issue in ERDDAP and was related to limitations in how ERDDAP was initially designed.

HTH, Jenn

jenseva commented 1 year ago

I've created a sample of removing the PARAM dimension. https://github.com/jenseva/og-netcdf/blob/main/og-netcdf-1.nc

https://github.com/jenseva/og-netcdf/blob/main/og-netcdf-1.cdl

If the information stored in a PARAM variable is needed, and you envision the files being aggregated in such a way that instrument metadata cannot be preserved in the geophysical variable attributes, then you can store it in separate ancillary variables. This example demonstrates that approach. See the guidance, spec details and more in my test repo at https://github.com/jenseva/og-netcdf

I am new to understanding the use case for the OG-1.0 format, but my sense is that these files will not be aggregated, so the file could be even simpler than my sample file, with just attributes within the geophysical variable and no need for the ancillary instrument variable. I can create an example of that simpler version if anyone is interested.
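To make the two options concrete, here is a rough sketch using plain dictionaries in place of NetCDF variables. All names in it (`instrument`, `INSTRUMENT_CTD`, the `instrument_` prefixed attributes, the serial number) are hypothetical and are not taken from the sample file.

```python
# Hypothetical layout: TEMP points at an ancillary instrument variable,
# while PSAL carries its instrument metadata inline as attributes.
dataset = {
    "TEMP": {
        "standard_name": "sea_water_temperature",
        "instrument": "INSTRUMENT_CTD",
    },
    "PSAL": {
        "standard_name": "sea_water_practical_salinity",
        "instrument_long_name": "CTD",
        "instrument_serial_number": "0001",
    },
    "INSTRUMENT_CTD": {"long_name": "CTD", "serial_number": "0001"},
}


def instrument_metadata(dataset, name):
    """Return instrument metadata for a variable under either layout."""
    attrs = dataset[name]
    ref = attrs.get("instrument")
    if ref is not None and ref in dataset:
        # Ancillary-variable layout: follow the reference.
        return dict(dataset[ref])
    # Inline layout: collect attributes with the instrument_ prefix.
    prefix = "instrument_"
    return {
        key[len(prefix):]: value
        for key, value in attrs.items()
        if key.startswith(prefix)
    }


print(instrument_metadata(dataset, "TEMP"))
print(instrument_metadata(dataset, "PSAL"))
```

Both calls yield the same metadata. The difference is whether the metadata lives in its own variable, which can survive aggregation, or only in attributes.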

justinbuck commented 1 year ago

This is part of CMEMS interoperability; the reason it is present needs to be added to the documentation. Needs input from @tcarval. Downgrading to OG 1.X so this is part of future discussions.