Open ChrisJohnNOAA opened 3 months ago
Thanks for considering @ChrisJohnNOAA
As noted elsewhere in the discussion, the NCEI standard has this format as well: https://www.ncei.noaa.gov/data/oceans/ncei/formats/netcdf/v2.0/index.html. For example, https://www.ncei.noaa.gov/thredds-ocean/catalog/example/v2.0/catalog.html?dataset=example/v2.0/NCEI_trajectoryProfile_template_v2.0_2016-09-22_181838.014029.nc has a structure like:
Dimensions: (trajectory: 1, obs: 10, z: 4)
Coordinates:
* trajectory (trajectory) int32 -2147483647
time (trajectory, obs) object ...
lat (trajectory, obs) float64 ...
lon (trajectory, obs) float64 ...
* z (z) float64 1.0 2.0 3.0 4.0
Dimensions without coordinates: obs
Data variables:
sal (trajectory, obs, z) float64 ...
temp (trajectory, obs, z) float64 ...
Also H.6.2 at https://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/build/aphs06.html
This is a super useful way to organize data sets so hopefully it's not too hard to implement.
Thanks for considering this!
Copying over from the discussion here:
I believe Callum is correct, this is a present limitation in ERDDAP.
Here is an excerpt from Bob in 2017: _If a variable in the source file is e.g., lat and lon values that use different dimensions than the main data variables and that convert the projection x,y locations into lat and lon, then in ERDDAP they need to be in a separate dataset.
See https://coastwatch.pfeg.noaa.gov/erddap/download/setupDatasetsXml.html#dataStructures and the subsequent few paragraphs.
I know this sounds goofy and severely limiting. It is the one major situation where using ERDDAP isn't the best choice. But it only affects a small percentage of the total data files in NOAA (that is small consolation for you where it affects perhaps 100% of your data files for polarwatch). There is a solution -- a modification to ERDDAP that would support this but doing it would be a massive effort on my part (a couple of months with no distractions) so I haven't had time to do it.
There was a reason for doing it this way: it is this slightly-simpler-than-netcdf data model that allows ERDDAP to read data from many file types and write data to many file types. So there is great benefit, but it comes at a cost. Few people/groups/datasets pay the cost, but you are. Sorry._
The solution we had to implement at PolarWatch meant there were two datasets, which was a bit of a hack and difficult for users.
It would be great to see this feature added to ERDDAP! I agree there is value in having this type of synthesized glider data accessible via griddap over tabledap. The list of benefits is quite long.
Best, Jenn
@callumrollo
@jklymak pointed out this post to me. Please try and leverage use of EDDTableFromMultidimNcFiles. It is a bit messy, but I did manage to get the active acoustic echograms into ERDDAP combined with the typical glider environmental data (temperature, salinity, ...). I am currently also trying to walk over the NGDAC netCDF-2.0 solution to OG-1.0 using the same dataset. See: https://acoustics.fish.washington.edu/erddap/files/unit_507_20240512T0000/
It is still a work in progress, but a companion dataset will appear that will be the OG cross walked version. Grab me via email or join the conversation on UG2 Slack #data.
The pattern I am attempting to utilize should work for both trajectory and profile files.
I can make example datasets and XML configuration files available as well. Just let me know.
@jcermauwedu It's hard to see what you mean here from the linked ERDDAP files - they are just the usual trajectory files, are they not? What files does EDDTableFromMultidimNcFiles produce?
@jcermauwedu @jklymak The ERDDAP access is at https://acoustics.fish.washington.edu/erddap/tabledap/index.html?page=1&itemsPerPage=1000. I have slowly been working out the same approach; it is how ERDDAP handles some of the discrete sampling geometry (DSG) datasets. A point of note in the installation instructions:
"When you look at the dataset's metadata in ERDDAP™, the DSG dataset appears to be in ERDDAP's internal format (a giant, database-like table). It isn't in one of the DSG formats (e.g., the dimensions and metadata aren't right), but the information needed to treat the dataset as a DSG dataset is in the metadata (for example, cdm_data_type=TimeSeries and cdm_timeseries_variables=aCsvListOfStationRelatedVarables in the global metadata and cf_role=timeseries_id for some variable). If a user requests a subset of the dataset in a .ncCF (an .nc file in DSG's Contiguous Ragged Array file format) or .ncCFMA file (a .nc file in DSG's Multidimensional Array file format), that file will be a valid CF DSG file. WARNING: However, if the dataset was set up incorrectly (so that the promises made by the metadata aren't true), then the response file will be technically valid but will be incorrect in some way."
There are several other gotchas in using this. Most importantly, your definition of the CDM data type may not be ERDDAP's; read the docs starting at https://erddap.github.io/setupDatasetsXml.html#cdm_data_type. Each type has certain other required metadata that tell ERDDAP which data plays a given role.
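As a rough illustration of the "required metadata" point, here is a minimal sketch of a pre-flight check you could run on a dataset's attributes before loading it into ERDDAP. The attribute names (`cdm_trajectory_variables`, `cdm_profile_variables`, `cf_role`, and so on) are the ones I understand the ERDDAP docs to ask for. The function `missing_dsg_metadata` and the sample attribute values are hypothetical, and this is not how ERDDAP itself validates a dataset.

```python
# Hypothetical checker: attribute names follow my reading of the ERDDAP
# cdm_data_type docs; the function and the sample values are illustrative only.
REQUIRED_GLOBALS = {
    "TimeSeries": ["cdm_timeseries_variables"],
    "Trajectory": ["cdm_trajectory_variables"],
    "Profile": ["cdm_profile_variables"],
    "TrajectoryProfile": ["cdm_trajectory_variables", "cdm_profile_variables"],
}
REQUIRED_CF_ROLES = {
    "TimeSeries": ["timeseries_id"],
    "Trajectory": ["trajectory_id"],
    "Profile": ["profile_id"],
    "TrajectoryProfile": ["trajectory_id", "profile_id"],
}


def missing_dsg_metadata(global_attrs, variable_attrs):
    """Return a list of human-readable problems for the declared cdm_data_type."""
    cdm_type = global_attrs.get("cdm_data_type")
    problems = []
    # Each DSG type needs a global attribute listing its "outer" variables.
    for name in REQUIRED_GLOBALS.get(cdm_type, []):
        if not global_attrs.get(name):
            problems.append(f"missing global attribute {name}")
    # Each DSG type also needs some variable carrying the matching cf_role.
    roles = {attrs.get("cf_role") for attrs in variable_attrs.values()}
    for role in REQUIRED_CF_ROLES.get(cdm_type, []):
        if role not in roles:
            problems.append(f"no variable with cf_role={role}")
    return problems


# Made-up example: a TrajectoryProfile dataset that forgot the profile side.
example_globals = {
    "cdm_data_type": "TrajectoryProfile",
    "cdm_trajectory_variables": "trajectory",
}
example_variables = {
    "trajectory": {"cf_role": "trajectory_id"},
    "profile_id": {},  # cf_role not set
}
example_problems = missing_dsg_metadata(example_globals, example_variables)
```

For the made-up example, `example_problems` reports the missing `cdm_profile_variables` attribute and the missing `cf_role=profile_id` variable.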
Thanks for your comments. I will read more into the cdm_data_types. All this is in an attempt to get active acoustic data into the NGDAC and then make it available via the ERDDAP service.
Looking at a reference file (Rutgers) deployment: https://gliders.ioos.us/erddap/info/ru32-20200111T1444-delayed/index.html. The cdm_data_type is TrajectoryProfile. So, we have stuck to this type for now. The NGDAC expects a series of profiles. Once the deployment is finished, I believe it can also take a series of profiles in a single trajectory. Some of the delayed deployments still upload a series of individual profiles.
I am still in the middle of creating a version of the NGDAC netCDF-2.0 specification that fully passes the IOOS Compliance Checker, and an OG-1.0 format version of the same dataset. These can be referenced now at: https://acoustics.fish.washington.edu/erddap/tabledap/index.html?page=1&itemsPerPage=1000
The graph for those datasets now defaults to the same echogram. I only have a single profile walked over from the v2 to v2_OG (OG-1.0). The netCDF file is mostly compliant except for some time specifications that I do not necessarily agree with; I have opened an issue about them on the OG GitHub.
Once I get these settled and fully walked over, I need to send samples to Leila@NGDAC (leila.baghdad-brahim@tetratech.com) for review.
In a nutshell, the format specifies the time and depth coordinate dimensions. The echogram has 20 bins per sample/ping for each time coordinate. So, our first attempt was to use time(time, bin) and depth(time, bin). But this creates a lot of wasted space, even with netCDF's handling of missing values and sparse data.
Our next attempt is to create an independent set of time and depth coordinate dimensions, adding the prefix echogram_ to the dimension names. This creates a completely independent set of axes and cleanly separates the typical environmental data (temperature and salinity) from the active acoustic data. It also allows efficient storage of both sets of information while still letting us maintain a single set of profiles or a single trajectory file.
v2:
double echogram_sv(echogram_time, echogram_bin) ;
echogram_sv:_FillValue = NaN ;
echogram_sv:units = "1" ;
echogram_sv:long_name = "Volume backscattering strength" ;
echogram_sv:colorBarMinimum = -80. ;
echogram_sv:colorBarMaximum = -30. ;
echogram_sv:colorBarPalette = "EK80" ;
echogram_sv:comment = "dimensionless units (dB re 1 m-1)" ;
echogram_sv:ioos_category = "Other" ;
echogram_sv:standard_name = "acoustic_volume_backscattering_strength_in_sea_water" ;
echogram_sv:platform = "platform" ;
echogram_sv:observation_type = "measured" ;
echogram_sv:coordinates = "echogram_time echogram_depth echogram_lon echogram_lat" ;
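To put rough numbers on the storage argument for the independent echogram_ dimensions, here is a back-of-the-envelope sketch. It assumes one particular reading of the "wasted space" problem: that on a shared time axis, environmental samples and acoustic pings fall at different times, so every environmental row carries 20 fill values in the echogram variable. That reading, the function `echogram_cells`, and the sample counts are all assumptions for illustration. Only the 20 bins per ping comes from the description above.

```python
def echogram_cells(n_env_samples, n_pings, n_bins=20):
    """Count stored echogram cells under the two layouts.

    Assumption: environmental samples and pings never share a timestamp,
    so a shared time axis has n_env_samples + n_pings rows.
    """
    # Shared axis: echogram(time, bin) is padded with fill values on
    # every row that belongs to an environmental sample.
    shared = (n_env_samples + n_pings) * n_bins
    # Independent axis: echogram_sv(echogram_time, echogram_bin) only
    # has rows for actual pings.
    independent = n_pings * n_bins
    return shared, independent


# Hypothetical deployment segment: 10,000 CTD samples and 500 pings.
shared, independent = echogram_cells(n_env_samples=10_000, n_pings=500)
fill_fraction = 1 - independent / shared  # share of cells that are only fill
```

With these made-up counts, roughly 95% of the cells in the shared layout would be fill values. Compression recovers much of that on disk, but the padded shape still has to be carried through every tool that reads the file.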
Unfortunately OG-1.0 also requires us to define separate coordinates beyond N_MEASUREMENTS.
v2_OG:
double ECHOGRAM_SV(ECHOGRAM_N_MEASUREMENTS, ECHOGRAM_N_BINS) ;
ECHOGRAM_SV:_FillValue = NaN ;
ECHOGRAM_SV:units = "1" ;
ECHOGRAM_SV:long_name = "Volume backscattering strength" ;
ECHOGRAM_SV:colorBarMinimum = -80. ;
ECHOGRAM_SV:colorBarMaximum = -30. ;
ECHOGRAM_SV:colorBarPalette = "EK80" ;
ECHOGRAM_SV:comment = "dimensionless units (dB re 1 m-1)" ;
ECHOGRAM_SV:ioos_category = "Other" ;
ECHOGRAM_SV:standard_name = "acoustic_volume_backscattering_strength_in_sea_water" ;
ECHOGRAM_SV:platform = "platform" ;
ECHOGRAM_SV:observation_type = "measured" ;
ECHOGRAM_SV:coordinates = "lat_uv lon_uv time_uv" ;
Still hammering on this, but I will be happy to share example datasets and XML configuration files for ERDDAP that enable these to work. It seems like I need to take a deep dive into the example data in ERDDAP in reference to the cdm_data_type.
What is important here is that 2-D glider datasets are accumulating. This includes ADCP data, also 2-D in nature, now being collected on glider platforms. These are not the fixed mooring platforms of GCOOS (https://erddap.gcoos.org/erddap/info/wmo_42385/index.html), which I was asked to look at for reference to help us form a data model for the active acoustic data.
This needs some investigation on if and how to implement the feature. Original message below:
Pinging @ChrisJohnNOAA is this capacity that @jklymak describes something ERDDAP can support/could support in future?
As I understand, the desired behavior is to have a single griddap dataset which serves 2-D gridded variables like temperature(profile_num, depth_bin) as well as 1-D variables like lat(profile_num), without broadcasting these 1-D variables to 2-D.
At the moment, from reading the docs @rmendels highlighted ("In EDDGrid datasets, all data variables MUST use (share) all of the axis variables."), it seems this is not currently supported. So you would have to create two different datasets on your ERDDAP server to achieve this.
Originally posted by @callumrollo in https://github.com/ERDDAP/erddap/discussions/177#discussioncomment-10180612
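The EDDGrid rule quoted above can be sketched as a simple check over variable dimensions. The variable names come from the example in this issue (temperature(profile_num, depth_bin) and lat(profile_num)). The helper `variables_not_sharing_all_axes` is hypothetical and only mirrors the documented rule; it is not ERDDAP code.

```python
def variables_not_sharing_all_axes(axis_names, data_vars):
    """Return the data variables whose dimensions differ from the full axis list.

    Mirrors the documented EDDGrid rule that every data variable must use
    all of the dataset's axis variables (hypothetical helper, not ERDDAP code).
    """
    axes = tuple(axis_names)
    return sorted(
        name for name, dims in data_vars.items() if tuple(dims) != axes
    )


# Dimensions as described in this issue.
glider_vars = {
    "temperature": ("profile_num", "depth_bin"),
    "lat": ("profile_num",),
}
offenders = variables_not_sharing_all_axes(
    ("profile_num", "depth_bin"), glider_vars
)
```

Here `offenders` contains only `lat`. Under the current rule it would have to be broadcast to (profile_num, depth_bin) or moved to a second dataset.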