OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0
95 stars 73 forks source link

Document and verify compliance with SONAR-netCDF4 convention #210

Closed emiliom closed 2 years ago

emiliom commented 3 years ago

Let's go through the current implementation of SONAR-netCDF4 convention. We can maintain a markdown table to manually document what is currently covered, what sections of the convention it corresponds to, confirmation that it's properly implemented, etc

SONAR-netCDF4 convention:

leewujung commented 3 years ago

Adding some specifics to this:

emiliom commented 3 years ago

Notes from call between @emiliom and @leewujung on 2021-1-20:

We've laid out the sequence of tasks as follows:

  1. 1:1 mappings of variables, attributes and groups. Includes things only found in one of them (?).
  2. Content of variables, attributes and groups
  3. Things that are intentionally different, or diverged unintentionally
emiliom commented 3 years ago

Define a reasonable end point / goals where we want to be by the time we submit the echopype paper.

Also refer to the "data format" section in the docs: https://doc-test-echopype.readthedocs.io/en/latest/data-format.html

leewujung commented 3 years ago

For AZFP:

In the above cases the derived data are computed from raw sensor readings (in Vendor/Ancillary) and a few sensor parameters.

Questions to consider:

leewujung commented 3 years ago

For EK60:

In both cases these values need to be further converted to physically meaningful quantities.

Things to consider:

emiliom commented 3 years ago

@leewujung can you point to netcdf files generated by echopype that you'd consider "exemplary" with respect to the extent and quality of metadata included? Ideally one for EK60 and one for AZFP, but having just the EK60 one would be fine for now. Don't send me two for a given instrument type! In this case, two is not better than one. If the file is on the repo, just include the link here; if on Google Drive, the directory hierarchy and file name. Thanks.

leewujung commented 3 years ago

Here you go, in Google Drive:

emiliom commented 3 years ago

Perfect, thanks!

emiliom commented 3 years ago

For echopype/convention_check/DY1801_EK60-D20180211-T164025.nc:

emiliom commented 3 years ago

The Beam group in DY1801_EK60-D20180211-T164025.nc has 30 data variables, but in 17082117.nc it only has 18 and several of those don't seem to be from the convention. On the other hand, 17082117.nc adds many global attributes (17) beyond the 2 defined in the convention and found in DY1801_EK60-D20180211-T164025.nc; except, the mandatory beam_mode attribute is empty.

Is 17082117.nc from OOI? As with the other nc file, Platform global attribute entries are absent (more specifically, the attributes are present but empty, whereas they're absent altogether in the EK60 file).

leewujung commented 3 years ago

For echopype/convention_check/DY1801_EK60-D20180211-T164025.nc:

  • Sonar/sonar_model attribute entry is "ER60". I assume that's a mistake, and it should be "EK60"?

That is unexpected! ER60 is the software that runs EK60, see here. From the spec sheet @ngkavin made the ER60 is in Sonar/sonar_software_version. Though that's also only partially correct, because it should be with the version number also.

TODO: Let's mark this version number to be something to check.

  • There is no indication I can find about the origin of the data, other than the raw file path in the Provenance group. Specifically, is this from OOI, a ship (Hake survey?), or something else? Platform group global attributes are not present

Based on convention v1.0 those should be in the Platform group global attributes but they are also optional. These attributes are what users can set when they convert the data. That particular method is working but is also one of what is left to be double checked for completing the convert overhaul. It is surprising that the attributes are absent altogether here compared to AZFP.

TODO: add global attributes to Platform group in EK60 TODO: the steps to set the platform group should be included in the docs.

leewujung commented 3 years ago

The Beam group in DY1801_EK60-D20180211-T164025.nc has 30 data variables, but in 17082117.nc it only has 18 and several of those don't seem to be from the convention. On the other hand, 17082117.nc adds many global attributes (17) beyond the 2 defined in the convention and found in DY1801_EK60-D20180211-T164025.nc; except, the mandatory beam_mode attribute is empty.

Yes. I made some decisions on which variables to add to the Beam group when trying to force the AZFP format into SONAR-netCDF4. The AZFP Beam group lacks some of those in the EK60 data because it is a single beam instrument, and there are also simply variables that do not have correspondence in the AZFP file, and vice versa. In my view these are the things we can push for to improve the convention for wider usage.

TODO: add beam_mode to AZFP Beam group.

Is 17082117.nc from OOI? As with the other nc file, Platform global attribute entries are absent (more specifically, the attributes are present but empty, whereas they're absent altogether in the EK60 file).

Yes this file is from OOI. See the previous comment re. setting these values.

leewujung commented 3 years ago

@emiliom : thanks for going through the convention and doing the checks. I think we can compile a list of things to change and questions to be discussed (either for ourselves or with the ICES group) from these back and forth we're having here, and take the FAST meeting opportunity to link up with the ICES group, and ask @ngkavin to help change the code.

emiliom commented 3 years ago

Thanks for the follow ups! I'll digest them hopefully on Monday.

I've compiled the CDL-like tables of attributes, dimensions, coordinate variables, variables, types and subgroups, per group, into a spreadsheet that I've loaded into Google Drive. This is directly from the tables in section 2.10 ("Groups"). Since copying and pasting from the pdf into a true tabular format is very fragile, I went to the first GitHub version (first commit) of the SONAR-netCDF4 document in asciidoc format and pulled out the specific files corresponding to each of the Groups (e.g., tableSonar.adoc). That first version from May 2019 is hopefully version 1.0, or very close (I'll double check). I then transformed the files a bit to be able to paste them into an Excel spreadsheet.

These spreadsheets are now in a form that I can compare directly to Kavin's instrument-specific tables, to do the actual 1:1 compliance assessments for each variable, attribute, etc.

emiliom commented 3 years ago

For echopype/convention_check/DY1801_EK60-D20180211-T164025.nc:

  • Sonar/sonar_model attribute entry is "ER60". I assume that's a mistake, and it should be "EK60"?

That is unexpected! ER60 is the software that runs EK60, see here.

Currently EK60 Sonar/sonar_model is set in convert/set_groups_ek60.py#L87 as self.parser_obj.config_datagram['sounder_name'], so that looks fine, unless the datagram sounder_name is parsed incorrectly.

TODO: Verify that the datagram parsing that populates parser_obj.config_datagram['sounder_name'] is correct. If it currently is leading to a value of "ER60", that's incorrect.

From the spec sheet @ngkavin made the ER60 is in Sonar/sonar_software_version. Though that's also only partially correct, because it should be with the version number also.

TODO: Let's mark this version number to be something to check.

Actually, per the sample nc file, sonar_software_version is being assigned correctly. It's the spec sheet that seems wrong.

TODO: Populate Sonar/sonar_software_name.

Right now it's left blank; see convert/set_groups_ek60.py#L89

Should we create a new issue for these EK60 Sonar global attributes TODOs?

emiliom commented 3 years ago

Based on convention v1.0 those should be in the Platform group global attributes but they are also optional.

True, they're optional. But what we're evaluating here is software compliance against all elements of the convention, including recommended and optional ones. Ideally, the echopype-generated "exemplary" nc files should have all elements populated, including optional ones, to enable this compliance assessment.

These attributes are what users can set when they convert the data. That particular method is working but is also one of what is left to be double checked for completing the convert overhaul. It is surprising that the attributes are absent altogether here compared to AZFP.

TODO: add global attributes to Platform group in EK60

Actually, since those global attributes are defined by the conventional as optional, it's unclear if they should be absent when not populated, vs present but populated with a blank value. I'll have to look more closely at the convention doc. Regardless, the behavior needs to be identical across echosounder instruments.

TODO: the steps to set the platform group should be included in the docs.

Done! You've had that all along, and I just re-organized and tweaked things a bit.

emiliom commented 3 years ago

Regarding this:

TODO: add beam_mode to AZFP Beam group.

Looks like we also need to add beam_type

emiliom commented 3 years ago

From @lsetiawan in #283:

Recently, I came across this: https://github.com/ices-publications/SONAR-netCDF4/blob/master/docs/convention_definition.ncml, which is a data convention definition for SONAR-netCDF4. It would be great for echopype to be able to read this and check against it and get the proper attributes/variables in creating a standardized file.

Great! And sigh :weary: . This weekend I spent a good chunk of time creating an analog of this ncml file by transforming the convention's asciidoc tables in the same directory. See my comment above, https://github.com/OSOceanAcoustics/echopype/issues/210#issuecomment-822196932. I didn't see this ncml file, and that would probably have saved me effort. My goal is to use it for echopype compliance assessment, though initially it's a manual exercise. Anyway, great to have it.

@emiliom Any suggestion on tools to read this ncml file into code?

Brief answer: I did a quick check, and it looks like nothing exists right now (the more relevant packages are old, deprecated ones linked to IOOS). Longer answer: It looks like it's been on the xarray radar (see https://github.com/pydata/xarray/issues/2697), but nothing exists AFAIK. Still, I don't think we're in a position to use such a feature, and my approach and recommendation is that compliance checking be done externally on files generated by echopype. The need for compliance checking tools is a broader SONAR-netCDF4 topic, and there has been some discussion about it at the SONAR-netCDF4 repo. There is also the reality that echopype does not aspire to adhere exactly to the SONAR-netCDF4 v1.0 spec, so we'd have to use a different, echopype-tuned ncml.

I guess you do belong in the sonar-conventions Slack channel after all! :stuck_out_tongue_winking_eye:

lsetiawan commented 3 years ago

Thanks for all of those explanation and your work trying to create an analog of the ncml file! You're such a convention expert! :smile:

My first attempt to handling these conventions is creating a yaml file that somewhat describe the convention in https://raw.githubusercontent.com/OSOceanAcoustics/echopype/e445cda7cec8ea703ec6bcd6205186ee9a463c82/echopype/echodata/convention/1.0.yml. I am currently using that file to dynamically create the attributes in EchoData.

My thoughts on this is possibly using pydantic in the future to describe the standardized EchoData attributes that somewhat adhere to the spec. This would actually help with the validating with minimal effort. But I'm still trying to wrap my head around how to integrate this.

I guess you do belong in the sonar-conventions Slack channel after all!

Haha I am aspiring to be as good with conventions/standards as much as you! :smiley: After all you were the one who trained me in the beginning :stuck_out_tongue:

emiliom commented 3 years ago

From PR #294, regarding calibration and environment parameters used in calibrate.compute_Sv/S:

Right now we are not following any specific convention for the calibrated dataset, so the cal and env parameters are encoded in the form used in echopype.

@emiliom: tagging you here as this is relevant to the convention/processing level discussions going forward.

lsetiawan commented 3 years ago

Following up on my pydantic ideas. I have started echopydantic, where I think we can really define the convention as python data models. https://github.com/lsetiawan/echopydantic

You can see it in action on this development notebook: https://github.com/lsetiawan/echopydantic/blob/main/notebooks/Development.ipynb

Still have to think about the implementation of this, regarding of how this mixes with xarray objects, etc.

emiliom commented 3 years ago

@lsetiawan it's great you're doing that R&D! I don't have time this week to look at your stuff, though. I think your work is somewhat different from the focus of this issue (yeah, I know: saying this issue has a "focus" is a bit laughable). This issue is about assessing current convention compliance and identifying individual desirable/required fixes. Your work with pydantic instead is about developing an overarching implementation approach that has the potential to minimize (but not eliminate!) compliance failures from the get go. If you agree, could you open a new issue and copy and paste into it the relavant comments in this issue?

emiliom commented 3 years ago

@leewujung in the Beam subgroup, the convention defines the mandatory beam_direction_x|y|z variables (x|y|z) as having ping_time and beam dimensions, where beam is effectively frequency. But in the echopype implementation for EK60, the variables do not have ping_time as a dimension. Why is that?

The 3 variables are completely absent in the AZFP exemplar file, but there are so many implementation issues with AZFP's that I'll just set it aside for now.

leewujung commented 3 years ago

@leewujung in the Beam subgroup, the convention defines the mandatory beam_direction_x|y|z variables (x|y|z) as having ping_time and beam dimensions, where beam is effectively frequency. But in the echopype implementation for EK60, the variables do not have ping_time as a dimension. Why is that?

In EK60 these beam direction parameters are parsed from the configuration datagram, and therefore does not change within each file (i.e., the config is identical within each file), hence there is no ping_time dimension. Now that you brought this up, I think we can add a ping_time dimension with a single timestamp from the configuration datagram (this is the same approach as what we did for the EK80 environment datagram -- but I know you don't want to look at EK80 just yet!).

The 3 variables are completely absent in the AZFP exemplar file, but there are so many implementation issues with AZFP's that I'll just set it aside for now.

AZFP data file does not contain these parameters in itself, so I guess in a sense these parameters are more like "metadata" for AZFP data.

emiliom commented 3 years ago

[EK60] Now that you brought this up, I think we can add a ping_time dimension with a single timestamp from the configuration datagram (this is the same approach as what we did for the EK80 environment datagram -- but I know you don't want to look at EK80 just yet!).

(flagging as a TODO) So what you mean is adding a new ping_time dimension and coordinate variable separate from the existing ping_time, right? While that's not compliant strictly speaking, I think it's way better, and 100% adherence to the convention would mean a huge and unnecessary bloat of repeated values! Sounds like another area where the convention is sub-optimal, at least for echosounders.

AZFP data file does not contain these parameters in itself, so I guess in a sense these parameters are more like "metadata" for AZFP data.

Got it. Are they in the XML file? Even if they're not in the data or XML files, I assume this is information that would be important to have and is not otherwise derivable from the existing (meta)data in the echopype-generated AZFP file, right?

leewujung commented 3 years ago

AZFP data file does not contain these parameters in itself, so I guess in a sense these parameters are more like "metadata" for AZFP data.

Got it. Are they in the XML file?

I don't think they are in the XML files.

Even if they're not in the data or XML files, I assume this is information that would be important to have and is not otherwise derivable from the existing (meta)data in the echopype-generated AZFP file, right?

Yes. These are related to the tilt parameters that we discussed last time. Perhaps we should rethink if the tilt params should be in the Platform. I think it depends on what we call as the platform. In the typical situation when an AZFP is moored, with a cage that is vertically inline with the mooring but the transducers are actually mounted with a 15 deg angle from vertical, i guess the tilt params have better correspondence to beam_direction_x|y|z here, though it requires a transformation with strict definition of the origin and axis orientation of the "platform" (the mooring cage) to get to the beam_direction_x|y|z values -- and by doing the transformation, I think then these values are derived values already...

leewujung commented 2 years ago

Since we've been using this to document changes needed wrt the convention, I just found another one:

In the Top-level group, for some reason we have an attribute survey_name that is not present in the convention. I don't remember why I added this in the beginning, but I don't really see it as necessary if the title and summary attributes are properly populated.

emiliom commented 2 years ago

Yes! That was on my list of things to ask about.

gavinmacaulay commented 2 years ago

We considered things like survey_name for SONAR-netCDF4, but in the end stayed with the CF conventions for a more generic set of metadata and also the fact that research metadata like survey name, etc, is covered by the AcMeta convention (which can be inserted into or associated with SONAR-netCDF4 files if desired).

emiliom commented 2 years ago

Thanks @gavinmacaulay !

leewujung commented 2 years ago

In going through #409 I found some inconsistency in how echopype stores info in the Sonar group due to the added flexibility of EK80 -- my memory has faded but it was probably that I made arbitrary decision to store things this way. We should definitely include this in our convention check. 🙂

NOTE: I have yet to tackle the WBT+GPT combination in one of @gavinmacaulay 's ES80 test files es80/WBT-GPT-D20181216-T111107.raw

EK60/ES70

Parsing test_data/ek60/DY1801_EK60-D20180211-T164025.raw I got: Screen Shot 2021-10-18 at 9 35 25 AM

Parsing echopype/test_data/es70/D20151202-T020259.raw I got: Screen Shot 2021-10-18 at 9 34 57 AM

EK80/ES80

Parsing echopype/test_data/ek80/D20190822-T161221.raw I got: Screen Shot 2021-10-18 at 9 31 10 AM

Parsing echopype/test_data/es80/WBT-D20210620-T012250.raw I got: Screen Shot 2021-10-18 at 9 28 49 AM

emiliom commented 2 years ago

This reminds me that my initial decision to set aside EK80 and focus on AZFP and EK60 has inadvertently persisted for way too long! Plus, now I/we need to add AD2CP (and others?) to the mix.

leewujung commented 2 years ago

Ha... I think adding in EK80 in the check would be valuable because EK60 has on its way to be obsolete, but we could probably still set aside AD2CP for the moment (though selfishly I'd really want to have that in!).

emiliom commented 2 years ago

I think we should close this sprawling issue. Much of its content has been captured in https://github.com/uw-echospace/standardization/blob/main/echopype_sonarnetcdf4_compliance.md. We can move other content from this issue to that document, or to new documents and issues on https://github.com/uw-echospace/standardization, as needed. echopype_sonarnetcdf4_compliance.md will retain a link to this issue to help us dig out additional details still present here.

We'll open specific, narrow issues as needed.

leewujung commented 2 years ago

Sounds great!

emiliom commented 2 years ago

Great. I'll close the issue now, and we'll just keep mining it as needed. There are lots of important details here that we'll refer back to.