OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0

Include additional environment variables stored in EK80 files to converted dataset #540

Closed: leewujung closed this issue 2 years ago

leewujung commented 2 years ago

EK80 files contain a lot more environmental variables in addition to the 5 ("temperature", "depth", "acidity", "salinity", "sound_speed") that we currently save in the converted dataset. For example, the current test file D20170912-T234910.raw contains the following:

[Screenshot (2022-01-22): environment variables stored in D20170912-T234910.raw]

We should add those that are environmental in nature (such as "sound_velocity_profile" and "sound_velocity_source") as output variables in the Environment group, and put others, such as "water_level_draft" and "water_level_draft_is_manual", in the Platform group, and so on.

I am not sure why there is a "latitude" variable in this...

This is related to #531.

emiliom commented 2 years ago

Is that additional information currently not retained somewhere, even if in unparsed form? In other words, if you convert an EK80 raw file, does the resulting EchoData object contain that information anywhere?

gavinmacaulay commented 2 years ago

I am not sure why there is a "latitude" variable in this...

Latitude is in the environment variables because it is an input to the sound speed equations, and presumably Kongsberg is including all input parameters that they used when calculating the sound speed.
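To illustrate how latitude can enter a sound speed calculation, here is a minimal sketch of one published depth-to-pressure conversion (Saunders 1981), where latitude appears through the variation of gravity. This is an assumption for illustration only: it is not a claim about what Kongsberg implements, and `depth_to_pressure_dbar` is a hypothetical helper, not an echopype function.

```python
import math


def depth_to_pressure_dbar(depth_m: float, latitude_deg: float) -> float:
    """Convert depth (m) to pressure (dbar) with the Saunders (1981) approximation.

    Saunders writes depth as z = (1 - c1) * p - c2 * p**2, where c1 depends on
    latitude; this solves that quadratic for p and returns the physical root.
    """
    c1 = (5.92 + 5.25 * math.sin(math.radians(latitude_deg)) ** 2) * 1e-3
    c2 = 2.21e-6
    # Smaller root of c2 * p**2 - (1 - c1) * p + z = 0
    return ((1 - c1) - math.sqrt((1 - c1) ** 2 - 4 * c2 * depth_m)) / (2 * c2)


# Same depth, different latitudes: the pressure (and hence sound speed) shifts slightly
for lat in (0.0, 45.0, 90.0):
    print(lat, round(depth_to_pressure_dbar(1000.0, lat), 1))
```

Equations that take pressure rather than depth as input (such as the UNESCO formulation) need a step like this, which is one reason latitude would be logged alongside the other inputs.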

leewujung commented 2 years ago

Latitude is in the environment variables because it is an input to the sound speed equations, and presumably Kongsberg is including all input parameters that they used when calculating the sound speed.

Ah I see, somehow I didn't make that immediate connection. So Kongsberg is probably using a newer sound speed equation (one that takes latitude as an input) in EK80, since latitude is not included in EK60. We should include that in utils.uwa so that things match. Thanks @gavinmacaulay !

leewujung commented 2 years ago

Is that additional information currently not retained somewhere, even if in unparsed form? In other words, if you convert an EK80 raw file, does the resulting EchoData object contain that information anywhere?

The additional information is currently not retained anywhere.

Another piece of info from the SONAR-netCDF4 convention p.7 as we move to include these:

[Screenshot (2022-01-22): excerpt from the SONAR-netCDF4 convention, p. 7]

leewujung commented 2 years ago

@gavinmacaulay : More questions as I put together an issue for adding a new option for sound speed calculation (Leroy et al. 2008, JASA). Do you know whether Kongsberg actually uses the latest sound speed equation from Leroy et al. 2008, or whether they use the location info and depth to compute pressure and then apply the Mackenzie equation (or some other equation)?
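For reference when comparing options, here is a sketch of the nine-term Mackenzie (1981) equation mentioned above. It takes depth directly, so latitude is not an input. This is only an illustration of that published formula; it is not a claim about what EK80 or echopype's `utils.uwa` actually use, and the function name is hypothetical.

```python
def sound_speed_mackenzie(temperature_c: float, salinity: float, depth_m: float) -> float:
    """Sound speed in m/s from the Mackenzie (1981) nine-term equation.

    Nominal validity: temperature 2-30 degC, salinity 25-40, depth 0-8000 m.
    """
    t, s, d = temperature_c, salinity, depth_m
    return (
        1448.96
        + 4.591 * t
        - 5.304e-2 * t**2
        + 2.374e-4 * t**3
        + 1.340 * (s - 35)
        + 1.630e-2 * d
        + 1.675e-7 * d**2
        - 1.025e-2 * t * (s - 35)
        - 7.139e-13 * t * d**3
    )


print(sound_speed_mackenzie(10.0, 35.0, 0.0))
```

Because depth enters directly here, a latitude-dependent depth-to-pressure step would only matter for equations formulated in terms of pressure.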

gavinmacaulay commented 2 years ago

The EK80 manual says that it uses the 'UNESCO' algorithm and then gives these references for seawater:

leewujung commented 2 years ago

@imranmaj : along with your work on #592, could you also take this on to add the variables that are in the EK80 raw data but not currently stored in the Environment group? Thanks!

imranmaj commented 2 years ago

File D20170912-T234910.raw has the following sound_velocity_profile:

[Image: sound_velocity_profile values from D20170912-T234910.raw]

These could be (depth, sound speed) pairs.

leewujung commented 2 years ago

Yes this looks like the (depth, sound speed) pair.
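A minimal sketch of splitting such a flat sequence into the two arrays. It assumes the profile is stored as a single semicolon-delimited string alternating depth and sound speed; that layout is an assumption about the EK80 environment XML, and `parse_sound_velocity_profile` is a hypothetical helper.

```python
import numpy as np


def parse_sound_velocity_profile(raw: str):
    """Split an alternating 'depth;speed;depth;speed;...' string into two arrays.

    The semicolon delimiter is an assumption; adjust it if the raw field differs.
    """
    values = np.array([float(v) for v in raw.split(";") if v.strip()])
    if values.size % 2:
        raise ValueError("expected an even number of values (depth, speed pairs)")
    pairs = values.reshape(-1, 2)  # one row per (depth, speed) pair
    return pairs[:, 0], pairs[:, 1]


depth, speed = parse_sound_velocity_profile("1.0;1480.0;1000.0;1490.0")
```

Raising on an odd count catches a truncated or misread field early instead of silently misaligning depths and speeds.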

@emiliom : any pointers on if we name the first dimension as depth and the units (here it is most likely in meters) and other attributes?

emiliom commented 2 years ago

SONAR-netCDF4 v1 recommends the use of the "NCEI NetCDF 'profile' template, v2.0 or greater", for environmental profile data. The NCEI netcdf templates are here. See the profile and maybe the timeSeriesProfile feature types.

See the sample profile CDL file for more details.

any pointers on if we name the first dimension as depth

The variable name is up to the user, though conventional names should be used (z, depth, etc). What matters is the attributes -- standard_name, axis, etc.

and the units (here it is most likely in meters) and other attributes?

Units in meters would be perfect. The CF requirement is UDUNITS strings, with a few exceptions. For other attributes, see the CDL. But no need to adopt anywhere near the entire suite of global attributes listed!! The NCEI netcdf templates are a gold standard for data archiving. We should take baby steps here.
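To illustrate the point that attributes rather than names carry the meaning, here is a sketch of locating the vertical coordinate purely from CF attributes (`axis="Z"`, falling back to `standard_name="depth"`). The helper `find_vertical_coord` is hypothetical and the values are illustrative.

```python
import numpy as np
import xarray as xr


def find_vertical_coord(ds: xr.Dataset) -> str:
    """Return the name of the coordinate that CF attributes mark as vertical.

    Checks axis == "Z" first, then standard_name == "depth", so the coordinate
    itself can be called anything (z, depth, ...).
    """
    for name, coord in ds.coords.items():
        if coord.attrs.get("axis") == "Z":
            return name
    for name, coord in ds.coords.items():
        if coord.attrs.get("standard_name") == "depth":
            return name
    raise KeyError("no coordinate with axis='Z' or standard_name='depth'")


# Two datasets whose depth coordinates differ in name but share the same attributes
attrs = {"standard_name": "depth", "units": "m", "axis": "Z", "positive": "down"}
ds_a = xr.Dataset(coords={"z": ("z", np.array([1.0, 1000.0]), attrs)})
ds_b = xr.Dataset(coords={"depth": ("depth", np.array([1.0, 1000.0]), attrs)})
```

Both datasets resolve to their own vertical coordinate, which is what lets generic CF-aware tools work regardless of naming.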

leewujung commented 2 years ago

Thanks @emiliom !

The variable name is up to the user, though conventional names should be used (z, depth, etc). What matters is the attributes -- standard_name, axis, etc.

Good to know this rule of thumb! 😀

The NCEI netcdf templates are a gold standard for data archiving. We should take baby steps here.

Great -- that template is intimidating...

emiliom commented 2 years ago

Great -- that template is intimidating...

It's strict CF + ACDD + some NCEI additions

leewujung commented 2 years ago

@emiliom : I tried to make up something below, please chime in on any suggestions/corrections! I used the sound_speed_indicative attributes for the variable, and added some info in the comment.

imranmaj commented 2 years ago

The environment group already has a depth variable; what other name should be used? Does sound_velocity_profile_depth work?

leewujung commented 2 years ago

@imranmaj : I think sound_velocity_profile_depth works. @emiliom : any thoughts on the above other attributes?
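A minimal sketch of how `sound_velocity_profile_depth` could sit next to the existing per-ping `depth` variable without a name clash. The stand-in Environment dataset, timestamps, and values are illustrative assumptions; the attributes are standard CF ones (`standard_name`, `units`, `axis`, `positive`).

```python
import numpy as np
import xarray as xr

# Stand-in for the existing Environment group: `depth` is already a per-ping variable
ping_time = np.array(["2017-09-12T23:49:10", "2017-09-12T23:49:11"], dtype="datetime64[ns]")
env = xr.Dataset(
    {"depth": ("ping_time", np.array([5.0, 5.0]), {"units": "m"})},
    coords={"ping_time": ping_time},
)

# Put the profile on its own depth dimension so it does not collide with `depth`
env = env.assign_coords(
    sound_velocity_profile_depth=(
        "sound_velocity_profile_depth",
        np.array([1.0, 1000.0]),
        {"standard_name": "depth", "units": "m", "axis": "Z", "positive": "down"},
    )
)
env["sound_velocity_profile"] = (
    "sound_velocity_profile_depth",
    np.array([1480.0, 1490.0]),
    {"standard_name": "speed_of_sound_in_sea_water", "units": "m s-1"},
)
```

The two depth-like quantities stay distinguishable by name and dimension while sharing the same CF description.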

emiliom commented 2 years ago

Side note:

EK80 files contain a lot more environmental variables in addition to the 5 ("temperature", "depth", "acidity", "salinity", "sound_speed") that we currently save in the converted dataset.

None of those existing variables have any attributes assigned (e.g., units). That should be an easy addition. The task could be tracked via a new, separate issue.
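A sketch of what adding attributes to the existing variables could look like. The attribute values below are suggestions (standard CF names and UDUNITS strings), not what echopype adopted; "salinity" and "acidity" are left out because their conventional units would need checking. The dataset is a stand-in with illustrative values.

```python
import numpy as np
import xarray as xr

# Stand-in for an existing Environment group whose variables lack attributes
env = xr.Dataset(
    {
        "temperature": ("ping_time", np.array([10.0, 10.1])),
        "sound_speed": ("ping_time", np.array([1490.0, 1490.2])),
    },
    coords={
        "ping_time": np.array(
            ["2017-09-12T23:49:10", "2017-09-12T23:49:11"], dtype="datetime64[ns]"
        )
    },
)

# Suggested attributes (assumed values for illustration)
ENV_ATTRS = {
    "temperature": {"standard_name": "sea_water_temperature", "units": "degree_Celsius"},
    "sound_speed": {"standard_name": "speed_of_sound_in_sea_water", "units": "m s-1"},
}

for name, attrs in ENV_ATTRS.items():
    if name in env:  # skip variables a given file does not contain
        env[name].attrs.update(attrs)
```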

emiliom commented 2 years ago

Regarding the attributes -- and structure -- of the new profile data (finally got to it! :disappointed:): the attributes and names look good. But what about the time dimension? Is each (depth, sound_speed) profile associated with a single time stamp? If so, is there a profile for each ping_time value? In other words, is sound_velocity_profile actually a function of both depth and ping_time? Are the depth bins identical across a raw file? If those conditions are met, we're dealing with a timeSeriesProfile CF feature type rather than a simple profile.

A couple of questions or issues come to mind too if we were to think of the Environment data from a strict CF perspective (@imranmaj : I'm thinking out loud here!). If you go to the trouble of following a CF feature type structure, typically you'd want to label it as such via dedicated global attributes, especially Conventions and featureType. But CF specifies that a data file can only contain data for one feature type, whereas this Environment group would hold both a profile (or timeSeriesProfile) variable and several timeSeries variables that are only a function of ping_time ("temperature", "depth", "acidity", "salinity", "sound_speed"). That would violate CF. Hmm.

leewujung commented 2 years ago

Is each (depth, sound_speed) profile associated with a single time stamp?

Currently in the raw data there is no time dimension in the sound speed profile. I don't know how this profile is generated in EK80, or whether it is input by user. If the latter, it is likely entered/imported when setting up the instrument. Even then, sound_velocity_profile could end up being a function of both depth and ping_time when multiple EK80 files with different sound velocity profiles are combined using combine_echodata. If we want to accommodate this case, maybe we can add a time coordinate and use the first ping_time as the timestamp of the profile.
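A sketch of the idea above: stamp each file's profile with that file's first ping_time, then concatenate along that time dimension. An outer join on depth fills NaN where files used different depth bins. The dimension name `profile_time`, the helper `profile_from_file`, and all values are hypothetical.

```python
import numpy as np
import xarray as xr


def profile_from_file(first_ping_time, depth, speed):
    """Wrap one file's profile, stamped with that file's first ping_time."""
    return xr.DataArray(
        np.asarray(speed, dtype=float)[np.newaxis, :],
        dims=("profile_time", "sound_velocity_profile_depth"),
        coords={
            "profile_time": [np.datetime64(first_ping_time, "ns")],
            "sound_velocity_profile_depth": np.asarray(depth, dtype=float),
        },
        name="sound_velocity_profile",
    )


a = profile_from_file("2017-09-12T23:49:10", [1.0, 1000.0], [1480.0, 1490.0])
b = profile_from_file("2017-09-13T00:10:00", [1.0, 500.0, 1000.0], [1481.0, 1485.0, 1491.0])

# Outer join keeps the union of depth bins; missing (time, depth) cells become NaN
combined = xr.concat([a, b], dim="profile_time", join="outer")
```

The result is effectively a timeSeriesProfile-shaped array; whether its time axis should be ping_time or a separate coordinate is exactly the open question in this thread.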

As I am typing I am just realizing that this timestamp issue could occur for other environmental variables ("temperature", "depth", "acidity", "salinity", "sound_speed") too. The difference here between EK60 and EK80 is that:

Ugh...

I am not sure what to comment for the CF violation part.

emiliom commented 2 years ago

Currently in the raw data there is no time dimension in the sound speed profile. I don't know how this profile is generated in EK80, or whether it is input by user.

Ah. Just so I'm 100% clear, there's just one profile per raw file?

gavinmacaulay commented 2 years ago

Currently in the raw data there is no time dimension in the sound speed profile.

There will be a time available from the .raw file datagram timestamp. If the profile data comes from a previously entered profile, the timestamp will be pretty much the time when that file was created. The EK80 (and EAxx) software accepts sound speed profiles sent automatically from other instruments, which can arrive at any time. I suggest allowing for a time coordinate on the profiles, noting that that time will not be a ping_time. There can be multiple profiles per raw file.

leewujung commented 2 years ago

Oh, that's true: there is a timestamp associated with that particular datagram, even if not with the profile itself. We can use that!

leewujung commented 2 years ago

EK60: environment variables are in the "sample datagram", so they naturally have a ping_time dimension.

EK80: environment variables are in a separate environment datagram, so they can have different timestamps from the acoustic data datagrams.
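To make the EK80 case concrete, here is a minimal sketch, with illustrative timestamps and values, of one way to look up the environment value in effect at each ping when environment datagrams are timestamped independently of the pings. Forward-filling (most recent value at or before each ping) is an assumed choice, not echopype's documented behavior.

```python
import numpy as np
import xarray as xr

# Environment datagram values on their own timestamps (illustrative)
env_time = np.array(["2017-09-12T23:49:00", "2017-09-12T23:50:00"], dtype="datetime64[ns]")
sound_speed = xr.DataArray([1490.0, 1492.0], dims="env_time", coords={"env_time": env_time})

# Ping times from the acoustic data datagrams (illustrative)
ping_time = np.array(
    ["2017-09-12T23:49:10", "2017-09-12T23:49:50", "2017-09-12T23:50:05"],
    dtype="datetime64[ns]",
)

# For each ping, take the most recent environment value at or before it
per_ping = xr.DataArray(
    sound_speed.reindex(env_time=ping_time, method="ffill").values,
    dims="ping_time",
    coords={"ping_time": ping_time},
)
```

Pings earlier than the first environment datagram would come out as NaN; keeping the environment data on its own time coordinate and deriving per-ping values only when needed avoids baking that choice into the stored dataset.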

imranmaj commented 2 years ago
emiliom commented 2 years ago

A small clarification: transducer_sound_speed and transducer_name are only found in EK80 (or at the very least, they're not found in EK60, nor are they in the convention). That "clarification" wasn't strictly necessary, because this issue is focused on EK80.

leewujung commented 2 years ago

Added in #616, closing this now 🚀