OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0

Move Beam group to be a Sonar subgroup #519

Closed emiliom closed 2 years ago

emiliom commented 2 years ago

Move Beam group from the current top level to one (or more?) Sonar subgroups, to adhere to the convention. Look up our discussion of Beam groups in the echopype paper to remember whether we see a possible need to ever have more than one Beam group.

leewujung commented 2 years ago

Note: also consider #490 which is related to having >1 transducer/channel of the same frequency in the echosounder setup.

emiliom commented 2 years ago

SONAR-netCDF4 v1.0 and echopype implementation

Sonar beam sub-groups are discussed in the SONAR-netCDF4 v1.0 report in section 2.10.6, page 14:

Data from each beam mode (e.g. horizontal and vertical beam modes) are stored in subgroups under the /Sonar group (see Table 8). The form of the backscatter data can vary between different sonar systems. For example, some provide a complex-valued amplitude, while others provide a real- or integer-valued amplitude. Variable definitions for data from split-aperture systems are not currently specified.

Include as many subgroups as necessary for different beam groups. Use unique group names, preferably of the form Beam_groupX where X is an integer.

A critical, distinguishing global attribute for each Beam group is beam_mode:

Mode of the beam in this sub-group, taken from the defined vocabulary of: “vertical” (a set of beams that form a vertical slice through the water), “horizontal” (a set of beams that form a nominally horizontal plane through the water), and “inspection” (a set of beams with arbitrary pointing directions)

A Beam group may have more than one beam, discriminated through the beam dimension.
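As a rough illustration of the layout the report describes (this is not echopype code), the sketch below models the /Sonar subgroups as plain Python dictionaries. The helpers `make_beam_group` and `build_sonar` and the dictionary layout are hypothetical; only the `Beam_groupX` naming pattern, the `beam` dimension, and the `beam_mode` vocabulary come from the report.

```python
# beam_mode vocabulary as quoted from the SONAR-netCDF4 v1.0 report.
BEAM_MODES = {"vertical", "horizontal", "inspection"}


def make_beam_group(beam_mode, beams):
    """Hypothetical helper: represent one Beam_groupX as a plain dict."""
    if beam_mode not in BEAM_MODES:
        raise ValueError(f"unknown beam_mode: {beam_mode!r}")
    # A group may hold several beams, distinguished along the beam dimension.
    return {"attrs": {"beam_mode": beam_mode}, "dims": {"beam": list(beams)}}


def build_sonar(groups):
    """Name subgroups Beam_group1, Beam_group2, ... in the order given."""
    return {
        f"Beam_group{i}": make_beam_group(mode, beams)
        for i, (mode, beams) in enumerate(groups, start=1)
    }


sonar = build_sonar([("vertical", ["b1", "b2"]), ("horizontal", ["b1"])])
```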

In the echopype preprint, p. 6, we state the following as our interpretation of SONAR-netCDF4 v1.0 regarding Beam groups:

each frequency channel is stored in a separate netCDF4 group (Beam_group1, Beam_group2, ...).... Values from each frequency channel, stored in separate beam groups in the convention, are mapped along the new frequency dimension

We need to confirm or refine this understanding.
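To make the interpretation quoted from the preprint concrete, here is a minimal sketch of mapping per-channel beam groups onto a single frequency dimension. The function `stack_by_frequency`, the dictionary layout, and the choice to sort by frequency are assumptions for illustration, not how echopype does it.

```python
def stack_by_frequency(beam_groups):
    """Combine per-channel groups into one structure with a frequency axis.

    beam_groups maps a group name to a dict holding a "frequency" value (Hz)
    and "backscatter_r" as a list of per-ping rows.
    """
    # Sorting by frequency is an assumption made for this sketch.
    ordered = sorted(beam_groups.values(), key=lambda g: g["frequency"])
    return {
        "frequency": [g["frequency"] for g in ordered],
        # Nested lists indexed as [frequency][ping_time][range_bin].
        "backscatter_r": [g["backscatter_r"] for g in ordered],
    }


groups = {
    "Beam_group1": {"frequency": 120000.0, "backscatter_r": [[1.0, 2.0], [3.0, 4.0]]},
    "Beam_group2": {"frequency": 38000.0, "backscatter_r": [[5.0, 6.0], [7.0, 8.0]]},
}
stacked = stack_by_frequency(groups)
```

Note that two channels with the same frequency (the situation raised in #490) would make this frequency axis non-unique, which is one reason the mapping needs to be double-checked.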

emiliom commented 2 years ago

SONAR-netCDF4 Version 2 draft

The rendered (updated daily, I think) version of the SONAR-netCDF4 Version 2 draft includes some changes to the Beam group. They are partly summarized in 7.1. Significant changes from version 1 to version 2.

The new Sonar group hierarchy shown below is constructed from what's in 3.2. Hierarchical structure and in 3.10.6. Proposed ADCP additions:

Sonar – contains the main data models in the convention; groups under Sonar are used for storing the data output from the sonars, as well as processed sonar data and interpretation masks;

Here is the new description of the Beam group:

This group contains ping based sonar backscatter data and associated metadata and is described in Table 10. The netCDF4 group name is Beam_groupX, where X is an integer.

Data from each beam mode (e.g. horizontal and vertical beam modes) are stored in subgroups under the /Sonar group (see Table 11). The form of the backscatter data can vary between different sonar systems. For example, some provide a complex-valued amplitude, while others provide a real- or integer-valued amplitude. Variable definitions for data from split-aperture systems are not currently specified.

Subgroups under /Sonar each have a coordinate variable that contains ping timestamps. In some cases the coordinate variables in different subgroups contain the same data (such as when a sonar produces several types of beam data from each and every ping). To avoid duplication of timestamp data, a coordinate variable can be used across multiple subgroups. For organisational reasons, it is then recommended that such coordinate variables be located in the /Sonar group.
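The recommendation about shared timestamps can be sketched roughly as follows, using plain dictionaries rather than netCDF4 groups. The function name `hoist_shared_ping_time` and the dictionary layout are hypothetical, and the timestamps are made up.

```python
def hoist_shared_ping_time(sonar):
    """If every subgroup holds identical ping_time values, keep a single
    copy at the parent (/Sonar) level and drop the per-subgroup copies."""
    subgroups = sonar["groups"]
    times = [g["ping_time"] for g in subgroups.values()]
    if times and all(t == times[0] for t in times):
        sonar["ping_time"] = times[0]
        for g in subgroups.values():
            del g["ping_time"]
    return sonar


# Made-up timestamps, identical across both subgroups.
sonar = {
    "groups": {
        "Beam_group1": {"ping_time": ["2017-06-20T01:00:00", "2017-06-20T01:00:01"]},
        "Beam_group2": {"ping_time": ["2017-06-20T01:00:00", "2017-06-20T01:00:01"]},
    }
}
hoist_shared_ping_time(sonar)
```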

emiliom commented 2 years ago

In #545 I've implemented a draft version of the Beam group move to the Sonar/Beam subgroup.

But there's a larger question we'll need to address (see above, https://github.com/OSOceanAcoustics/echopype/issues/519#issuecomment-1023573467): whether our interpretation of SONAR-netCDF4 v1.0 Beam groups as mappable to the frequency coordinate in echopype is correct. We should also double-check how the changes proposed in v2.0 impact our use of the Beam group.

emiliom commented 2 years ago

Just realized that we should also address the Beam_power group. It's used only with EK80. But I don't know what it is or how it maps to the v1.0 convention. Will need to discuss.

leewujung commented 2 years ago

@emiliom : the Beam_power group was my ad-hoc (!) creation to accommodate EK80 datasets that contain both complex time series samples (from BB or CW mode) and power/angle samples (from CW mode). This is because the numbers of samples are drastically different (the complex samples are much larger, as they are sampled at a much higher frequency), and the complex samples also have an extra dimension (currently called quadrant, which can be viewed as just a separate transducer, or a "beam").

Thinking about the Beam_groupX setup, it seems that it would work to just have the power/angle samples stored as one of the beam groups, in parallel with the complex samples. We would need to figure out how to index these.

Transmit signal type: BB - broadband; CW - continuous wave (narrowband)

emiliom commented 2 years ago

the Beam_power group was my ad-hoc (!) creation to accommodate EK80 datasets that contain both complex time series samples (from BB or CW mode) and power/angle samples (from CW mode). This is because the numbers of samples are drastically different (the complex samples are much larger, as they are sampled at a much higher frequency), and the complex samples also have an extra dimension (currently called quadrant, which can be viewed as just a separate transducer, or a "beam").

The second part, the extra dimension (quadrant, which as we've discussed will be renamed to beam) is not an issue per se, since we'll be adding a beam dimension to the Beam group that hasn't had it b/c it was implicit. Right? And the issue of number of samples being drastically different bears resemblance to similar cases we've discussed; in that case the core challenge is that our regular gridded structure makes this sparse situation very inefficient, right? But is there ultimately a conceptual, clear reason, based on the SONAR-netCDF4 definition of a Beam group, why the power/angle samples and the complex samples should go in separate groups?

Thinking about the Beam_groupX setup, it seems that it would work to just have the power/angle samples stored as one of the beam groups, in parallel with the complex samples. We would need to figure out how to index these.

What do you mean by "how to index these"?

leewujung commented 2 years ago

is there ultimately a conceptual, clear reason, based on the SONAR-netCDF4 definition of a Beam group, why the power/angle samples and the complex samples should go in separate groups?

By "separate groups", do you mean to put data collected by the same frequency channel but under different settings (CW with power/angle samples, CW in complex samples, BB in complex samples) in the same group? I think that will cause memory problems unless we use a sparse representation to circumvent the NaN padding. Also, as of now, power samples live in backscatter_r only and complex samples live in both backscatter_r and backscatter_i. If we put all of them in the same group, interleaved only by ping_time, it would be hard to slice.
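A back-of-the-envelope sketch of the NaN-padding cost mentioned above: if power/angle pings share one regular grid with complex pings, each power ping is padded out to the complex sample count. The function `padded_waste`, the 4-byte value size, and all of the numbers are illustrative assumptions, not taken from any real EK80 file.

```python
def padded_waste(n_power_pings, n_samples_power, n_samples_complex, bytes_per_value=4):
    """Bytes spent on NaN fill when power pings are padded along the sample
    dimension to match the (longer) complex pings."""
    pad_per_ping = max(n_samples_complex - n_samples_power, 0)
    return n_power_pings * pad_per_ping * bytes_per_value


# Illustrative numbers only.
waste = padded_waste(n_power_pings=1000, n_samples_power=500, n_samples_complex=5000)
```

With these made-up numbers, 90% of every padded power ping would be fill values, and that is for a single variable in a single channel.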

Thinking about the Beam_groupX setup, it seems that it would work to just have the power/angle samples stored as one of the beam groups, in parallel with the complex samples. We would need to figure out how to index these.

What do you mean by "how to index these"?

I was referring to how to tell which Beam_groupX contains what type of data. For example, what determines the ordering 1, 2, 3, ... ? And if we use different Beam_groupX groups to store separately the complex and power/angle data that were collected by the same channel at different times with different settings, the 1, 2, 3, ... does not give us a simple way to figure out which group contains what. It seems counterproductive if we need to load a group before we can tell whether it is what we need.
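One possible way to address this, sketched very loosely: keep a small index at the Sonar level that records what each Beam_groupX holds, so the right group can be found without loading any of them. The index layout, the `sample_type` key, and `find_beam_groups` are all hypothetical.

```python
# Hypothetical index kept at the Sonar level: one entry per beam group.
beam_group_index = {
    "Beam_group1": {"sample_type": "complex", "descr": "complex samples (BB or CW)"},
    "Beam_group2": {"sample_type": "power/angle", "descr": "power/angle samples (CW)"},
}


def find_beam_groups(index, sample_type):
    """Return the names of the beam groups holding the given sample type."""
    return [name for name, meta in index.items() if meta["sample_type"] == sample_type]


power_groups = find_beam_groups(beam_group_index, "power/angle")
```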

emiliom commented 2 years ago

Just a note to emphasize that #567 is also central to this issue.

emiliom commented 2 years ago

In PR #574 I noted: "This is only a first, incremental step to fully supporting a flexible Sonar/Beam_groupX multi-beam-group approach. Some pre-existing hard wiring of beam group names should ultimately be generalized." Much of this hard-wiring involves the existing assumption that there can only be two Beam groups, where one of them, ed.beam, is the general-purpose singular group and the other, ed.beam_power, only applies to EK80 (as a second group) and has a specific meaning (for backscatter data, beam-power only).

I don't know if in practice there'll ever be a case with 3 beam groups, but we should build a flexible, generic mechanism that doesn't hard-wire the total number of possible groups and doesn't generically constrain the scope of each group. I think we should have a mechanism where the number of beam groups and scope of each group (basically, a description string) is specified and stored in sensor-specific code. It could probably be encoded in the set_groups_<sensor>.py modules and stored or readily available in the EchoData object.

An immediate challenge will be that this hard-wiring currently starts in the explicit listing of groups in 1.0.yml and its usage in EchoData.__setup_groups(). We'll need to refactor this pattern so that the number of groups is not universally fixed in a single place.
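A minimal sketch of what I have in mind, assuming each sensor declares its own beam groups instead of relying on one global list. `BEAM_GROUP_SPECS`, `beam_group_paths`, the placeholder sensor name, and the description strings are hypothetical; none of this is existing echopype code.

```python
# Hypothetical per-sensor declarations; in practice these might live in the
# sensor-specific set_groups modules. Descriptions are placeholders.
BEAM_GROUP_SPECS = {
    "EK80": ["complex samples", "power/angle samples"],
    "single_group_sensor": ["backscatter samples"],  # placeholder sensor name
}


def beam_group_paths(sonar_model):
    """Map Sonar/Beam_groupX paths to descriptions for one sensor.

    The number of groups comes from the sensor's own declaration, so nothing
    here fixes the total number of beam groups globally.
    """
    descriptions = BEAM_GROUP_SPECS[sonar_model]
    return {f"Sonar/Beam_group{i}": d for i, d in enumerate(descriptions, start=1)}


ek80_groups = beam_group_paths("EK80")
```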

leewujung commented 2 years ago

@emiliom : I don't fully understand some of your comments above, but on this part:

if in practice there'll ever be a case with 3 beam groups

I think it depends on what Simrad allows or not in the future. This came up in our conversation with @gavinmacaulay also, about whether changing the type of the samples being collected forced the system to stop the current file and start a new one.

We know based on the test files we have that it is possible to have 1 .raw file containing both the CW-power/angle data and BB-complex data. These go into the ed.beam_power and ed.beam currently, which is the hard-wiring we want to change. If we go to using the ed["Sonar/Beam_groupX"] pattern and deprecate the current ed.sonar access pattern then the number won't be limited.

About the access pattern: my understanding from our last group discussion was that we would use open_datatree to open the file, which then allows using something like ed["Sonar"] (with or without .ds) to access data. Though looking at this again now I don't remember what we said about how to handle this coexisting with ed.sonar? @lsetiawan @b-reyes @imranmaj help!

imranmaj commented 2 years ago

We can change ed.sonar to call ed["Sonar"] under the hood before we deprecate it
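Something along these lines, as a rough sketch. `EchoDataSketch` is a stand-in class for illustration and not the real EchoData; emitting a `DeprecationWarning` is an assumption about how the later deprecation step might be signaled.

```python
import warnings


class EchoDataSketch:
    """Stand-in for illustration only; not the real EchoData class."""

    def __init__(self, groups):
        self._groups = dict(groups)

    def __getitem__(self, path):
        # New access pattern: ed["Sonar"], ed["Sonar/Beam_groupX"], ...
        return self._groups[path]

    @property
    def sonar(self):
        # Old access pattern, kept working by delegating to the new one.
        warnings.warn(
            'ed.sonar is deprecated; use ed["Sonar"] instead',
            DeprecationWarning,
            stacklevel=2,
        )
        return self["Sonar"]


ed = EchoDataSketch({"Sonar": {"sonar_model": "EK80"}})
```

Dropping the `warnings.warn` call gives the quieter first step, where `ed.sonar` keeps working with no notice to users.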

leewujung commented 2 years ago

@imranmaj : so this would be a breaking change then?

Hmm, some memory is coming back to me now. Is it correct that we decided to do this in 2 steps?

  1. keep the current ed.sonar, ed.beam access pattern but under the hood generate files and use the correct hierarchy Sonar/Beam_groupX
  2. deprecate and switch to use ed["Sonar"], ed["Sonar/Beam_groupX"] everywhere

emiliom commented 2 years ago

Thanks @leewujung and @imranmaj! Sorry if my comments weren't very clear. I should clarify that those comments are not focused on the ed.<group> vs ed["<group>"] mappings. For that, I think the discussion should take place in #567. My comments, and PR #574, are focused on what @leewujung described as step 1 in her comment above (a minute ago). Of course, these things are all interrelated.

So, for my focus, the hard-wired beam group limit I'm referring to (and that I've partially addressed in #574) concerns especially this:

An immediate challenge will be that this hard-wiring currently starts in the explicit listing of groups in 1.0.yml and its usage in EchoData.__setup_groups(). We'll need to refactor this pattern so that the number of groups is not universally fixed in a single place.

It's all about how the xarray Sonar/Beam_groupX datasets are created (and also the new variables in the Sonar group that describe the Beam_groupX groups); also, later on, how the Sonar/Beam_groupX group description text in the EchoData repr is assigned.

leewujung commented 2 years ago

Thanks for the clarification @emiliom ! Sorry I was confused! I will now work on your #574 .

emiliom commented 2 years ago

I think we'll be able to close this after PR #611 is merged

emiliom commented 2 years ago

#611 has been merged, so we can close this issue. Woo-hoo!