Closed leewujung closed 2 years ago
Thoughts on top of my head:
For 1 -- channel
may work well as the name of the dimension/coordinate.
For 2 -- We can add a coordinate nominal_frequency
along the same dimension to help with the data selection/slicing need.
For 1: channel is the term most often used for this. For 2: the most generic label for this dimension would be a unique channel identification string or channel id number. I imagine you could retain a nominal frequency 'coordinate', which then wouldn't need to be unique across channels.
For 2 -- We can add a coordinate
nominal_frequency
along the same dimension to help with the data selection/slicing need.
Would that be a coordinate, or a regular "data" variable with channel
dimension, but not a coordinate?
For 2 -- We can add a coordinate nominal_frequency along the same dimension to help with the data selection/slicing need.
Would that be a coordinate, or a regular "data" variable with channel dimension, but not a coordinate?
Oh that's true. If it's a coordinate we'll run into the same non-monotonic/duplicate problem again. So, making it a data variable seems would work?
There are also multiple frequency data variables so along with this let's revisit them based on the conversation with @gavinmacaulay.
For 2: the most generic label for this dimension would be a unique channel identification string or channel id number.
Right now we have a channel_id data variable in converted EK80 files to select the right set of filter coefficients. I don't remember on top of my head whether this data variable also exist in converted EK60 files, but probably not; and definitely not in AZFP or AD2CP files.
It seems a good idea for us to consider adding this as part of the standard set of data variables for all sonar models.
Here's the consensus I see emerging about this, plus my own thoughts:
frequency
: channel
channel
:
long_name
: "Vendor channel ID"nominal_frequency
frequency_nominal
: nominal
is used only as a suffix in the v1 conventionfrequency
: already used as a dimension in the Environment
groupBeam.frequency
coordinate and as described in the Environment.frequency
coordinate in the conventionThis does mean that slicing on frequency would not be as convenient as it is now in echopype. But it sounds like this is unavoidable, given that the same frequency can occur in >1 channel.
@leewujung have you had a chance to think about this?
@emiliom : thanks for pinging, this one somehow slipped through my inbox.
I'll do point-by-point response so that we can arrive at a conclusion here.
- New dimension name to replace
frequency
:channel
Agree
- Corresponding coordinate:
channel
:
- Type: string, for maximum flexibility
- attributes:
long_name
: "Vendor channel ID"
Agree
- Associated data variable:
nominal_frequency
- Alternative names, for greater consistency with v1 convention:
frequency_nominal
:nominal
is used only as a suffix in the v1 conventionfrequency
: already used as a dimension in theEnvironment
group
I vote for frequency_nominal
for consistency with v1 convention.
Also, your mention of Environment
makes me realize that this change will also impact other groups that share the same dimension. In that way actually channel
is better, but for completeness it would be good to add frequency_nominal
as well -- I am thinking about the case for absorption, people may be confused by there are different values but the presence of different frequencies makes it self-explanatory.
- Type: float
- attributes: same as currently used by echopype in the
Beam.frequency
coordinate and as described in theEnvironment.frequency
coordinate in the convention
Agree
In terms of what should be in channel
:
channel_id
variable, which contains strings like "GPT 18 kHz 009072034d45 1-1 ES18-11"
, "WBT 545603-15 120-7C"
."55075"
) and the channel frequency.
I haven't seen any files with duplicated frequencies, but in theory that is also possible. Since there's no unique identifier I can find in the file (I could check with manufacturer), we could probably do "55075-125-1"
, "55075-125-2"
, ..., where 125
is the frequency in kHz (that's the unit they use in the associated XML file), and 1
, 2
, etc is the sequence number in "Frequencies" in the XML file (it is sequential, so nice that way).Let's clarify what name we're proposing for the channel dimension and coordinate. I had suggested channel
, and @leewujung I think you had agreed. In your subsequent comment, you brought up the channel_id
variable as the source of the values for the channel
coordinate for EK60 and EK80.
But we're still in agreement that the name of the dimension and coordinate will be channel
, right? I'm double checking because of what you stated here vis a vis channel_id
.
Yeah, I think i was confused in that comment since channel_id
is specific for EK80. Let's use channel
for this to be more general.
Great
Tasks (updated)
In the Beam_groupX
groups:
channel
dimension and coordinate variable
frequency
dimension and coordinate variable to channel
long_name
: "Vendor channel ID"channel_id
variable, which contains unique strings like "GPT 18 kHz 009072034d45 1-1 ES18-11"
, "WBT 545603-15 120-7C"
.55075
is the serial number, 125
is the frequency in kHz (that's the unit they use in the associated XML file), and 1
, 2
, etc is the sequence number in "Frequencies" in the XML file (it is sequential, so nice that way).float64
data-type variable, frequency_nominal(channel)
, to contain the frequency value. It can be populated the same way the current frequency
coordinate variable is populated. Its attributes should be the same as in the current frequency
coordinate variable.@leewujung recommended a phased approach, where I think the first step is just renaming frequency
to frequency_nominal
. @leewujung can you flesh this out?
A side note: frequency
is also used in other groups. I assume we don't need to change any of those? I haven't looked closely.
@emiliom thank you for putting this together. I am starting to work on this.
A side note: frequency is also used in other groups. I assume we don't need to change any of those? I haven't looked closely.
Here is where frequency
occurs in each sensor (not including the beam group)
absorption_indicative
, sound_speed_indicative
pitch
, roll
, vertical_offset
, water_level
, transducer_offset_x/y/z
, MRU_offset_x/y/z
, MRU_rotation_x/y/z
, position_offset_x/y/z
sa_correction
, gain_correction
, pulse_length
sa_correction
, gain_correction
, pulse_length
water_level
, transducer_offset_x/y/z
, MRU_offset_x/y/z
, MRU_rotation_x/y/z
, position_offset_x/y/z
XML_transmit_duration_nominal
, XML_gain_correction
, XML_digitization_rate
, XML_lockout_index
, digitization_rate
, lockout_index
, number_of_bins_per_channel
, number_of_samples_per_average_bin
, board_number
, data_type
frequency
as a dimension: DS
, EL
, TVR
, VTX
, Sv_offset
, number_of_samples_digitized_per_pings
, number_of_digitized_samples_averaged_per_pings
frequency
as a dimension in any group@emiliom for your comment above could you just link to my https://github.com/OSOceanAcoustics/echopype/issues/566#issuecomment-1067564872 to avoid confusion? (because I think what you listed was not accurate) -- thanks.
channel_id
is present and unique, so let's use that for values in this new coordinate variable channel
55075
is the serial number, 125
is the frequency in kHz (that's the unit they use in the associated XML file), and 1
, 2
, etc is the sequence number in "Frequencies" in the XML file (it is sequential, so nice that way).@leewujung recommended a phased approach, where I think the first step is just renaming
frequency
tofrequency_nominal
. @leewujung can you flesh this out?
Below is what I was thinking, see what you guys think:
channel
coordinate as we intend as the final form and rename the current frequency
to frequency_nominal
(many tests will break)frequency_nominal
as a coordinate variable (in 1. it is a data variable)swap_dims({"channel": "frequency_nominal"})
so that all variables are aligned to frequency_nominal
--> this will allow the .sel
and .isel
slicing operationsfrequency
by frequency_nominal
in all occurrences of the old slicing .sel(frequency=X)
or .isel(frequency=X)
(tests in theory should pass -- this is to assure us that there has been no mechanistic change of the code, just a change of variable name)swap_dims
in 4. and replace the slicing to be using channel
(tests should continue to pass)assign_coord
in 2 so that the datasets are in their final form (tests should continue to pass)5-6. involve some reversion, so this may not be faster. I personally feel this may be more error proof though, but @b-reyes see what you think because you'll be the person implementing these changes. @emiliom please do chime in on what you think!
@emiliom for your comment above could you just link to my https://github.com/OSOceanAcoustics/echopype/issues/566#issuecomment-1067564872 to avoid confusion? (because I think what you listed was not accurate) -- thanks.
I was trying to have all the instructions in one place. Thanks for catching my mistake. I got confused by your comment "Yeah, I think i was confused in that comment since channel_id is specific for EK80.". Anyway, I'll update my summary and also add a link to the source comment.
We don't need to change anything in the Vendor
group, since that falls outside the scope of the convention.
To get a better sense of things, I am trying to understand the slicing operation. I believe there is some mention of it above, but I wanted to have it said explicitly here. Currently, we do .sel(frequency=X)
or .isel(frequency=X)
. Once we change the name frequency
--> channel
and create the data variable frequency_nominal(channel)
from frequency
, what will this operation look like? Is it just .sel(channel=X)
and .isel(channel=X)
?
@b-reyes : I think you can look into the code to get that sense, and propose other solutions if needed; there may also be places where this slicing can be eliminated. :)
While renaming the current frequency
dimension in the Beam groups to channel
, it was found that this creates conflicts within the calibration methods (see _get_gain_for_complex). To rectify these conflicts, I suggest that we also rename frequency
in the Vendor
group to channel
. If this change is made, I also suggest that we add frequency_nominal
to the Vendor
group so that one can reference the frequencies.
@leewujung @emiliom
I also suggest that we add frequency_nominal to the Vendor group so that one can reference the frequencies.
I think this is a good idea.
Similar to Vendor, the Environment group variable ~sound_indicative
~ sound_speed_indicative
is connected to the beam variables because it is used to calculate range_meter. For this reason, I believe we need to also change it's dimension frequency
to channel
and add the variable frequency_nominal
.
Unfortunately, absorption_indicative
(also in the Environment group) is in the convention and it is required that it have a dimension of frequency
. How do we handle this clashing of Echopype and the convention?
edited: changed sound_indicative
to sound_speed_indicative
@b-reyes : I am not sure what is sound_indicative
? If you are talking about sound speed, it is not frequency dependent in our context, but depending on where the data comes from it can be with different frequency channels in EK60 -- please check that. I keep on stressing where data comes from because that tells you why things are a certain way for EK60 vs EK80 difference, so please check those.
I don't believe we would have absorption_indicative
anywhere that does not have a dimension already align with frequency
. Absorption is frequency dependent and either comes from the data that way or will be computed that way, so there should not be any clash.
Also, just as an overarching thing, it probably helps to be explicit here that I think we should change everything that had frequency
as a dimension before to be aligned with channel
. Again most of those came from the instrument-generated files, so check the set_groups to make sure why different instruments may have difference. All these things are connected either from how the instrument stores the file or physics, so it does not make sense to me if we only change things in the Beam_groupX group and not others.
I am not sure what is sound_indicative ?
@leewujung yes, I meant sound_speed_indicative
, sorry about that, please see my edit above.
I think it is fine to change the dimension frequency
to channel
for sound_speed_indicative
as it does not have any dimensions in the convention. For EK60 sound_speed_indicative
currently has the dimensions (frequency, ping_time)
and is (ping_time)
for EK80.
I agree that it would make sense to change frequency
to channel
in all groups. However, I have specifically mentioned the Environment group and absorption_indicative
because in the convention this variable has dimension (frequency)
. Based on our past discussions, we had mentioned that we should not take away dimensions from the convention, but we can add them. Using this logic, if we change frequency
to channel
, then we would have to let absorption_indicative
have the dimensions (channel, frequency, ping_time)
for EK60. I know this probably doesn't make sense, but I wanted to bring it up. If we are going to intentionally disregard the convention, then I think the reasoning for that should be openly discussed. For the record, I don't like absorption_indicative
having the dimensions (channel, frequency)
.
Note that in the convention the frequency dimension specified for absorption_indicative
has no restrictions on it and is not required to have just the frequencies of the acoustic data channels in the file. This was to allow the storing of an absorption/frequency curve that could be interpolated against to derive absorptions for particular echosounder frequencies (or frequency bands for broadband data).
A related note is that the convention was not designed to store, without loss, data files from existing sonar systems. It was rather intended to specify the minimum required data and metadata to allow derivation of quantitative Sv and TS. A compromise with this is that in some cases this prevents/hinders the creation of a lossless replica of a sonar file.
Conclusions from meeting just now:
frequency_nominal
should be included in all groupsfrequency_nominal
can be different for different Beam_groupX
@b-reyes could you confirm that the changes that will happen within Beam groups are now fully clear? My impression is that there are no open questions there.
AD2CP does not currently use a frequency
dimension in any group other than Beam_group1
; no changes will be applied to groups other than Beam_group1
. For all other instruments:
Vendor
groupfrequency
to channel
wherever it occurs, and add frequency_nominal
Beam_group1
variables will be moved in AZFP: DS
, EL
, TVR
, VTX
, Sv_offset
, number_of_samples_digitized_per_pings
, number_of_digitized_samples_averaged_per_pings
.Environment
grouptime1
. See #656. frequency
to channel
wherever it occurs, and add frequency_nominal
absorption_indicative(channel, time1)
and sound_speed_indicative(time1)
. @leewujung please confirm the latter; I'm not sure if we agreed to retain a (channel, time1) dimensionality instead, as currently done in EK60 under different dimension namesPlatform
groupcomment
attribute for time3
.frequency
to channel
wherever it occurs, and add frequency_nominal
NOTE: Some of these changes should be done as individual PR's. For example, the move of AZFP variables to the Vendor group
- dimensionality of convention-defined variables:
absorption_indicative(channel, time1)
andsound_speed_indicative(time1)
. @leewujung please confirm the latter; I'm not sure if we agreed to retain a (channel, time1) dimensionality instead, as currently done in EK60 under different dimension names
I think we can safely "squeeze" the channel
dimension out for EK60. Perhaps we can put in a check to make sure all the values are the same when doing that, and spit out an error if the values across the frequency channels are not the same.
@b-reyes could you confirm that the changes that will happen within Beam groups are now fully clear? My impression is that there are no open questions there.
Yes, all changes for the beam groups are clear (please see the below clarification).
AD2CP does not currently use a
frequency
dimension in any group other thanBeam_group1
; no changes will be applied to groups other thanBeam_group1
. For all other instruments:
As previously mentioned, frequency
should not appear anywhere within AD2CP. Not even Beam_group1
has the frequency
dimension.
- dimensionality of convention-defined variables:
absorption_indicative(channel, time1)
andsound_speed_indicative(time1)
. @leewujung please confirm the latter; I'm not sure if we agreed to retain a (channel, time1) dimensionality instead, as currently done in EK60 under different dimension namesI think we can safely "squeeze" the
channel
dimension out for EK60. Perhaps we can put in a check to make sure all the values are the same when doing that, and spit out an error if the values across the frequency channels are not the same.
@emiliom as you noted, several of these items should be done in different PRs/issues within this issue, I suggest we not change the the number of dimensions of sound_speed_indicative
for EK60 in this issue. I think we should just change frequency
--> channel
for sound_speed_indicative
for EK60 giving sound_speed_indicative(channel, ping_time)
. Then deal with other changes once this issue has been dealt with. I think we are already off topic as the title of this issue is "Rename the current frequency
dimension in the Beam
groups".
I suggest we not change the the number of dimensions of
sound_speed_indicative
for EK60 in this issue. I think we should just changefrequency
-->channel
forsound_speed_indicative
for EK60 givingsound_speed_indicative(channel, ping_time)
. Then deal with other changes once this issue has been dealt with. I think we are already off topic as the title of this issue is "Rename the currentfrequency
dimension in theBeam
groups".
This issue does cover a lot of ground, beyond the original title! But I thought you wanted to see a clear list of the changes that should occur -- our targets. That's what I was aiming for. I believe we're already including changes to dimensions in order to bring greater alignment across instruments, whenever possible. For example, as you noted earlier, EK60 and EK80 currently differ: "For EK60 sound_speed_indicative
currently has the dimensions (frequency, ping_time)
and is (ping_time)
for EK80."
What sequence of changes/PR's is used to accomplish these targets is a different ball of wax. eg, https://github.com/OSOceanAcoustics/echopype/issues/566#issuecomment-1109034399
This issue does cover a lot of ground, beyond the original title! But I thought you wanted to see a clear list of the changes that should occur -- our targets. That's what I was aiming for.
@emiliom I found your list very helpful. Do you think it would be better to have this as its own issue and have each of those items as tasks? I was thinking we could do something similar to #520.
@emiliom I found your list very helpful.
Great
Do you think it would be better to have this as its own issue and have each of those items as tasks? I was thinking we could do something similar to https://github.com/OSOceanAcoustics/echopype/issues/520.
If by "this" you mean that list, I think I lean towards keeping it in this same issue. That is, feel free to edit my list in place to turn it into tasks. Rework it a bit if you think there's a way to organize it that lends itself better to tasks and PRs.
@leewujung I don't know why #663 was being tracked in this issue. #663 will be closed soon anyways, but I think we can close this issue.
I think it is because the frequency
to channel
will impact what's being selected to plot for the DVM. And yes we can close this now!
Currently we use
frequency
as a dimension in the echopype-generated dataset (in place of having data from different frequency channels saved into differentBeam_groupX
s defined in the SONAR-netCDF4 convention ver. 1.0).However,
frequency
can be a misnomer and is probably better names aschannel
, because for broadband echosounders like the EK80, there are a sweep of frequencies in one channel. Usingfrequency
with numerical datatype can also be problematic, such as exposed in #490 where there are more than one transducers at the same frequency but share the same multiplexed transceiver. The good thing is that we can easily slice/select data based on the numerical number that is the nominal frequency of that channel.So, there are 2 questions, which we have to decide now since this would cause breaking changes and would best be in v0.6.0:
Seeking comments from @emiliom @lsetiawan @b-reyes @gavinmacaulay 🙏
Updates: tracking issues/PRs