OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0
94 stars 73 forks source link

Rename frequency_start/end to transmit_frequency_start/stop #1081

Closed emiliom closed 1 year ago

emiliom commented 1 year ago

Beam_groupX frequency_start and frequency_end are only used by EK80. Rename them to the equivalent v1 convention names, transmit_frequency_start and transmit_frequency_stop

emiliom commented 1 year ago

I've examined the current usage of frequency_start and frequency_end and the definitions of the corresponding transmit_frequency_start and transmit_frequency_stop in the convention. I have some observations and questions.

In the convention (both v1 and v2), the two variables are defined as Mandatory. In echopype, they are only used (present) with EK80, and within EK80 only for BB data and apparently assigned valid values only for a subset of channels. For example, in one EK80 file I've been examining (ek80/ek80_bb_with_calibration/2018115-D20181213-T094600.raw), the frequency_start/end variables are assigned valid values only in 2 of the 6 channels, while the other 4 are assigned nan.

My interpretation is that these variables are simply unnecessary (redundant) for data other than BB data (though I don't understand the case I described above). In other words, for EK60, AZFP and EK80 power data, one could set transmit_frequency_start = transmit_frequency_stop = frequency_nominal.

If that's correct, I recommend always creating these variables. That would be more consistent with the convention, since the variables are mandatory (their description is: "Frequency at the start/end of the transmit pulse."). If we do this, there'll be an additional change needed. Currently the presence of frequency_start and/or frequency_end in Beam_group1 is used as a test and decision point for further processing steps; in some cases, the test is for nan values instead. For example: https://github.com/OSOceanAcoustics/echopype/blob/ed79cd82332f177e063d1b1341bb9d3f1bb5f734/echopype/calibrate/calibrate_ek.py#L312-L315

If the variables are set to always be present, these checks would need to be updated. If we assign nan values in cases where the variables are not applicable or used, it should be fairly straightforward to modified these checks. Assigning valid values in all cases (frequency_nominal instead of nan) seems more consistent, but the checks would then have to be done either on a different variable or based on transmit_frequency_start != transmit_frequency_stop.

One last observation: the convention defines these variables as float type, but echopype sets them as int type. Both are defined in Hz units.

leewujung commented 1 year ago

Mixed BB/CW setup

in one EK80 file I've been examining (ek80/ek80_bb_with_calibration/2018115-D20181213-T094600.raw), the frequency_start/end variables are assigned valid values only in 2 of the 6 channels, while the other 4 are assigned nan.

Yes, in this case 2 of the channels were set up to transmit BB signals and the other 4 set up to transmit CW signals.

Variables and checking for BB transmission

for EK60, AZFP and EK80 power data, one could set transmit_frequency_start = transmit_frequency_stop = frequency_nominal.

If that's correct, I recommend always creating these variables. That would be more consistent with the convention, since the variables are mandatory (their description is: "Frequency at the start/end of the transmit pulse."). If we do this, there'll be an additional change needed. Currently the presence of frequency_start and/or frequency_end in Beam_group1 is used as a test and decision point for further processing steps; in some cases, the test is for nan values instead.

Ah this reminds me that this in fact is related to #951, which is related to data format. Under pulse_form I had:

  • pulse_form: right now in the code we use the presence of frequency_start/end to check which ping/channel is transmitting narrowband or broadband signals; probably more convenient to use pulse_form, but that could cause backward compatibility issues.

We can go two ways, check for transmit_frequency_start == transmit_frequency_stop for the CW (True) and BB (False) cases. Or use pulse_form to do the same thing. My preference is to use the latter, since this will be a breaking change of data format and it would be good to get things in good order.

Data type

One last observation: the convention defines these variables as float type, but echopype sets them as int type. Both are defined in Hz units.

Oh interesting, I think maybe we could cast them to be float type. I remember Kavin was resolving something on AZFP related to this that ended up being just the type cast issue (and he had to set things to float for the computation to go through).

emiliom commented 1 year ago

Thanks!! What's the data type or content for pulse_form? Assuming it holds a set of known strings or simple int codes, it does sound like that'd be the clearest way to go.

It sounds like you're in agreement with creating transmit_frequency_start and transmit_frequency_stop in all cases and always populating them with valid, non-nan values (eg, frequency_nominal)?

leewujung commented 1 year ago

Thanks!! What's the data type or content for pulse_form? Assuming it holds a set of known strings or simple int codes, it does sound like that'd be the clearest way to go.

I think it is 1 and 0 -- it'll be good to address this along with #951.

It sounds like you're in agreement with creating transmit_frequency_start and transmit_frequency_stop in all cases and always populating them with valid, non-nan values (eg, frequency_nominal)?

Yes!

emiliom commented 1 year ago

I forgot to add one observation & question: It looks like in EK80 frequency_start/end are ping-invariant, from the examples I've seen and this example parameter datagram: https://github.com/OSOceanAcoustics/echopype/blob/ed79cd82332f177e063d1b1341bb9d3f1bb5f734/echopype/convert/parse_base.py#L197-L202 But the variables are created with a ping_time dimension -- which matches the convention (both v1 & v2).

Should we remove the ping_time dimension? It looks like we should.

leewujung commented 1 year ago

No we should NOT remove the ping_time dimension. This is something can change over time for EK80 broadband data.

emiliom commented 1 year ago

Ok, thanks. In that case, we should probably add ping_time to the variables in EK60 and AZFP, for consistency, right?

leewujung commented 1 year ago

Ugh... I don't know.

For AZFP I kept the ping_time dimension for transmit_duration_nominal and sample_interval mostly for the sake of consistency, and also for data from different deployment these may be different (these are configuration).

For transmit_frequency_start and transmit_frequency_stop, I feel they are a little like the property of the echosounder and are not configuration, so adding the ping_time dimension seems confusing.

emiliom commented 1 year ago

So, these 4 variables are all ping-invariant. transmit_duration_nominal and sample_interval just have an extra possibility of variation compared to transmit_frequency_start and transmit_frequency_stop because they can change with deployment.

Also, unlike transmit_duration_nominal and sample_interval, transmit_frequency_start and transmit_frequency_stop won't have any additional information not already present in frequency_nominal. They are there truly just for consistency.

It seems like the same could be said for CW channels in EK80. In those cases:

leewujung commented 1 year ago

Yeah, EK80 CW mode is identical with EK60 in this sense, since it just transmits narrowband signals.

transmit_frequency_start and transmit_frequency_stop are properties (or one could say limit) of the echosounder itself when it comes to AZFP and EK60. They were not designed to be variable.

emiliom commented 1 year ago

What do you recommend for EK80, then, for a Beam_group where all channels are CW?

  1. Omit the ping_time dimension. In that case, CW channels in one group may have a ping_time dim if there are also BB channels, while in another group they won't.
  2. Include the ping_time dimension and replicate values. In that case, CW channels will be consistent across groups, at the cost of potential confusion to users (plus data bloat).

Also, it sounds like you're recommend the same thing for EK60 as for AZFP (no ping_time), right?

leewujung commented 1 year ago

I actually think the least confusing would be to not have transmit_frequency_start and transmit_frequency_stop in the first place for AZFP, EK60, and EK80 CW data, and only have those present for EK80 BB data.

If we want to strive for having the variables because they are mandatory in the convention, then I would vote for option 1 to omit the ping_time dimension for AZFP, EK60, and EK80 CW data.

This is pretty messy, but I am not sure if I have better idea that do not make it more confusing.

emiliom commented 1 year ago

To follow up from our conversation yesterday: we will include transmit_frequency_start and transmit_frequency_stop for all instruments and populate them with values (= frequency_nominal for EK80 CW, EK60 and AZFP). The ping_time dimension will be omitted for EK60, AZFP and EK80 where all channels are CW.

leewujung commented 1 year ago

Yes, and just so that the logic is recorded here (and we should add that in the docs/ms too) is that the modification we are doing is to always have the mandatory (M) variables but the variable will only have the minimum dimensions required for each instrument/setting type (ie, the variable will always exist, but the dimensions may be different from the convention definition).

@emiliom I think you have a different/more precise way to say the second half, could you edit the above sentence?

emiliom commented 1 year ago

Thanks. I'll edit the sentence later. I'll close this issue once I've edited your text.

emiliom commented 1 year ago

Yes, and just so that the logic is recorded here (and we should add that in the docs/ms too) is that the modification we are doing is to always have the mandatory (M) variables but the variable will only have the minimum dimensions required for each instrument/setting type (ie, the variable will always exist, but the dimensions may be different from the convention definition).

@emiliom I think you have a different/more precise way to say the second half, could you edit the above sentence?

In echopype, our goal will be to always include mandatory (M) variables. However, for some instrument types, the dimensionality specified in the convention does not apply, or data may not be available at all. Our approach will be to reduce the dimensionality of variables to the minimum that is supported by the instrument type, thus reducing data duplication and bloat that can have a significant impact on data size and computational performance. As an example, the Sonar/Beam_groupX beam_direction_x/y/z variables are mandatory and originally defined in the convention as having dimensions of (ping_time, beam). But for the primary instruments currently supported by echopype (AZFP, EK60 and EK80), these variables don't vary along those dimensions. We will only assign them a channel dimension, which is the dimension we introduced to support our gridded Sonar/Beam_groupX data structure.