OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0
94 stars 73 forks source link

Add some variables not currently stored from ping-by-ping data #951

Closed leewujung closed 1 year ago

leewujung commented 1 year ago

Currently there are some ping data variables not stored in the converted EK files. These are not used at the moment but could be useful (see below). This issue is a reminder to add them after we have the calibration code is in good shape.

For EK80 each ping contains the following variables:

# data in parser.ping_data_dict
['type', 'low_date', 'high_date', 'channel_id', 'data_type', 'spare', 'offset', 'count', 'timestamp', 'bytes_read', 'complex_dtype', 'n_complex', 'channel_mode', 'pulse_form', 'frequency', 'pulse_duration', 'sample_interval', 'transmit_power', 'slope', 'frequency_start', 'frequency_end']

We want to add:

['data_type', 'offset', 'count', 'channel_mode', 'pulse_form']

Note:

Already stored or captured in other forms:

['type', 'low_date', 'high_date', 'channel_id', 'timestamp', 'frequency', 'pulse_duration', 'sample_interval', 'transmit_power', 'slope', 'frequency_start', 'frequency_end']

Those we do not plan to store because they are already captured in other forms include:

['type', 'low_date', 'high_date', 'timestamp', 'bytes_read', 'complex_dtype', 'n_complex']

Not sure:

['spare']

Tagging @emiliom here for decisions on what to add and where. Note most of the already stored variables are in the Sonar/Beam_group1 group even though those are not defined in the convention. We need to decide if to change the current implementation, or continue this approach to save those new variables in the Sonar/Beam_group1 group also.

leewujung commented 1 year ago

Related to #1043.

emiliom commented 1 year ago

We want to add: ['data_type', 'offset', 'count', 'channel_mode', 'pulse_form']

Tagging @emiliom here for decisions on what to add and where. Note most of the already stored variables are in the Sonar/Beam_group1 group even though those are not defined in the convention. We need to decide if to change the current implementation, or continue this approach to save those new variables in the Sonar/Beam_group1 group also.

It seems like the primary rationale for adding these variables to the EchoData object is comprehensiveness, to retain all original, unique information in the converted dataset. Sticking all the variables in the Vendor_specific group would address that goal. A secondary rationale is convenience: that is, if the variables are likely to be used somewhat frequently, having them in Beam_groupX would help. In general, I'd advocate for only including a non-convention variable in Beam_groupX if there are strong convenience reasons.

leewujung commented 1 year ago

You could find information from here: Simrad EK80 reference manual These are under the Parameter XML datagram and RAW3/RAW4 datagram, and they can vary ping by ping.

It seems that data_type (power, angle, complex) is something that we've already captured information for, so we don't have to store that. count seems the same way. offset on the other hand is the first sample number related to range calculation, so maybe we keep it, perhaps as range_sample_offset or something similar.

Sample datagrams here: https://github.com/OSOceanAcoustics/echopype/blob/eb92185e83577b3e30bdcaf00e3e43ec0ada547a/echopype/convert/parse_base.py#L163-L229

It seems like the primary rationale for adding these variables to the EchoData object is comprehensiveness, to retain all original, unique information in the converted dataset. Sticking all the variables in the Vendor_specific group would address that goal. A secondary rationale is convenience: that is, if the variables are likely to be used somewhat frequently, having them in Beam_groupX would help. In general, I'd advocate for only including a non-convention variable in Beam_groupX if there are strong convenience reasons.

The parameters used for regular processing are pulse_form, offset, but if do not store data_type and count, then there's only one more channel_mode, so I wonder if we should just keep all 3 in the Beam_groupX group.

emiliom commented 1 year ago

You could find information from here: Simrad EK80 reference manual

Thanks. As far as I can tell, that manual doesn't describe the datagram parameters. I found those descriptions, plus related useful information, in another manual: Simrad EK80 interface specifications

and they can vary ping by ping

Ah, ok.

The parameters used for regular processing are pulse_form, offset, but if do not store data_type and count, then there's only one more channel_mode, so I wonder if we should just keep all 3 in the Beam_groupX group.

It seems ok to store them in Beam_groupX.

Here's what I've gleaned from the manual and your comments:

parameter name new variable name long_name units comments
pulse_form pulse_form Pulse type - An int code: 0 = CW, 1 = FM, 5 = FMD
offset range_sample_offset First sample number (a count)
channel_mode channel_mode Current transceiver mode - An int code: 0 = Active, 1 = (not specified in manual; Inactive?)
leewujung commented 1 year ago

Oh I must have pasted the wrong link, but great that you were able to find the right one. Yeah it's the one that contains datagram format.

What you suggested looks great!

emiliom commented 1 year ago

BTW, slope doesn't currently have a long_name. I looked through the manual and couldn't find a description. Any suggestions, while I'm at it?

leewujung commented 1 year ago

I think it is in the manual under the Parameters XML diagram. It is the tapering applied to a square pulse. How about something like "A factor associated with the hann window used in generating the transmit signal " -- this is super specific, but that is where this parameter is used in.

emiliom commented 1 year ago

Thanks. How about this more compact version: "Hann window slope parameter for transmit signal"

leewujung commented 1 year ago

Hann window slope parameter for transmit signal"

Yes, this is 100% better than what I suggested!

emiliom commented 1 year ago

PR #1094 fully addressed this issue (I hope!). We can close it now, right?

leewujung commented 1 year ago

Yes, we can close this now -- but give me a chance to check a few more files for that offset, and once I double confirm that we can close it.

emiliom commented 1 year ago

I just realized that EK60 Beam_group1 has count and offset variables, too (and data_type). Assuming they have the same meaning as in EK80, I suggest that for EK60 we rename offset and drop count (it's captured elsewhere?).

leewujung commented 1 year ago

Oh I completely forgot about that. Good catch! Yeah I think renaming offset and drop count would be the way to go. Count is the number of bytes to read the data from I think, or something similar.

emiliom commented 1 year ago

I think this issue has been fully addressed. @leewujung can you confirm that we can close the issue?

leewujung commented 1 year ago

Yes, I think we can close this now!