OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0
95 stars 73 forks source link

Correct dimensions in EK60 `beam_group1` variables #679

Closed b-reyes closed 2 years ago

b-reyes commented 2 years ago

In PR #668 we removed the variables beamwidth_receive_alongship/athwartship and beamwidth_transmit_alongship/athwartship, and replaced them with the variables beamwidth_twoway_alongship/athwartship for the EK60 sensor. In PR #638 we added the dimensions beam and ping_time to these removed variables. In its current state, the dev branch variables beamwidth_twoway_alongship/athwartship only have dimension channel.

@leewujung am I correct in assuming that we need to add the dimensions beam and ping_time to these variables?

leewujung commented 2 years ago

I think we decided that since these are not defined in the convention, we will keep them in a minimum form, i.e., only has dimension channel. Also, just for completeness, these parameters are due to the physical transducer in this case, so won't change over ping. These are also counting all the sectors of a split-beam transducer to be 1 beam.

b-reyes commented 2 years ago

I think we decided that since these are not defined in the convention, we will keep them in a minimum form, i.e., only has dimension channel. Also, just for completeness, these parameters are due to the physical transducer in this case, so won't change over ping. These are also counting all the sectors of a split-beam transducer to be 1 beam.

Although I agree that we have freedom here because they are not defined in the convention, I was trying to bring consistency with the current form of EK80. EK80 also has these variables and in previous issues we decided to add beam and ping_time to them (perhaps to mirror beamwidth_receive/transmit_alongship/athwartship). So, I think we should either add these dimensions for EK60 or remove them in EK80 for the variables beamwidth_twoway_alongship/athwartship.

leewujung commented 2 years ago

In my PR to address this I made both EK60 and EK80 the same by having variables beamwidth_twoway_alongship/athwartship. I did not pay attention to the dimensions at that time, but since the transducers they use are the same in general once power/angle data are computed the split-beam, what we have in both should be consistent.

EK80 also has these variables and in previous issues we decided to add beam and ping_time to them (perhaps to mirror beamwidth_receive/transmit_alongship/athwartship.

I actually think we decided to keep *_alongship/*_athwartship to not have extra dimensions, because of the excessive NaN padding that would have to happen to add ping_time. Maybe @emiliom could chime in here to decide as the convention police. :P

b-reyes commented 2 years ago

I actually think we decided to keep _alongship/_athwartship to not have extra dimensions, because of the excessive NaN padding that would have to happen to add ping_time. Maybe @emiliom could chime in here to decide as the convention police. :P

@leewujung could you be more specific here? In PR #638 we added extra dimensions to several variables that had alongship/athwartship. Maybe you are referring to the decision that we should not include beamwidth_receive/transmit_alongship/athwartship in EK80 because this would result in excessive NaN padding for variables that would be unused.

emiliom commented 2 years ago

I've gone through a chain of issues and PR's. In summary: I think we should add beam and ping_time to the EK60 beamwidth_twoway_alongship/athwartship, as @b-reyes suggests. This will be done very easily, and @b-reyes has implemented it in his draft PR #680.

A few observations:

520 is where we arrived at final decisions regarding the addition of beam and ping_time dimensions. @b-reyes nicely laid out the consensus, as it congealed, for EK60 first and then for EK80. While for EK60 the consensus at the time referred to the beamwidth_receive/transmit_alongship/athwartship variables, the change to beamwidth_twoway_alongship/athwartship in PR #638 was a simple renaming to improved the accuracy of the names.

leewujung commented 2 years ago

Great, thanks @emiliom for your investigations and conclusion!

Throwing #668 in the pile of refs for the naming details/value used re: one-way (convention) vs two-way (echopype) beamwidth and naming details between beamwidth_receive/transmit_major/minor (convention) vs beamwidth_receive/transmit_alongship/athwartship (echopype).

b-reyes commented 2 years ago

@emiliom thank you for looking into this! I will go ahead and open up PR #680. From my understanding, it appears that we all are voting for the addition of beam and ping_time to the EK60 beamwidth_twoway_alongship/athwartship variables. @leewujung can you confirm this?

leewujung commented 2 years ago

@b-reyes : Yes, let's do this for v0.6.0. Because of the repeated variables, for .nc files we are likely to have a large increase of data volume in disk, and for .zarr we are probably ok with the compression. Could you please do a comparison and report in PR #680? Granted there are other variables that these are added, but let's see what are the impact for this two for our own education.

If the .zarr files are indeed much smaller, it seems would be a good idea for us to explicitly recommend using .zarr as the converted data format.

I am a little worried about the in-memory form of this expansion. Something we definitely will have to figure out as part of dealing with the memory problem in data conversion and combination.

emiliom commented 2 years ago

I like @leewujung 's suggestion to do that sort of memory comparison. One easy, clean comparison might be to use the same raw file(s) to generate a converted file using 0.5.6 vs 0.6.0dev.

As for file size with .nc vs zarr, don't both of them support the same types of compression? Assuming I'm not off, I think the key differences vis a vis memory have to do with zarr's chunked I/O, not file size per se.

I am a little worried about the in-memory form of this expansion. Something we definitely will have to figure out as part of dealing with the memory problem in data conversion and combination.

Absolutely. One of our next priorities, IMHO.

b-reyes commented 2 years ago

@leewujung and @emiliom I have added a comment to PR #680 that does a memory comparison.

For the record, I will echo @leewujung's worry of memory expansion. This issue and other issues that we have implemented (to adhere to the convention) will likely cause memory issues down the road. After v0.6.0 I think we need to seriously consider if our strict compliance to the convention is in the best interest of echopype's future. I think the convention is a great guideline, but because it encompasses several different sonar sensors, it can make the storage of data from sensors (such as EK60) less than ideal.

leewujung commented 2 years ago

After v0.6.0 I think we need to seriously consider if our strict compliance to the convention is in the best interest of echopype's future. I think the convention is a great guideline, but because it encompasses several different sonar sensors, it can make the storage of data from sensors (such as EK60) less than ideal.

I fully agree with @b-reyes on this -- I was somewhat questioning this choice we made for v0.6.0 upgrade for compliance with the convention. I think optimizing compression settings would be a good way for storage of converted data, but on the point of computation, I wonder if there are ways to implement some efficient "check and condense" type of operation so that we squeeze redundant values out every time duplicated values are encountered along a particular dimension, the most likely the ping_time dimension. In theory, distributed computing would partially offset this impact of duplicated values, but it could also be a waste of memory even if it is reasonable fast. And we always need to make computation possible on local personal machines.

leewujung commented 2 years ago

With #680 merged it appears that we can close this.