Closed b-reyes closed 2 years ago
I think we decided that since these are not defined in the convention, we will keep them in a minimum form, i.e., only has dimension channel
. Also, just for completeness, these parameters are due to the physical transducer in this case, so won't change over ping. These are also counting all the sectors of a split-beam transducer to be 1 beam.
I think we decided that since these are not defined in the convention, we will keep them in a minimum form, i.e., only has dimension
channel
. Also, just for completeness, these parameters are due to the physical transducer in this case, so won't change over ping. These are also counting all the sectors of a split-beam transducer to be 1 beam.
Although I agree that we have freedom here because they are not defined in the convention, I was trying to bring consistency with the current form of EK80. EK80 also has these variables and in previous issues we decided to add beam
and ping_time
to them (perhaps to mirror beamwidth_receive/transmit_alongship/athwartship
). So, I think we should either add these dimensions for EK60 or remove them in EK80 for the variables beamwidth_twoway_alongship/athwartship
.
In my PR to address this I made both EK60 and EK80 the same by having variables beamwidth_twoway_alongship/athwartship
. I did not pay attention to the dimensions at that time, but since the transducers they use are the same in general once power/angle data are computed the split-beam, what we have in both should be consistent.
EK80 also has these variables and in previous issues we decided to add
beam
andping_time
to them (perhaps to mirrorbeamwidth_receive/transmit_alongship/athwartship
.
I actually think we decided to keep *_alongship/*_athwartship
to not have extra dimensions, because of the excessive NaN padding that would have to happen to add ping_time
. Maybe @emiliom could chime in here to decide as the convention police. :P
I actually think we decided to keep _alongship/_athwartship to not have extra dimensions, because of the excessive NaN padding that would have to happen to add ping_time. Maybe @emiliom could chime in here to decide as the convention police. :P
@leewujung could you be more specific here? In PR #638 we added extra dimensions to several variables that had alongship/athwartship. Maybe you are referring to the decision that we should not include beamwidth_receive/transmit_alongship/athwartship
in EK80 because this would result in excessive NaN padding for variables that would be unused.
I've gone through a chain of issues and PR's. In summary: I think we should add beam
and ping_time
to the EK60 beamwidth_twoway_alongship/athwartship
, as @b-reyes suggests. This will be done very easily, and @b-reyes has implemented it in his draft PR #680.
A few observations:
beamwidth_receive/transmit_major/minor
with dimensions of beam
and ping_time
beam
and ping_time
beamwidth_receive/transmit_alongship/athwartship
were not following the convention strictly speaking, b/c they didn't have the same name. beam
and ping_time
apparently results in broadcasting (repetition) of values rather than padding with NaN's (though that's based on just a limited assessment on my part). The same would be true for EK60.beam
and ping_time
dimensions. @b-reyes nicely laid out the consensus, as it congealed, for EK60 first and then for EK80. While for EK60 the consensus at the time referred to the beamwidth_receive/transmit_alongship/athwartship
variables, the change to beamwidth_twoway_alongship/athwartship
in PR #638 was a simple renaming to improved the accuracy of the names.Great, thanks @emiliom for your investigations and conclusion!
Throwing #668 in the pile of refs for the naming details/value used re: one-way (convention) vs two-way (echopype) beamwidth and naming details between beamwidth_receive/transmit_major/minor
(convention) vs beamwidth_receive/transmit_alongship/athwartship
(echopype).
@emiliom thank you for looking into this! I will go ahead and open up PR #680. From my understanding, it appears that we all are voting for the addition of beam
and ping_time
to the EK60 beamwidth_twoway_alongship/athwartship
variables. @leewujung can you confirm this?
@b-reyes : Yes, let's do this for v0.6.0. Because of the repeated variables, for .nc files we are likely to have a large increase of data volume in disk, and for .zarr we are probably ok with the compression. Could you please do a comparison and report in PR #680? Granted there are other variables that these are added, but let's see what are the impact for this two for our own education.
If the .zarr files are indeed much smaller, it seems would be a good idea for us to explicitly recommend using .zarr as the converted data format.
I am a little worried about the in-memory form of this expansion. Something we definitely will have to figure out as part of dealing with the memory problem in data conversion and combination.
I like @leewujung 's suggestion to do that sort of memory comparison. One easy, clean comparison might be to use the same raw file(s) to generate a converted file using 0.5.6 vs 0.6.0dev.
As for file size with .nc vs zarr, don't both of them support the same types of compression? Assuming I'm not off, I think the key differences vis a vis memory have to do with zarr's chunked I/O, not file size per se.
I am a little worried about the in-memory form of this expansion. Something we definitely will have to figure out as part of dealing with the memory problem in data conversion and combination.
Absolutely. One of our next priorities, IMHO.
@leewujung and @emiliom I have added a comment to PR #680 that does a memory comparison.
For the record, I will echo @leewujung's worry of memory expansion. This issue and other issues that we have implemented (to adhere to the convention) will likely cause memory issues down the road. After v0.6.0 I think we need to seriously consider if our strict compliance to the convention is in the best interest of echopype's future. I think the convention is a great guideline, but because it encompasses several different sonar sensors, it can make the storage of data from sensors (such as EK60) less than ideal.
After v0.6.0 I think we need to seriously consider if our strict compliance to the convention is in the best interest of echopype's future. I think the convention is a great guideline, but because it encompasses several different sonar sensors, it can make the storage of data from sensors (such as EK60) less than ideal.
I fully agree with @b-reyes on this -- I was somewhat questioning this choice we made for v0.6.0 upgrade for compliance with the convention. I think optimizing compression settings would be a good way for storage of converted data, but on the point of computation, I wonder if there are ways to implement some efficient "check and condense" type of operation so that we squeeze redundant values out every time duplicated values are encountered along a particular dimension, the most likely the ping_time
dimension. In theory, distributed computing would partially offset this impact of duplicated values, but it could also be a waste of memory even if it is reasonable fast. And we always need to make computation possible on local personal machines.
With #680 merged it appears that we can close this.
In PR #668 we removed the variables
beamwidth_receive_alongship/athwartship
andbeamwidth_transmit_alongship/athwartship
, and replaced them with the variablesbeamwidth_twoway_alongship/athwartship
for the EK60 sensor. In PR #638 we added the dimensionsbeam
andping_time
to these removed variables. In its current state, the dev branch variablesbeamwidth_twoway_alongship/athwartship
only have dimensionchannel
.@leewujung am I correct in assuming that we need to add the dimensions
beam
andping_time
to these variables?