Closed b-reyes closed 1 year ago
I am thinking that, in this release (v0.6.3), we can default to raising an error if the `EchoData` objects or files to be combined have a different number of channels, or if the number of channels is the same but the `channel_id` values do not match exactly.
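A minimal sketch of the check described above, not the actual implementation. The function name `check_channels_match` and the plain dict input (file name mapped to its list of channel ids) are hypothetical, and treating channel order as significant is an assumption about what "match exactly" means.

```python
from typing import Dict, List


def check_channels_match(channels_by_file: Dict[str, List[str]]) -> None:
    """Raise RuntimeError unless every file has exactly the same channel ids.

    Hypothetical helper: the first file is used as the reference, and
    order is treated as significant (an assumption).
    """
    items = list(channels_by_file.items())
    if not items:
        return
    ref_name, ref_channels = items[0]
    for name, channels in items[1:]:
        # Case 1: different number of channels
        if len(channels) != len(ref_channels):
            raise RuntimeError(
                f"{name} has {len(channels)} channels "
                f"but {ref_name} has {len(ref_channels)}"
            )
        # Case 2: same number of channels, but the ids differ
        if list(channels) != list(ref_channels):
            raise RuntimeError(
                f"channel ids in {name} do not match those in {ref_name}"
            )
```

Splitting the two cases keeps the error message specific about which condition failed.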
In a later release, I agree with your suggestion on letting people select which subset of channels they would like to combine, or allowing expansion. Using the example you have above, suppose the full set of channels from all files is `a, b, c, d`, where some files only have `a, b, c` and some files have `a, b, c, d`:
- subset to `a, b, c`, so the resulting dataset will have only channels `a, b, c`
- expand to `a, b, c, d`
Allowing these would be pretty nice, especially the subsetting part. We should caution against using the expansion approach, since that would risk creating a large dataset.
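To make the two outcomes concrete, here is a rough sketch that computes the channel set each approach would produce. The helper names `common_channels` and `all_channels` are hypothetical, and the sketch only works on lists of channel ids, not on the data itself.

```python
from typing import Iterable, List


def common_channels(channel_lists: Iterable[List[str]]) -> List[str]:
    """Channels present in every file (the subsetting outcome)."""
    lists = [list(c) for c in channel_lists]
    if not lists:
        return []
    shared = set(lists[0]).intersection(*lists[1:])
    # Keep the order of the first file for readability
    return [ch for ch in lists[0] if ch in shared]


def all_channels(channel_lists: Iterable[List[str]]) -> List[str]:
    """Channels present in any file (the expansion outcome)."""
    seen: List[str] = []
    for channels in channel_lists:
        for ch in channels:
            if ch not in seen:
                seen.append(ch)
    return seen


files = [["a", "b", "c"], ["a", "b", "c", "d"]]
print(common_channels(files))  # ['a', 'b', 'c']
print(all_channels(files))     # ['a', 'b', 'c', 'd']
```

With expansion, every file lacking `d` would still need a slot for that channel in the combined result, which is where the dataset size concern comes from.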
In the Nov 18 meeting:
Based on a discussion between @leewujung and myself, we will first complete the following changes that partially address this issue:
- Add an input `channel_selection` to `combine_echodata` that has a datatype of `None` or a `List`
- Default to `channel_selection = None` and raise a `RuntimeError` if the channels are not all the same amongst the files being combined
- If `channel_selection` is a list, it will specify what channels the user wants in the final combined `EchoData` object
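The planned behavior could look roughly like the sketch below. This is not the `combine_echodata` code: `resolve_channels` is a hypothetical helper that works on plain lists of channel ids, and raising a `ValueError` when a requested channel is missing from a file is an assumption (it follows from expansion not being allowed, but the exception type was not discussed).

```python
from typing import Dict, List, Optional


def resolve_channels(
    channels_by_file: Dict[str, List[str]],
    channel_selection: Optional[List[str]] = None,
) -> List[str]:
    """Return the channels the combined object would contain."""
    if channel_selection is None:
        # Default: all files must agree on their channels
        unique = {tuple(ch) for ch in channels_by_file.values()}
        if len(unique) > 1:
            raise RuntimeError(
                "All files must have the same channels when "
                "channel_selection is None."
            )
        return list(next(iter(unique))) if unique else []

    # A list was given: it must be a subset of every file's channels
    # (assumption: a missing channel is an error, since expansion is out of scope)
    for name, channels in channels_by_file.items():
        missing = [ch for ch in channel_selection if ch not in channels]
        if missing:
            raise ValueError(f"{name} is missing requested channels: {missing}")
    return list(channel_selection)
```

For example, `resolve_channels({"f1": ["a", "b", "c"], "f2": ["a", "b", "c", "d"]}, ["a", "c"])` returns `["a", "c"]`, while the same call without a selection raises a `RuntimeError`.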
The below are only for the current implementation, which ONLY allows the user to choose a subset of the channels. Expansion is not allowed.
`Sonar`
that is called `Sonar/Beam_group1`
`channel` dimension
{
"Sonar/Beam_group1": ["a", "b", "c"],
"Sonar/Beam_group2": ["c", "d", "e"],
}
or only 1 list, which will be interpreted as both keys having the same values
`channel` dimension does not exist
`channel` dimension does not exist
`channel`: union of the 2 lists if there are 2
`cal_channel_id`: maintain current behavior: this is treated as constant and the dataset with the largest length will be used, so nothing will be done

@b-reyes and I decided that we'll close this issue and create a new one for the "expansion" type of combine operation.
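For the dict-or-single-list form of the selection described above, a sketch of how the input might be normalized. The names `normalize_channel_selection`, `selection_union`, and `BEAM_GROUPS` are hypothetical, and rejecting unknown keys is an assumption. The union helper corresponds to the "union of the 2 lists if there are 2" rule.

```python
from typing import Dict, List, Optional, Union

# The two keys used in the example above
BEAM_GROUPS = ("Sonar/Beam_group1", "Sonar/Beam_group2")

Selection = Union[List[str], Dict[str, List[str]]]


def normalize_channel_selection(
    selection: Optional[Selection],
) -> Optional[Dict[str, List[str]]]:
    """Turn a list or dict selection into a dict keyed by beam group."""
    if selection is None:
        return None
    if isinstance(selection, dict):
        unknown = set(selection) - set(BEAM_GROUPS)
        if unknown:
            # Assumption: only the two beam group keys are accepted
            raise ValueError(f"Unknown group(s): {sorted(unknown)}")
        return {group: list(chs) for group, chs in selection.items()}
    # A single list is interpreted as both keys having the same values
    return {group: list(selection) for group in BEAM_GROUPS}


def selection_union(normalized: Dict[str, List[str]]) -> List[str]:
    """Ordered union of the per-group lists."""
    union: List[str] = []
    for channels in normalized.values():
        for ch in channels:
            if ch not in union:
                union.append(ch)
    return union
```

With the example dict above, `selection_union` gives `["a", "b", "c", "d", "e"]`; with a single list, both groups get the same channels and the union is just that list.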
In PR #808 it was found that some files from `noaa-wcsd-pds/data/raw/Bell_M._Shimada/SH1701/EK60` can have differently sized `channel` dimensions. In PR #808 we will not allow these types of files to be combined, but we do see value in allowing it. Thus, we need to discuss the possibility of allowing missing channels amongst the datasets being combined.

From discussion, there appear to be two ways to resolve this:
If one Dataset has channels `['a', 'b', 'c']` and another with `['a', 'c']`, then the combined Dataset would have the channels `['a', 'c']`.

If we can't decide on which option is best, we could have a user input to `combine` that allows the user to choose which of the options above is the most appropriate for their situation.