Closed b-reyes closed 1 year ago
From my understanding, this is purely caused by how we handle the parsed data
I think it is probably fair to say that it is determined by what the order is in the first set of datagrams for all channels. This order can change depending on which one is sent by the instrument first.
For the new combine routine, it is imperative that all Datasets being combined have the channel coordinate in the same order.
I think it is possible to short circuit this by figuring out the channel sequence when you have the individual datasets/groups, and just write that in the "right" order, since you can control which part of the final zarr array you are filling in and the coordinates are loaded for the individual datasets/groups whether lazy-loaded or not.
That said, the sorting shouldn't take too much time, and it is a good thing to have.
Note: I don't think we can order the class object
self.parser_obj.ch_ids
because of how we utilize it in various situations (see example above).
For the example you referenced or the other cases when you only save the broadband or narrowband channels into different groups, I think you can sort the final set of ch_ids
under each group and assemble the dataset according to that ~, and use xr.Dataset.transpose
to sort the already assembled dataset under that group (which is according to the channel sequence of that particular .raw file) into the correct order~ before saving.
As you said, we should not change whatever the order in self.parser_obj
-- that will likely break many things!
I think it is possible to short circuit this by figuring out the channel sequence when you have the individual datasets/groups, and just write that in the "right" order, since you can control which part of the final zarr array you are filling in and the coordinates are loaded for the individual datasets/groups whether lazy-loaded or not.
Yes, I agree that it is not absolutely necessary and we could do what you are suggesting here. The downside is this may not be efficient because it could introduce more IO. I suggest we avoid this situation unless a user specifically asks for this type of functionality to be added (I could see something like this being useful).
That said, the sorting shouldn't take too much time, and it is a good thing to have.
I agree, hopefully it will not take too much time and it will improve the consistency of the EchoData
creation. I will go ahead and work on this.
This has been addressed in #818.
Recently I attempted to combine narrowband EK80 files from noaa-wcsd-pds. While doing this I found that the files had the same channels, but the ordering of these channels were different. For example, the
channel
coordinate had the following values for two different raw files:From my understanding, this is purely caused by how we handle the parsed data (@emiliom or @leewujung can you confirm this?). In
set_groups
, we create datasets by looping over the channels. However, we never sort these channels. This can be easily seen in EK80 when setting thePlatform
group:https://github.com/OSOceanAcoustics/echopype/blob/361d44ee319dbe25b610759cc8c6e42d4f469135/echopype/convert/set_groups_ek80.py#L218-L228
For the new combine routine, it is imperative that all Datasets being combined have the
channel
coordinate in the same order.To rectify this situation, I propose that we review
set_groups
andparsed_to_zarr
and make sure that channels are always sorted in ascending order.Note: I don't think we can order the class object
self.parser_obj.ch_ids
because of how we utilize it in various situations (see example above).As I need this to be fixed to continue with the combine routine, I will go ahead and work on this.