OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0

Using vessel log data in downstream dataset #1344

Open leewujung opened 3 months ago

leewujung commented 3 months ago

With #1318, the vessel log data would be stored in the Vendor_specific group. However, right now these data are not propagated down to the Sv and MVBS datasets when we need to use them.

In theory one could just add the vessel log distance/lat/lon when needed, directly from the EchoData object, but that requires very good provenance tracking so that one can recover the EchoData object used for calibration.

I wonder if it would make sense to add an optional argument include_idx_data=True/False to compute_Sv so that these variables can be added to the Sv dataset at the last step before returning.
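For concreteness, a usage sketch of the proposed flag (include_idx_data does not yet exist; the name and placement are just the suggestion above):

```python
import echopype as ep

ed = ep.open_raw("path/to/file.raw", sonar_model="EK80")

# Hypothetical: include_idx_data is the flag proposed in this issue,
# not a current compute_Sv parameter.
ds_Sv = ep.calibrate.compute_Sv(ed, include_idx_data=True)
```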

The same thing can be done for MVBS, but there perhaps we should just propagate whatever is in the Sv dataset by default, except for the variables that are bin-averaged.

@ctuguinay : thoughts?

ctuguinay commented 3 months ago

At least on the compute_Sv side of things, I think this should be its own separate function, just like ep.consolidate.add_depth and ep.consolidate.add_location are. There's some messiness with the interpolation when the time3 dim of the IDX variables doesn't have a 1-to-1 match with the ping_time of the Sv dataset.
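A minimal sketch of what such a standalone function could look like, assuming the IDX variables live in the Vendor_specific group on a time3 dimension (the function name add_idx_data and the variable name are illustrative, not existing API):

```python
import xarray as xr

def add_idx_data(ds_Sv: xr.Dataset, echodata) -> xr.Dataset:
    """Hypothetical analogue of ep.consolidate.add_location for IDX variables."""
    vendor = echodata["Vendor_specific"]
    out = ds_Sv.copy()
    # Interpolate each time3-indexed variable onto ping_time so it shares
    # dimensions with the Sv dataset (mirrors what add_location does).
    for var in ["vessel_distance"]:  # variable name assumed from this thread
        out[var] = (
            vendor[var]
            .interp(time3=ds_Sv["ping_time"])
            .drop_vars("time3")  # drop the leftover time3 coordinate
        )
    return out
```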

ctuguinay commented 3 months ago

And once it's in Sv, are you thinking of propagating it down to MVBS by also bin-averaging it (to match the new ping_time and depth/echo_range bins of the MVBS)?
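For reference, bin-averaging such a variable into the new MVBS ping_time bins would amount to something like this (a sketch, not the actual compute_MVBS internals; the 20-second bin is just an example):

```python
# Assumes vessel_distance has already been interpolated onto ping_time.
dist_mvbs = ds_Sv["vessel_distance"].resample(ping_time="20s").mean()
```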

leewujung commented 3 months ago

Oh, I see what you mean. I was actually thinking not to interpolate, and to simply plug in the variables wholesale, because the lat/lon would duplicate what ep.consolidate.add_location already adds. And it is the same GPS.

Actually, maybe all we need are the distance, timestamp, and file_offset? I guess we should read the manual on that datagram association to see how the timing is associated via the file_offset parameter...

For MVBS, lat/lon are already bin-averaged if they are present in the Sv dataset, so again perhaps all we need are the distance, timestamp, and file_offset?

ctuguinay commented 3 months ago

To retain the same dimensions in ds_Sv, I still think vessel_distance would need to be interpolated if it is to match ping_time time-wise, but I think ep.consolidate.add_location already does this interpolation when the times don't match. All we would need to do is add a line to the interpolations already performed:

interp_ds["latitude"] = sel_interp("latitude", time_dim_name)
interp_ds["longitude"] = sel_interp("longitude", time_dim_name)
interp_ds["vessel_distance"] = sel_interp("vessel_distance", time_dim_name) # new line

In the case where no interpolation is done and vessel_distance (along with timestamp and file_offset) is added as-is, this would add a fourth dimension, time3, to ds_Sv.

Although, since these variables themselves don't need to be stored along ping_time, range_sample, and channel, perhaps that is fine? Even if adding them introduces a new dimension, I don't expect the dataset to expand much without interpolation, since it won't affect the big data variables like Sv, echo_range, and depth.
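A sketch of this no-interpolation option, where the IDX variables are attached on their native time3 dimension (group and variable names assumed from the discussion above):

```python
# Pull the IDX variables from the Vendor_specific group and attach them as-is;
# ds_Sv gains a small extra time3 dimension, but Sv/echo_range/depth are untouched.
idx_vars = echodata["Vendor_specific"][["vessel_distance", "file_offset"]]
ds_Sv = ds_Sv.assign(
    vessel_distance=idx_vars["vessel_distance"],  # dims: (time3,)
    file_offset=idx_vars["file_offset"],          # dims: (time3,)
)
```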

leewujung commented 3 months ago

Ah, I guess I didn't explain what I thought the next step would be. Since the way we use the vessel log is as distance markers, I was thinking we could find the closest distance and use the corresponding timestamp to slice the Sv or MVBS data along ping_time. This is why I think we should look into what the manual says about file_offset, since that is related to this association. Simrad is very specific about this association, so I think we should understand it first and then decide whether slicing or interpolation is the better solution.
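As a sketch of that slicing idea (variable names assumed from this thread; units depend on what the vessel log records):

```python
# Find the vessel-log record closest to a target cumulative distance,
# then slice Sv along ping_time at the corresponding timestamp.
target_dist = 100.0  # distance marker of interest
t_marker = abs(ds_Sv["vessel_distance"] - target_dist).idxmin(dim="time3")
ds_up_to_marker = ds_Sv.sel(ping_time=slice(None, t_marker.values))
```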

ctuguinay commented 3 months ago

Ah, I see now. I'll get to reading 🫡

leewujung commented 3 months ago

me too!