OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0
94 stars 73 forks source link

Enhance update_platform to support more use cases and produce more consistent results #740

Closed emiliom closed 1 year ago

emiliom commented 2 years ago

The use case that drove the development and testing of update_platform was a single set of Saildrone data files from the Pacific Hake survey, with one EK80 raw file with no Platform data at all and an associated external data file.

I illustrate and document some of the issues in this notebook.

General comments about the current implementation of update_platform

Some open questions about the fixed-location use case, such as from a mooring

Issues unrelated to update_platform per se

Comparing the outcome of update_platform for AZFP and EK60 raw files that don't have the variables handled by update_platform but do have other Platform variables exposed some remaining differences in how variables with empty values are being handled by the different set_groups_* classes. These differences are illustrated in my sample notebook, and they involve issues such as:

Also, set_groups_ek60 does not add the 3 empty global platform_* attributes, whereas set_groups_ek80 and set_groups_azfp do.

Finally, set_groups_azfp is not assigning any attributes to the AZFP-specific variables tilt_x and tilt_y .

emiliom commented 2 years ago

See PR #741, which addressed a few of the tasks from this issue. In particular, the simple fixed-location use case where a single lat-lon coordinate is passed to update_platform. The choice of which timestamp to associate with it was left to the user.

leewujung commented 2 years ago

Some open questions about the fixed-location use case, such as from a mooring

  • What should the timestamp on the location be? eg, the first or last ping_time value? When using a single timestamp (time1 will have a length of 1), the assumption is that it applies to the entire dataset (all ping times).
  • Would it be clearer to replicate the fixed location across more than one timestamp? For example, the first and last ping_time?

I think it will be clearer to use just the first ping_time for this. I am thinking of the case when we combine multiple files, say 10, seeing 10 identical lat/lon with the time corresponding to the first ping_time of each file seems more intuitive. We can perhaps add in the comment that the lat/lon are added through update_platform and use the first ping_time as the timestamp.

emiliom commented 2 years ago

I think it will be clearer to use just the first ping_time for this. I am thinking of the case when we combine multiple files, say 10, seeing 10 identical lat/lon with the time corresponding to the first ping_time of each file seems more intuitive.

I hadn't thought of that case, and it's a good point. Right now this decision (whether to use a single point with the first ping_time or two points) is external to update_platform, but I'm thinking that we could build in -- now or down the road -- a special handling of the fixed-location case such that the user could pass a single lat,lon coordinate; then update_platform would implement the assumption (eg, single point timestamped to the first ping_time).

We can perhaps add in the comment that the lat/lon are added through update_platform and use the first ping_time as the timestamp.

In this issue I've included a suggestion to add the history attribute to every variable that's updated (inserted/rewritten) by update_platform. That would take care of the first part of your suggestion. The second part, the use of the first ping_time as the timestamp, could be added there too (for simplicity); or as a comment attribute.

Overall, I like your suggestions and reasoning.

emiliom commented 2 years ago

For version 0.6.1 (short time frame), I think we can focus largely on the fixed-location use case and the issues and clean ups I've already addressed in PR #741. We can leave the other comments I've raised here to 0.6.2.

leewujung commented 1 year ago

Just wanted to note that this is related to #769.

leewujung commented 1 year ago

Now #1009 supersede #769.

emiliom commented 1 year ago

I've gone over this issue, related PR's, and the outcome of update_platform in an example. I think everything here has been addressed. The only exception is the issue now tracked in #1009; we'll discuss and track its progress there and can close this issue.

There is one usability improvement mentioned here that is not implemented:

I think it will be clearer to use just the first ping_time for this. I am thinking of the case when we combine multiple files, say 10, seeing 10 identical lat/lon with the time corresponding to the first ping_time of each file seems more intuitive.

I hadn't thought of that case, and it's a good point. Right now this decision (whether to use a single point with the first ping_time or two points) is external to update_platform, but I'm thinking that we could build in -- now or down the road -- a special handling of the fixed-location case such that the user could pass a single lat,lon coordinate; then update_platform would implement the assumption (eg, single point timestamped to the first ping_time).

This would be nice, but let's create a new issue solely for it. We can then address it in a later release.

emiliom commented 1 year ago

Now that I've opened #1126, we close this issue.