OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0
98 stars 73 forks source link

Add more comprehensive tests to `consolidate.add_latlon` #769

Closed leewujung closed 1 year ago

leewujung commented 2 years ago

This is a leftover to #749 , in which we added a new function to interpolate lat-lon locations from the EchoData["Platform"] group to calibrated Sv data. That PR includes a basic test to run add_latlon for an EK60 file and check for existence of the latitude and longitude variables.

We still need tests to cover the following cases:

leewujung commented 1 year ago

This will ensure #730 runs through without problem.

emiliom commented 1 year ago

@leewujung in add_location, the time1 coordinate variable is dropped at the very end:

https://github.com/OSOceanAcoustics/echopype/blob/f8082cac32bfbee2b21c62d266507491d221182d/echopype/consolidate/api.py#L188

I've looked closely at a couple of sample datasets. For two EK60 datasets, after compute_Sv, there are no variables with a time1 dimension. The only other time dimension in use besides ping_time is time3, used with water_level. But for an AZFP dataset, after compute_Sv, water_level uses time1, not time3. Off the top of my head I don't remember which time dimension water_level is supposed to have or whether it'd be the same regardless of instrument. If water_level having a time1 dimension is in fact valid, then it shouldn't be dropped for AZFP.

I also investigated the result of interpolation. For some reason that I haven't investigated, the first interpolated value appears to always be nan (in two EK60 example files). For now, I'm going to modify add_location and set the first value of interpolated latitude and longitude equal to the second value. We can discuss here or in the PR that I'll submit shortly.

leewujung commented 1 year ago

For I looked at the docs and couldn't completely remember what's up with our various time* dimensions, but I think within a single group the time* changes depending on the source of data (since they can be on different time base), which varies depending on the sonar_model.

I found this issue https://github.com/OSOceanAcoustics/echopype/issues/656 that documents what those time* are. I think we should replicate those in the docs.

You can see that water_depth is called out here to be aligned with time3.

AZFP should not have a water_depth variable stored by default, see code below -- which data file were you looking at? https://github.com/OSOceanAcoustics/echopype/blob/f8082cac32bfbee2b21c62d266507491d221182d/echopype/convert/set_groups_azfp.py#L145-L168

emiliom commented 1 year ago

I looked at the AZFP file that I've used in test_consolidate_integration.py: "azfp/17082117.01A" with XML "azfp/17041823.XML". This file is used in other tests, too. A water_level (not water_depth) variable is present but it contains all nan's. Looking more closely, it appears that it's an outcome of update_platform (with external data), so it's possible that the presence of a time1 dimension in the water_level variable (which is empty, anyways) may be an artifact or bug in echodata.update_platform. Worth following up later in a separate issue, but now it's looking less central to how consolidate.add_latlon behaves.

emiliom commented 1 year ago

This is outside the scope of this issue, but how about we add an option to calibrate.compute_Sv that would run consolidate.add_latlon if Platform lat-lon data are present in the echodata object? That could even be the default behavior that could be turned off with an argument like add_latlon=False.

I can open a new issue if this idea has legs. I don't know if it's something we'd have the bandwidth to fit into 0.7.0.

leewujung commented 1 year ago

Hmm.. I think that may be meddling too much with the workflow side of things -- but, interestingly, I was pondering about the scenario for adding split-beam angle for broadband data. There, one of the computing step (and in fact an expensive one to do pulse compression) is the same, so to calibrate first and then to compute the split-beam angle later is doing the works twice. There I could see adding an option flag to do both together. In the case of add_latlon, since the computation is completely separate, I suggest that we keep this option on the workflow side.

emiliom commented 1 year ago

In the case of add_latlon, since the computation is completely separate, I suggest that we keep this option on the workflow side.

Sounds good, at least at this stage. I'll close this issue now, since its goals were met in PR #1000

emiliom commented 1 year ago

I've created a new issue (#1003) to follow up on the issue of the time* dimension in the water_level variable.

emiliom commented 1 year ago

I'm reopening this issue, temporarily ...

In PR #1000 I added several new consolidate.add_latlon tests to test_consolidate_integration.py, including two new test datasets (including one AZFP). One thing I did not add was an EK80 test file. While running consolidate.add_latlon on two EK80 test files just now, I ran into the same error during the interpolation step (return position_var.interp(time1=ds["ping_time"])):

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

The two files are ek80_bb_with_calibration/2018115-D20181213-T094600.raw and ek80/Summer2018--D20180905-T033113.raw. Examining the data a bit, I found two factors in common among those two files:

The error is actually emitted by Pandas. In searching for this error message, I see some references to problems with datetimes. I wasted a lot of time thinking the problem might be with the interpolation of near-constant values, but I now think it's somehow related to the tiny datetime range.

I'm going to set this aside as an edge case that'll take too much more effort to track down for 0.7.0. I'll open a new issue later.

emiliom commented 1 year ago

I'll close this again, now that I've opened #1009