OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0
89 stars 70 forks source link

Strange GPS coordinates appearing with add_location() method #1286

Closed oftfrfbf closed 2 months ago

oftfrfbf commented 3 months ago

Hello, I am trying to use echopype to match up Sv data with its associated geospatial information and I am getting strange GPS values when using "add_location()". I have noticed this happening with a number of different raw files, below I include one example.

This is using python 3.10.12 and echopype==0.8.3.

This first piece of code is my manual method for pairing up NMEA datagram latitude and longitudes with the nearest ping_time and this works as expected:

!pip install s3fs boto3==1.34.51 numpy==1.24.4 pandas==1.5.3 xarray==2022.12.0 geopandas==0.14.3 folium==0.16.0 typing_extensions==4.8.0 echopype==0.8.3 mapclassify
import s3fs
import folium
import os
import boto3
import numpy as np
import echopype as ep
import geopandas
import folium
import mapclassify as mc
import pandas as pd
import xarray as xr
from botocore import UNSIGNED
from botocore.config import Config
import matplotlib.pyplot as plt

s3 = boto3.client('s3', config=Config(signature_version=UNSIGNED))
s3_file_system = s3fs.S3FileSystem(anon=True)
print(f"echopype version: {ep.__version__}")

bucket_name = 'noaa-wcsd-pds'
ship_name = 'Albatross_Iv'
cruise_name = 'AL0403'
sensor_name = 'EK60'
file_name = "L0010-D20040416-T094042-EK60.raw"

raw_file_s3_path = f"s3://{bucket_name}/data/raw/{ship_name}/{cruise_name}/{sensor_name}/{file_name}"
echodata = ep.open_raw(raw_file_s3_path, sonar_model=sensor_name, use_swap=True, storage_options={'anon': True})
latitude = echodata.platform.latitude.values
latitude_values = latitude.copy()
latitude_values.sort()
print(latitude_values)
# [ 0.        0.       43.68361  ... 43.811595 43.811595 43.811595] <-- Note sorted latitude values

longitude = echodata.platform.longitude.values
nmea_times = np.sort(echodata.platform.time1.values)
time1 = echodata.environment.time1.values
indices = np.searchsorted(a=nmea_times, v=time1, side="right") - 1
lat = latitude[indices]
lat[indices < 0] = np.nan  # values recorded before indexing are set to nan
lon = longitude[indices]
lon[indices < 0] = np.nan

my_gps_df = pd.DataFrame({'latitude': lat, 'longitude': lon, 'time': time1}).set_index(['time'])
my_gps_gdf = geopandas.GeoDataFrame(my_gps_df, geometry=geopandas.points_from_xy(my_gps_df['longitude'], my_gps_df['latitude']), crs="epsg:4326")
np.nanmin(my_gps_gdf.latitude)  # should be '0.0'
my_gps_gdf[(np.abs(my_gps_gdf.latitude) < 1e-4) & (np.abs(my_gps_gdf.longitude) < 1e-4)] = np.nan  # remove null island values
np.nanmin(my_gps_gdf.latitude)  # should be '43.68361'

my_gps_gdf.index = my_gps_gdf.index.astype(str)
my_gps_gdf.explore()

Screenshot 2024-03-20 at 2 56 03 PM

And below here is code that uses the "add_location()" method which seems to introduce new values such as "19.572214" and "23.97964":

ds_sv = ep.calibrate.compute_Sv(echodata)
ds_sv_location = ep.consolidate.add_location(ds_sv, echodata)

latitude = ds_sv_location.latitude.values
longitude = ds_sv_location.longitude.values
ping_time = ds_sv_location.ping_time.values

gps_df = pd.DataFrame({'latitude': latitude, 'longitude': longitude, 'time': ping_time}).set_index(['time'])
gps_gdf = geopandas.GeoDataFrame(gps_df, geometry=geopandas.points_from_xy(gps_df['longitude'], gps_df['latitude']), crs="epsg:4326")
gps_gdf[(np.abs(gps_gdf.latitude) < 1e-4) & (np.abs(gps_gdf.longitude) < 1e-4)] = np.nan # remove null island values

latitude_values2 = list(gps_gdf.latitude.copy())
latitude_values2.sort()
print(latitude_values2[:5])
# [nan, 19.57221436379681, 23.979649686736696, 43.68361412270737, 43.68363237740446]

gps_gdf.index = gps_gdf.index.astype(str)
gps_gdf.explore()

The main cruise path is in the upper left, and the new points seem to be extrapolated between the feature and Null Island. Screenshot 2024-03-20 at 2 59 21 PM

If you have access to Google Colab, I have a notebook that should be shared here to use:

https://colab.research.google.com/drive/1TxzMMJ6hjPCo-ZWZsWm7ltd-dWDAFhMu?usp=sharing

I am wondering if this is a coding error on my part or if there is a problem with the interpolation. Looking here:

https://github.com/OSOceanAcoustics/echopype/blob/7f7e68bff98ced27644db88e1bf94600826c150f/echopype/consolidate/api.py#L186

Does it make sense to define 'method="nearest"' for the interpolation?

I have traditionally opted for numpy's searchsorted aligned from the "right" so that Sv data isn't paired with potentially stale GPS values in some edge cases where the data jumps position mid-file.

It would be nice to hear what others thoughts are on the subject. And thank you for any help!

leewujung commented 3 months ago

@oftfrfbf : Thanks for reporting this. @ctuguinay and I looked into this briefly and what we found was that this is not a bug in the code. This is due to the GPS data having values that are literally 0 before actual values show up, so the interpolated values are between 0 and 43.xxx.

We think that this is something that we should not try to "fix" automatically in the Echopype codebase, because there are many ways the GPS data can go wrong with interpolation (like the recent #1294). Instead, we will add a warning about GPS values having 0s, since that is unlikely the case for actual measurements. Users can then decide how they want to clean up the GPS data before calling add_location.

Let us know if you have additional thoughts on this.

ctuguinay commented 2 months ago

Closed by #1296