OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0

Add `echodata.align_to_ping_time()`? #1354

Open leewujung opened 2 months ago

leewujung commented 2 months ago

@ctuguinay had the idea of adding echodata.align_to_ping_time() to handle the timestamp alignment that we currently do in the code for:

  1. calibration (align timestamps of Environment group variables to ping_time)
  2. add depth (align timestamps of Platform group variables to ping_time)

I think this is a really interesting idea and am just recording my thoughts below. Essentially, right now in the consolidate subpackage we have functions to add ancillary data into the Sv dataset once it is calibrated, and in Echolevels we call these Level 2 datasets. If we had echodata.align_to_ping_time(), the EchoData object may actually qualify as Level 2 in addition to the Sv dataset.

One good thing about this is that we can have all the messy timestamp handling in one place, instead of interspersed in the code in the various places where it's needed.
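A rough sketch of what such a method could look like, written here as a free function over a plain DataArray (the name, signature, and use of interp are assumptions for illustration, not the eventual API; the real method would operate on EchoData groups):

```python
import numpy as np
import xarray as xr

def align_to_ping_time(external_da, external_time_name, ping_time_da, method="linear"):
    """Interpolate a variable from its own time coordinate onto ping_time.

    Hypothetical stand-in for the proposed echodata.align_to_ping_time().
    """
    return external_da.interp({external_time_name: ping_time_da}, method=method)

# Toy example: a temperature record on its own "time1" axis,
# aligned to two ping times that fall between its samples.
time1 = np.array(["2023-10-17T00:00:00", "2023-10-17T00:00:10"], dtype="datetime64[ns]")
ping_time = xr.DataArray(
    np.array(["2023-10-17T00:00:02", "2023-10-17T00:00:08"], dtype="datetime64[ns]"),
    dims="ping_time",
)
temperature = xr.DataArray([10.0, 12.0], coords={"time1": time1}, dims="time1")
aligned = align_to_ping_time(temperature, "time1", ping_time)
print(aligned.values)  # values linearly interpolated onto ping_time
```

The payoff would be that calibration and add-depth both call this one function instead of carrying their own alignment logic.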

jmjech commented 1 month ago

Hi, I will add to this thread as I get an error in align.py -> align_to_ping_time. Before updating my Echopype version, I ran into an error at the part of the workflow where the time of the environmental parameters is synchronized with the data time, in harmonize_env_param_time (the version I was using did not have align_to_ping_time as a separate function). This only seems to happen with EK80 files; I did not get this error with EK60 files.

I installed the latest version of Echopype a couple of days ago and converted the raw EK80 data files to netCDF4 format, and now get a new error (shown below), but the situation is similar. When I run the code on only one of the files, Echopype computes Sv without issue. The error occurs only when I have two files. I haven't tried three or more, but I assume the error would be consistent for two or more files.

This is the link to the .raw data files: https://drive.google.com/drive/folders/1B5c_7Oi-cqkotauj9DcaU1yh_jDjm7Yy. Each file is >200 MB.

Here is my code:

import xarray as xr
import numpy as np
import pandas as pd
from pathlib import Path
import echopype as ep

datapath = Path('/media/mjech/Fortress_L3/EK80_Data/BIWF/netCDF4_Files')
filenames = ['20231017_BIWF_WTG-1_Spiral-D20231017-T225313.nc',
             '20231017_BIWF_WTG-1_Spiral-D20231017-T230522.nc']
wf = 'BB'
encode = 'complex'
sonar_model = 'EK80'
edlist = []
for f in filenames:
    edlist.append(ep.open_converted(str(datapath / f)))
ed = ep.combine_echodata(edlist)
Sv = ep.calibrate.compute_Sv(ed, waveform_mode=wf, encode_mode=encode)

This is the error (it is very long):

ValueError                                Traceback (most recent call last)
~/NOAA_Gdrive/sonarpros/Python_Programs/EK_ES/describe_echodata.py in <module>
     49 ed = ep.combine_echodata(edlist)
     50
---> 51 Sv = ep.calibrate.compute_Sv(ed, waveform_mode=wf, encode_mode=encode)
     52
     53 '''

~/.local/lib/python3.10/site-packages/echopype/calibrate/api.py in compute_Sv(echodata, **kwargs)
    207     it must be set using EchoData.update_platform().
    208     """
--> 209     return _compute_cal(cal_type="Sv", echodata=echodata, **kwargs)
    210
    211

~/.local/lib/python3.10/site-packages/echopype/calibrate/api.py in _compute_cal(cal_type, echodata, env_params, cal_params, ecs_file, waveform_mode, encode_mode)
     51
     52     # Set up calibration object
---> 53     cal_obj = CALIBRATOR[echodata.sonar_model](
     54         echodata,
     55         env_params=env_params,

~/.local/lib/python3.10/site-packages/echopype/calibrate/calibrate_ek.py in __init__(self, echodata, env_params, cal_params, waveform_mode, encode_mode, ecs_file, **kwargs)
    294         # Get env_params: depends on waveform mode
    295         #breakpoint()
--> 296         self.env_params = get_env_params_EK(
    297             sonar_type=self.sonar_type,
    298             beam=beam,

~/.local/lib/python3.10/site-packages/echopype/calibrate/env_params.py in get_env_params_EK(sonar_type, beam, env, user_dict, freq)
    357     for p in out_dict.keys():
    358         print('  out_dict key p: {0}'.format(p))
--> 359         out_dict[p] = harmonize_env_param_time(out_dict[p], ping_time=beam["ping_time"])
    360     #breakpoint()
    361     print('  return out_dict: {0}'.format(out_dict))

~/.local/lib/python3.10/site-packages/echopype/calibrate/env_params.py in harmonize_env_param_time(p, ping_time)
     69
     70     # Align array to ping time
---> 71     return align_to_ping_time(
     72         p.dropna(dim="time1", how="all"), "time1", ping_time, method="linear"
     73     )

~/.local/lib/python3.10/site-packages/echopype/utils/align.py in align_to_ping_time(external_da, external_time_name, ping_time_da, method)
     57         )
     58     else:
---> 59         return external_da.interp(
     60             {external_time_name: ping_time_da},
     61             method=method,

~/.local/lib/python3.10/site-packages/xarray/core/dataarray.py in interp(self, coords, method, assume_sorted, kwargs, **coords_kwargs)
   2294                 "interp only works for a numeric type array. " f"Given {self.dtype}."
   2295             )
-> 2296         ds = self._to_temp_dataset().interp(
   2297             coords,
   2298             method=method,

~/.local/lib/python3.10/site-packages/xarray/core/dataset.py in interp(self, coords, method, assume_sorted, kwargs, method_non_numeric, **coords_kwargs)
   3978                 # For normal number types do the interpolation:
   3979                 var_indexers = {k: v for k, v in use_indexers.items() if k in var.dims}
-> 3980                 variables[name] = missing.interp(var, var_indexers, method, **kwargs)
   3981             elif dtype_kind in "ObU" and (use_indexers.keys() & var.dims):
   3982                 # For types that we do not understand do stepwise

~/.local/lib/python3.10/site-packages/xarray/core/missing.py in interp(var, indexes_coords, method, **kwargs)
    634     original_dims = broadcast_dims + dims
    635     new_dims = broadcast_dims + list(destination[0].dims)
--> 636     interped = interp_func(
    637         var.transpose(*original_dims).data, x, destination, method, kwargs
    638     )

~/.local/lib/python3.10/site-packages/xarray/core/missing.py in interp_func(var, x, new_x, method, kwargs)
    751     )
    752
--> 753     return _interpnd(var, x, new_x, func, kwargs)
    754
    755

~/.local/lib/python3.10/site-packages/xarray/core/missing.py in _interpnd(var, x, new_x, func, kwargs)
    775     # stack new_x to 1 vector, with reshape
    776     xi = np.stack([x1.values.ravel() for x1 in new_x], axis=-1)
--> 777     rslt = func(x, var, xi, **kwargs)
    778     # move back the interpolation axes to the last position
    779     rslt = rslt.transpose(range(-rslt.ndim + 1, 1))

~/.local/lib/python3.10/site-packages/scipy/interpolate/_rgi.py in interpn(points, values, xi, method, bounds_error, fill_value)
    743     # perform interpolation
    744     if method in RegularGridInterpolator._ALL_METHODS:
--> 745         interp = RegularGridInterpolator(points, values, method=method,
    746                                          bounds_error=bounds_error,
    747                                          fill_value=fill_value)

~/.local/lib/python3.10/site-packages/scipy/interpolate/_rgi.py in __init__(self, points, values, method, bounds_error, fill_value, solver, solver_args)
    283         self.values = self._check_values(values)
    284         self._check_dimensionality(self.grid, self.values)
--> 285         self.fill_value = self._check_fill_value(self.values, fill_value)
    286         if self._descending_dimensions:
    287             self.values = np.flip(values, axis=self._descending_dimensions)

~/.local/lib/python3.10/site-packages/scipy/interpolate/_rgi.py in _check_fill_value(self, values, fill_value)
    336                 np.can_cast(fill_value_dtype, values.dtype,
    337                             casting='same_kind')):
--> 338             raise ValueError("fill_value must be either 'None' or "
    339                              "of a type compatible with values")
    340         return fill_value

ValueError: fill_value must be either 'None' or of a type compatible with values

Thanks, mike

ctuguinay commented 1 month ago

Oh huh...this is strange. I think this may be a problem with NaNs in the environment variable(s) and doing linear interpolation with that, but I had a few CI test datasets with NaNs that passed my integration tests. I'll check this out later today. Thanks for the issue!

ctuguinay commented 1 month ago

@jmjech just sent a request for permission to view these RAW files

jmjech commented 1 month ago

@ctuguinay : You should have access now; let me know if not.

ctuguinay commented 1 month ago

@jmjech So there are a few things I observed:

Not sure if you already tried it, but it is possible to run compute_Sv on individual Echodata objects and combine the individual Sv datasets:

[screenshot]
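In outline, that workaround is Sv_i = ep.calibrate.compute_Sv(ed_i, ...) per file, followed by xr.concat([Sv_1, Sv_2], dim="ping_time"). A minimal stand-in with toy datasets (real Sv datasets have more dimensions and variables; matching channel and range axes are assumed):

```python
import numpy as np
import xarray as xr

def toy_sv(t0, n_pings):
    # Stand-in for one per-file Sv dataset: same channel axis,
    # a distinct span of ping times.
    return xr.Dataset(
        {"Sv": (("channel", "ping_time"), np.zeros((2, n_pings)))},
        coords={"channel": ["ch1", "ch2"], "ping_time": np.arange(t0, t0 + n_pings)},
    )

sv1, sv2 = toy_sv(0, 3), toy_sv(3, 4)

# Concatenate the per-file Sv datasets along ping_time instead of
# combining the EchoData objects before calibration.
sv_combined = xr.concat([sv1, sv2], dim="ping_time")
print(dict(sv_combined.sizes))
```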

I was also able to replicate this harmonization error with the combined Echodata object and I figured out that it was the sound absorption variable that failed this interpolation:

[screenshot]

This is most likely due to the sound absorption variable from the combined Echodata object already having ping time in it before harmonization with ping time has even occurred:

[screenshot]

And it turns out that dropping this ping time dimension results in the harmonization function passing:

[screenshot]
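As a toy illustration of that fix (hypothetical data; isel(ping_time=0, drop=True) is just one way to drop the duplicated dimension, valid here because the values are copies along ping_time):

```python
import numpy as np
import xarray as xr

# An environment variable that wrongly carries a ping_time dimension in
# addition to its expected time1 dimension, with values duplicated
# along ping_time (as described for the combined-EchoData case).
time1 = np.array(["2023-10-17T00:00:00", "2023-10-17T00:00:10"], dtype="datetime64[ns]")
ping_time = xr.DataArray(
    np.array(["2023-10-17T00:00:02", "2023-10-17T00:00:08"], dtype="datetime64[ns]"),
    dims="ping_time",
)
absorption = xr.DataArray(
    np.tile([[0.01], [0.02]], (1, 2)),
    coords={"time1": time1, "ping_time": ping_time},
    dims=("time1", "ping_time"),
)

# Drop the spurious ping_time dimension; the time1 -> ping_time
# interpolation then goes through without complaint.
fixed = absorption.isel(ping_time=0, drop=True)
aligned = fixed.interp(time1=ping_time, method="linear")
print(aligned.values)
```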

So I need to figure out why this sound absorption variable even has the ping_time dimension when we are working on an EchoData object that is the combination of multiple EchoData objects (it should only have time1 as its time dimension). For now, running compute_Sv on individual EchoData objects and combining them post-compute_Sv is probably your best bet for getting things to run on your side, since it might take me a while to get back to this.

jmjech commented 1 month ago

@ctuguinay: Thanks for the tip on concatenating the data array/set! I will try that soon. Good luck with figuring it out.

leewujung commented 1 month ago

@jmjech : Chiming in here -- could we ask @spacetimeengineer to give this a shot? I think this will be a good way to get into the Echopype codebase, and @ctuguinay has done the investigative work to figure out where the problem lies.

jmjech commented 1 month ago

@leewujung : I thought about that. He is currently working on issue #995 to get into the code, so I didn't want to distract him. But I agree that this is a good issue for digging even deeper into the code. I spent a day working on it and it helped me understand the flow of the program.