OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0

Method to add missing GPS data to platform group #201

Closed leewujung closed 2 years ago

leewujung commented 3 years ago

Some EK80 data sets come without NMEA datagrams because of variations in hardware configuration. This is similar to #198, in which some environmental data are missing from AZFP files. Let's first deal with the case where the latitude/longitude data are saved in netCDF or zarr files.

Goal

Add a method for adding ancillary lat/lon data to converted files when the source data file (recorded by the instrument) does not contain these data.

Task

Note: this functionality should be added to the class-redesign branch.

leewujung commented 3 years ago

@imranmaj: could you start tackling the first task as a standalone function? Check out set_platform() in /echopype/convert/set_groups_ek80.py to see how the coordinates and attributes are encoded; the idea is to substitute (possibly overwrite) the NaN variables with actual values from the GPS files. I'll point you to where the test files are on Slack.
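A minimal sketch of such a standalone function, assuming (hypothetically) that the platform group holds latitude and longitude along a ping_time dimension and that the GPS file holds the same variable names along a sorted time coordinate; the actual names encoded by set_platform() may differ. The function name fill_platform_latlon is illustrative only. It linearly interpolates the GPS fixes onto the ping times and overwrites only the entries that are NaN.

```python
import numpy as np
import xarray as xr


def fill_platform_latlon(platform: xr.Dataset, gps: xr.Dataset,
                         time_dim: str = "ping_time") -> xr.Dataset:
    """Hypothetical helper: return a copy of `platform` whose NaN latitude and
    longitude values are replaced by GPS positions linearly interpolated onto
    platform[time_dim]. Assumes `gps` has latitude/longitude along a sorted
    `time` coordinate (an assumption, not the echopype layout)."""
    out = platform.copy()
    t0 = gps["time"].values[0]
    # Seconds relative to the first GPS fix, so np.interp can work on floats
    t_gps = (gps["time"].values - t0) / np.timedelta64(1, "s")
    t_plat = (platform[time_dim].values - t0) / np.timedelta64(1, "s")
    for var in ("latitude", "longitude"):
        # np.interp holds the first/last value constant outside the GPS span
        interp = np.interp(t_plat, t_gps, gps[var].values)
        current = platform[var].values
        # Only overwrite missing entries; keep any positions already present
        filled = np.where(np.isnan(current), interp, current)
        out[var] = (platform[var].dims, filled, platform[var].attrs)
    return out
```

Linear interpolation is a simple choice for a sketch; it will give wrong longitudes for tracks that cross the ±180° meridian, and pings outside the GPS time span silently get the nearest end fix.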

@ngkavin: I think the second task would require collaboration between you and @imranmaj. Could you suggest what the function arguments should look like and where the function should be called in the conversion sequence? Then the three of us can discuss as a group.

Thanks :)

ngkavin commented 3 years ago

Hi @imranmaj, glad to have you helping develop echopype.

It seems like this function only really requires one input argument, although more arguments would accommodate more use cases. That argument is the list of .nc/.zarr files that contain the platform information. The method could be called anywhere between creating the Convert object and calling to_netcdf. For example:

tmp = Convert(ek80_raw_path, model="EK80")
tmp.add_platform_data(files)  # proposed method; files = list of .nc/.zarr GPS files
tmp.set_param(params)
tmp.to_netcdf()

The base functionality could split the GPS data across files as each .nc file is saved, but other use cases could involve adding the GPS data after everything has been combined, in which case add_platform_data would be called after to_netcdf(combine=True).
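The "split the GPS data while saving each .nc file" option could look roughly like the sketch below. It assumes (hypothetically) that the GPS data have a sorted time dimension coordinate and that the first and last ping times of each converted file are known; split_gps_by_file and its pad argument are illustrative names, not part of echopype.

```python
import numpy as np
import xarray as xr


def split_gps_by_file(gps: xr.Dataset, file_time_ranges,
                      pad=np.timedelta64(1, "s")):
    """Hypothetical helper: return one GPS subset per converted file.

    file_time_ranges is a list of (first_ping_time, last_ping_time) pairs,
    one per output file. `pad` widens each window so pings near the file
    edges still have GPS fixes on both sides for interpolation.
    """
    subsets = []
    for start, end in file_time_ranges:
        # Label-based slicing on a datetime index includes both endpoints
        subsets.append(gps.sel(time=slice(start - pad, end + pad)))
    return subsets
```

The padding is a design choice: without it, a ping that falls just after the last GPS fix in its window would have nothing to interpolate against.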

imranmaj commented 3 years ago

Hi! Thanks for the help.

I noticed that there's an attribute named extra_files on the Convert class. Is that attribute intended for this purpose?

ngkavin commented 3 years ago

No. Some EK80 raw files contain both broadband and continuous-wave (CW) backscatter data, which are not saved to the same NetCDF file. In that case, extra_files keeps track of the new '_cw.nc' files created during conversion.

I will probably rename the variable to cw_files to make it clear that it is only used for this purpose.

imranmaj commented 3 years ago

Thanks!

When I try to open the files using xarray.open_mfdataset, I get the following error:

ValueError: Could not find any dimension coordinates to use to order the datasets for concatenation

I believe this is because, by default, xarray.open_mfdataset internally calls xarray.combine_by_coords, whose documentation says:

If it cannot determine the order in which to concatenate the datasets, it will raise a ValueError

It looks like the obs dimension does not have a dimension coordinate. Am I correct in assuming that I need to add one to the obs dimension (probably via the preprocess keyword argument of open_mfdataset)?
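The error can be reproduced without any files. The sketch below builds two toy datasets (stand-ins for the GPS files, whose real contents are not shown here) with variables along obs but no coordinate on obs, and passes them to xr.combine_by_coords, which is what open_mfdataset uses under its default combine='by_coords'.

```python
import numpy as np
import xarray as xr


def gps_chunk(lats):
    # Toy stand-in for one GPS file: data along 'obs', no coordinate on 'obs'
    return xr.Dataset({"latitude": ("obs", np.asarray(lats, dtype=float))})


def try_combine_by_coords(chunks):
    """Return the ValueError message if combine_by_coords cannot order the
    chunks, or None if the combine succeeds."""
    try:
        xr.combine_by_coords(chunks)
    except ValueError as err:
        return str(err)
    return None


msg = try_combine_by_coords([gps_chunk([1.0, 2.0]), gps_chunk([3.0])])
print(msg)
```

With no dimension coordinate, there is no index whose values could tell xarray which chunk comes first, so it refuses to guess.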

ngkavin commented 3 years ago

I don't know what your GPS files look like or what the obs dimension is, but usually you would want to concatenate along a time dimension. I don't think you need to add dimensions: your files were already saved as .nc files, so they should already have the necessary dimensions. Have you tried specifying combine='nested', concat_dim='<name of time dimension>' in open_mfdataset?
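If the GPS files happen to store a time variable along obs (an assumption; their layout is not shown here), a preprocess step could promote it to the dimension coordinate, after which ordering by coordinates works. A minimal in-memory sketch; promote_time is a hypothetical name, and the file-based form would be xr.open_mfdataset(files, preprocess=promote_time).

```python
import numpy as np
import xarray as xr


def promote_time(ds: xr.Dataset) -> xr.Dataset:
    """Hypothetical preprocess step: make `time` (a 1-D variable along `obs`)
    the dimension coordinate so combine_by_coords can order files by it."""
    return ds.swap_dims({"obs": "time"})


def gps_chunk(times, lats):
    # Toy stand-in for one GPS file: `time` is a plain variable along `obs`
    return xr.Dataset({
        "time": ("obs", np.array(times, dtype="datetime64[ns]")),
        "latitude": ("obs", np.asarray(lats, dtype=float)),
    })


early = gps_chunk(["2020-01-01T00:00:00", "2020-01-01T00:00:10"], [1.0, 2.0])
late = gps_chunk(["2020-01-01T00:00:20"], [3.0])

# Passed out of order on purpose: the time index, not list order, decides
combined = xr.combine_by_coords([promote_time(late), promote_time(early)])
```

The advantage over a nested combine is that file order no longer matters; the cost is that time values must not overlap between files.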

leewujung commented 3 years ago

@ngkavin: I'll send you a link to the files; they're in our shared drive.

ngkavin commented 3 years ago

import xarray as xr
xr.open_mfdataset(files, combine='nested', concat_dim='obs')

ngkavin commented 3 years ago

Accidentally closed, but xr.open_mfdataset(files, combine='nested', concat_dim='obs') works.
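The in-memory counterpart of that call is xr.combine_nested, shown below on toy datasets (stand-ins for the GPS files). A nested combine simply stacks the datasets along obs in the order they are given, so with open_mfdataset the file list should already be in chronological order.

```python
import numpy as np
import xarray as xr


def gps_chunk(lats):
    # Toy stand-in for one GPS file: data along 'obs', no coordinate on 'obs'
    return xr.Dataset({"latitude": ("obs", np.asarray(lats, dtype=float))})


chunks = [gps_chunk([1.0, 2.0]), gps_chunk([3.0])]

# Concatenate along 'obs' in list order; no coordinate values are needed
combined = xr.combine_nested(chunks, concat_dim="obs")
print(combined.sizes["obs"])
```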

imranmaj commented 3 years ago

Ah, I see, thank you. I was trying to combine by coordinates (combine='by_coords').

leewujung commented 2 years ago

This was supposed to be closed a long time ago. :)