Closed leewujung closed 2 years ago
@imranmaj : could you start tackling the first task as a standalone function? Check out set_platform() in /echopypt/convert/set_groups_ek80.py to see how the coordinates and attributes are encoded; the idea here is to replace (or possibly overwrite) the NaN variables with actual values from the GPS files. I'll point you to where the test files are on Slack.
@ngkavin : I think the second task would require collaboration between you and @imranmaj. Could you suggest what the function arguments should look like and where the function should be called in the conversion sequence, so the 3 of us can discuss as a group?
Thanks :)
Hi @imranmaj, glad to have you helping to develop echopype.
It seems like this function really only requires 1 input argument, although more would accommodate more use cases. That argument is the list of .nc/.zarr files that contain the platform information. This function could go anywhere between creating the Convert object and calling to_netcdf.
For example:
tmp = Convert(ek80_raw_path, model="EK80")
tmp.add_platform_data(files)
tmp.set_param(params)
tmp.to_netcdf()
The base functionality could be splitting the GPS data while saving each .nc file, but other use cases could involve adding the GPS data after everything has been combined, so that add_platform_data would go after to_netcdf(combine=True).
Hi! Thanks for the help.
I noticed that there's an attribute named extra_files on the Convert class. Is that attribute intended for this purpose?
No. Some EK80 raw files contain broadband as well as continuous wave backscatter data, which are not saved in the same NetCDF file. In this case, extra_files is used for keeping track of the new '_cw.nc' files that are created in the conversion process.
I will probably rename the variable to cw_files to make it clear that it is only used for this purpose.
Thanks!
When I try to open the files using xarray.open_mfdataset, I get the following error:
ValueError: Could not find any dimension coordinates to use to order the datasets for concatenation
I believe this is because internally, xarray.open_mfdataset calls xarray.combine_by_coords, which says
If it cannot determine the order in which to concatenate the datasets, it will raise a ValueError
It looks like the obs dimension does not have a dimension coordinate. Would I be correct in assuming that I need to add a dimension coordinate to the obs dimension (probably with the preprocess keyword argument on open_mfdataset)?
I don't know what your GPS files look like, nor do I know what the obs dimension is. But usually you would want to concatenate on a time dimension. I don't think you would need to add dimensions, because your files should already have the necessary dimensions to be saved to a .nc file. Have you tried specifying combine='nested', concat_dim='name of time dimension' in open_mfdataset?
@ngkavin : I'll send you a link to the files; they're in our shared drive.
xr.open_mfdataset(files, combine='nested', concat_dim='obs')
Accidentally closed, but xr.open_mfdataset(files, combine='nested', concat_dim='obs') works.
Ah, I see, thank you. I was trying to combine with by_coords.
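For anyone hitting the same error, here is a minimal sketch of the difference between the two combine modes. It uses small in-memory datasets with made-up lat/lon values as stand-ins for the GPS files (the real files are not shown in this thread), and calls xr.combine_by_coords and xr.combine_nested directly, which should mirror what open_mfdataset does for combine='by_coords' and combine='nested'.

```python
import numpy as np
import xarray as xr

# Stand-ins for two GPS files (values are made up). Both use an "obs"
# dimension that has no coordinate variable, as described in the thread.
part1 = xr.Dataset(
    {
        "latitude": ("obs", np.array([47.60, 47.61, 47.62])),
        "longitude": ("obs", np.array([-122.33, -122.34, -122.35])),
    }
)
part2 = xr.Dataset(
    {
        "latitude": ("obs", np.array([47.63, 47.64])),
        "longitude": ("obs", np.array([-122.36, -122.37])),
    }
)

# by_coords needs a dimension coordinate to work out the ordering;
# "obs" has none, so xarray cannot order the datasets.
try:
    xr.combine_by_coords([part1, part2])
    by_coords_failed = False
except ValueError:
    by_coords_failed = True

# nested simply concatenates in the order given, along the named dimension.
combined = xr.combine_nested([part1, part2], concat_dim="obs")
```

With combine='nested' the order of the input list decides the order of the result, so the file list should be sorted chronologically before it is passed in.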
This was supposed to be closed a long time ago. :)
Some EK80 data sets come without the NMEA datagrams due to hardware config variations. This is a case similar to #198 in which some environmental data are also missing in AZFP files. Let's first deal with the case when the latitude/longitude data are saved in netCDF or zarr files.
Goal
Add a method to enable adding ancillary lat/lon data into converted files, when the source data file (recorded by the instrument) does not contain these data.
Task
[ ] Add a method add_platform_data() to the Convert class that allows users to specify one or more nc/zarr files (called GPS files below) that can be opened by xarray and contain variables named latitude and longitude, and save the lat/lon to the Platform group. The method should:
ping_time start and end of the acoustic data in the Beam group, and
[ ] Make the add_platform_data() function work with either a whole bunch of individual files, each individually converted to nc/zarr, or a single combined output file from a list of individual files (i.e., the combine=True option).
Note: this functionality should be added to the class-redesign branch.
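As a starting point for discussion, here is a rough sketch of the first task as a standalone function. It is only one possible reading of the task, not echopype's implementation: the function name select_platform_position, the time variable name "time", and the output dimension name "location_time" are all assumptions, and the example data are made up. It keeps only the GPS fixes that fall between the first and last ping_time; writing the result into the Platform group of a converted file is left out.

```python
import numpy as np
import xarray as xr


def select_platform_position(gps, ping_time, time_var="time", obs_dim="obs"):
    """Return lat/lon from `gps` that fall within the span of `ping_time`.

    Hypothetical helper: `time_var`, `obs_dim` and the "location_time"
    output dimension are assumptions, not confirmed echopype conventions.
    """
    ping_time = np.asarray(ping_time, dtype="datetime64[ns]")
    start, end = ping_time.min(), ping_time.max()

    # Integer positions of the GPS fixes inside [start, end]
    gps_time = gps[time_var].values.astype("datetime64[ns]")
    keep = np.flatnonzero((gps_time >= start) & (gps_time <= end))
    subset = gps.isel({obs_dim: keep})

    return xr.Dataset(
        {
            "latitude": ("location_time", subset["latitude"].values),
            "longitude": ("location_time", subset["longitude"].values),
        },
        coords={"location_time": gps_time[keep]},
    )


# Made-up GPS data: one fix per minute along an "obs" dimension
gps = xr.Dataset(
    {
        "time": (
            "obs",
            np.array(
                [
                    "2020-01-01T00:00:00",
                    "2020-01-01T00:01:00",
                    "2020-01-01T00:02:00",
                    "2020-01-01T00:03:00",
                ],
                dtype="datetime64[ns]",
            ),
        ),
        "latitude": ("obs", np.array([47.60, 47.61, 47.62, 47.63])),
        "longitude": ("obs", np.array([-122.33, -122.34, -122.35, -122.36])),
    }
)

# Made-up ping_time values standing in for the Beam group's ping_time
ping_time = np.array(
    ["2020-01-01T00:00:30", "2020-01-01T00:02:30"], dtype="datetime64[ns]"
)

platform = select_platform_position(gps, ping_time)
```

For the combine=True case, the same function could presumably be called once with the ping_time of the combined file; for individual files it would be called once per file with that file's ping_time.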