OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0

Write function to define xarray in ICES netCDF4 format #6

Closed leewujung closed 5 years ago

leewujung commented 6 years ago

This function will create a common framework for unpacking functions of different file formats to save data into.

Ref:

erinann commented 6 years ago

From the OHW2018/echopype repo: @friedrichknuth suggested that we can learn from this parser to see how it saves data to a .nc file.

This was issue number 4 in OHW18_echopype. (Can't use # or it will link to new issue #).

leewujung commented 5 years ago

In implementing the EK60 .raw to .nc converter (PR #18), I found that the SONAR-netCDF4 convention as it currently stands is not convenient for taking advantage of xarray's capabilities. The main issue is that the convention was written for horizontal fisheries sonars like the SX90, so under the Sonar group there are subgroups like Sonar/Beam_group1, Sonar/Beam_group2, etc., and each beam subgroup is more like a separate frequency channel in echosounder data. If the bin sizes are set to be the same across the echosounder channels, it is much more convenient to store the backscattering data as a 3D array with dimensions (frequency, ping_time, range_bin), so that each dimension can be indexed easily using xarray. The converter I just finished uses the (frequency, ping_time, range_bin) structure.
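To illustrate the kind of indexing described above, here is a minimal sketch of a (frequency, ping_time, range_bin) array in xarray. It is not the converter's actual code: the variable name `backscatter`, the frequencies, the timestamps, and the random data are all assumptions made up for the example.

```python
import numpy as np
import xarray as xr

# Hypothetical coordinates, for illustration only.
frequency = np.array([38e3, 120e3, 200e3])  # Hz
ping_time = np.array(
    [
        "2018-08-01T00:00:00",
        "2018-08-01T00:00:01",
        "2018-08-01T00:00:02",
        "2018-08-01T00:00:03",
    ],
    dtype="datetime64[ns]",
)
range_bin = np.arange(5)

# Fake backscatter values; a real converter would fill these from the .raw file.
rng = np.random.default_rng(0)
data = rng.normal(size=(frequency.size, ping_time.size, range_bin.size))

ds = xr.Dataset(
    {"backscatter": (("frequency", "ping_time", "range_bin"), data)},
    coords={
        "frequency": frequency,
        "ping_time": ping_time,
        "range_bin": range_bin,
    },
)

# Label-based selection of one frequency channel.
one_channel = ds["backscatter"].sel(frequency=38e3)

# Position-based slice of the first three range bins for all channels.
first_bins = ds["backscatter"].isel(range_bin=slice(0, 3))

# Reduce along a named dimension: average over pings.
ping_mean = ds["backscatter"].mean(dim="ping_time")
```

Each operation refers to a dimension by name, which only works this cleanly when all channels share the same range bins.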

@erinann @marianpena The example files I got from Dyson were set up this way (the bin sizes are the same for all frequencies), and so were the OOI files. Is this generally true when people set things up in the field, for EK60, EK80, and AZFP?

leewujung commented 5 years ago

So I had an email exchange with @marianpena, and it seems that for raw data it is indeed better to keep the flexibility to allow a different number of range bins for different channels (transducers). For broadband data in particular, it is often the case that you'd record out to different ranges at different frequencies, and perhaps with different bin sizes too.

I mainly want to use xarray's ability to slice and do math operations across multi-dimensional data sets directly from the files, without having to load everything into memory. Perhaps this is better done at another level of .nc file, such as one holding the calibrated Sv or MVBS (mean volume backscattering strength), since the range bins are then more likely to be the same across channels, and NaNs can be used to fill in missing data at longer ranges not reached by the higher frequencies.
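As a rough sketch of the NaN-filling idea, the snippet below stacks per-channel arrays with different numbers of range bins into one 3D array. The helper `pad_to_common_range` is hypothetical, not part of echopype, and it assumes that every channel has the same number of pings and the same bin size, so that bin indices line up across channels.

```python
import numpy as np


def pad_to_common_range(channels):
    """Stack 2D (ping_time, range_bin) arrays into one 3D array.

    Hypothetical helper: channels with fewer range bins are padded
    with NaN at the far end so all channels share the longest range.
    """
    n_ping = channels[0].shape[0]
    max_bins = max(ch.shape[1] for ch in channels)
    out = np.full((len(channels), n_ping, max_bins), np.nan)
    for i, ch in enumerate(channels):
        out[i, :, : ch.shape[1]] = ch
    return out


# Made-up data: a low-frequency channel reaching 6 bins and a
# high-frequency channel reaching only 4 bins, 2 pings each.
low = np.ones((2, 6))
high = np.full((2, 4), 2.0)

stacked = pad_to_common_range([low, high])

# NaN-aware reductions ignore the padded bins.
high_mean = np.nanmean(stacked[1])
```

The padded array can then be wrapped in an xarray object with (frequency, ping_time, range_bin) dimensions, at the cost of storing the NaN fill values.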

This is perhaps worth raising as a separate issue to discuss, in terms of different levels of files. For raw data it is advantageous to keep things flexible.

leewujung commented 5 years ago

I decided to focus on getting the machinery running first without worrying too much about the exact format of the Beam group variables:

I am going to close this and raise a separate issue for a later enhancement for