OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0

BI500 `open_raw` outputs #1258

Status: Open. leewujung opened this issue 8 months ago.

leewujung commented 8 months ago

This issue comes from discussions with @praneethratna and @jmjech about adding support for BI500 data. For context, once BI500 support is added, we could move on to the closely related EK500 data.

Unlike EK60 and EK80 files, which contain raw data, BI500 files contain nominally "calibrated" Sv data. Users configure the EK500 with a suite of parameters supplied through a parameter file, and the echosounder applies these parameters to the raw data to produce the BI500 files.

As a result, the parsed BI500 data are not "raw" in themselves and are closer to the Sv datasets generated by `compute_Sv`. In principle we could leverage the SONAR-netCDF4 v2.0 generalization to store the nominally calibrated Sv data in the `backscatter_r` variable, but this may cause confusion with other echosounders, for which `backscatter_r` contains uncalibrated data.

As a first pass, we can do the following:

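```python
# Proposed interface: the EchoData object plus a separate dataset holding the nominally calibrated Sv
echodata, ds_cal = open_raw(raw_file="BI500_FILEPATH", ..., metadata_file="METADATA.ekprm")
```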
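To make the split concrete, here is a minimal sketch of what `ds_cal` could look like, assuming it mirrors the dimension names (`channel`, `ping_time`, `range_sample`) of the Sv datasets produced by `compute_Sv`. The channel labels and Sv values are synthetic placeholders, not BI500 output. Keeping Sv in its own dataset leaves `backscatter_r` meaning uncalibrated data across all echosounders.

```python
import numpy as np
import xarray as xr

# Hypothetical sketch: dimension names assumed to follow compute_Sv output;
# channel labels and values below are synthetic placeholders.
channels = ["38kHz", "120kHz"]
ping_time = np.array(
    ["2020-01-01T00:00:00", "2020-01-01T00:00:01", "2020-01-01T00:00:02"],
    dtype="datetime64[ns]",
)
n_samples = 4

ds_cal = xr.Dataset(
    data_vars={
        # Nominally calibrated Sv as exported by the echosounder (dB re 1 m^-1)
        "Sv": (
            ("channel", "ping_time", "range_sample"),
            np.full((len(channels), ping_time.size, n_samples), -70.0),
        ),
    },
    coords={
        "channel": channels,
        "ping_time": ping_time,
        "range_sample": np.arange(n_samples),
    },
)
print(ds_cal)
```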

The key things are:

@praneethratna and @jmjech: Please feel free to point out anything incorrect and other things I may have missed!

jmjech commented 7 months ago

I'm working with a professor at UNH, and he was thoroughly confused about the backscatter_r variable. He assumed there would be a backscatter_i, from which he could calculate the magnitude. He is using EK60 data. That's a larger issue, but @leewujung's suggestion to separate the output into two sets is a good start. The parameter file used by the EK500 is a text file, but we could consider converting it to XML, JSON, or YAML.

leewujung commented 7 months ago

> I'm working with a professor at UNH, and he was thoroughly confused about the backscatter_r variable. He assumed there would be a backscatter_i, from which he could calculate the magnitude. He is using EK60 data. That's a larger issue, but @leewujung's suggestion to separate the output into two sets is a good start.

The `backscatter_r` and `backscatter_i` variables come directly from version 1 of the SONAR-netCDF4 convention. We did not try to change them, since we had already made other changes that we think are more critical for data use. We should bring this confusion and a renaming proposal to the ICES community.
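As an illustration of what the professor expected: assuming `backscatter_r` and `backscatter_i` hold the real and imaginary parts of the same complex samples (as with complex EK80 data), the magnitude is the modulus of the two. The helper name `backscatter_magnitude` is hypothetical. For power-only data such as EK60 there is no `backscatter_i`, so the sketch simply returns `backscatter_r`.

```python
import numpy as np
import xarray as xr


def backscatter_magnitude(ds: xr.Dataset) -> xr.DataArray:
    """Hypothetical helper: modulus of complex samples split into real/imag parts.

    Assumes backscatter_r and backscatter_i are the real and imaginary parts of
    the same complex samples. If backscatter_i is absent, there is no imaginary
    part to combine, so backscatter_r is returned unchanged.
    """
    if "backscatter_i" not in ds:
        return ds["backscatter_r"]
    # np.hypot computes sqrt(r**2 + i**2) and keeps the DataArray labels
    return np.hypot(ds["backscatter_r"], ds["backscatter_i"])


# Synthetic example: 3 + 4j has magnitude 5, 0 + 1j has magnitude 1
ds = xr.Dataset(
    {
        "backscatter_r": ("range_sample", [3.0, 0.0]),
        "backscatter_i": ("range_sample", [4.0, 1.0]),
    }
)
print(backscatter_magnitude(ds).values)  # [5. 1.]
```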

> The parameter file used by the EK500 is a text file, but we could consider converting it to XML, JSON, or YAML.

I think we should just load the parameter files as-is and store the parameters as data variables in the EchoData object. This is what we did for the AZFP parameters, which are stored in XML files separate from the data files. This way, the EchoData object and the corresponding serialized Zarr/netCDF file contain as much of the recorded and ancillary data as possible in a single entity. This kind of data consolidation, like adding missing GPS data to the EchoData object (for which we provide a function), is IMHO critical as we deal with ever larger datasets.
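A minimal sketch of loading a text parameter file as-is into scalar data variables follows. The `key = value` line format and the function name `parse_parameter_text` are hypothetical assumptions; the actual EK500 parameter file layout may differ and would need its own parser. Each parameter becomes a 0-d data variable, so the resulting Dataset could be stored alongside the other EchoData groups.

```python
import xarray as xr


def parse_parameter_text(text: str) -> xr.Dataset:
    """Hypothetical parser: one 'key = value' pair per line.

    Blank lines and lines starting with '#' are ignored. This format is an
    assumption for illustration, not the documented EK500 layout.
    """
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines that are not key/value pairs
        key, value = key.strip(), value.strip()
        try:
            params[key] = float(value)
        except ValueError:
            params[key] = value  # keep non-numeric values as strings
    # Store every parameter as a scalar (0-d) data variable
    return xr.Dataset({name: ((), val) for name, val in params.items()})


sample = """
# hypothetical excerpt
transducer_depth = 5.0
sound_speed = 1470
channel_name = 38 kHz
"""
params = parse_parameter_text(sample)
print(float(params["sound_speed"]))  # 1470.0
```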

Users can then use xarray functionality to export desired parameters to other common file formats.
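For example, parameters held as scalar data variables can be pulled out with `DataArray.item()` and written to JSON with the standard library. The parameter names and values below are hypothetical placeholders. JSON is used here only because it needs no extra dependency; the same dictionary could be handed to a YAML or XML writer instead.

```python
import json

import xarray as xr

# Hypothetical parameters already loaded as scalar (0-d) data variables
params = xr.Dataset(
    {"sound_speed": ((), 1470.0), "channel_name": ((), "38 kHz")}
)

# Convert each 0-d variable to a plain Python scalar, then serialize to JSON
as_python = {name: params[name].item() for name in params.data_vars}
json_text = json.dumps(as_python, indent=2)
print(json_text)
```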