OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0

"Processed" power data in EK60/80 `backscatter_r` #643

Closed leewujung closed 1 year ago

leewujung commented 2 years ago

Currently, minimal processing is applied to the power data in the EK parser to convert the counts to power: https://github.com/OSOceanAcoustics/echopype/blob/bc8afa190fa2ca4fab5944bac83cd4b20f7abcf6/echopype/convert/parse_base.py#L122-L140

This means that the power data we store as backscatter_r are technically "processed" data, which conflicts a bit with our intention to store the most "raw" form of the data at level 0.

@emiliom: If we change this and actually save just the counts in backscatter_r and move the scaling to part of compute_Sv, does it still work with the convention specification "Real part or amplitude or power of backscatter measurements." ?

This would be a breaking change, since it changes the content of the data from open_raw and has a downstream impact on compute_Sv. We can mitigate the latter by including this scaling in the set of changes for the v0.5.x --> v0.6.0 conversion (#606). The breaking aspect, though, does mean that we need to include this in v0.6.0.

leewujung commented 2 years ago

When we consider this, we should revisit putting the details of what this variable contains in a variable attribute.

emiliom commented 1 year ago

Circling back to this. It's a good question / observation. I've pasted below, for reference, the complete SONAR-netCDF4 v1 information about backscatter_r.

In general, I agree with your point about counts being more appropriate as rawer, less processed data. But looking at the code, the conversion to power is based on a simple, fixed constant (INDEX2POWER) that's not dependent on any variable. So, in this case the distinction between "power" and "counts" seems very small, from a processing perspective.
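To illustrate why the distinction is small, here is a minimal sketch of a fixed-constant scaling and its inverse. It is not the echopype implementation: the value of INDEX2POWER is assumed here to be 10*log10(2)/256 per count, and the helper names counts_to_power and power_to_counts are hypothetical.

```python
import math

# Assumption: a fixed scale factor per raw count, standing in for the
# INDEX2POWER constant mentioned above (value assumed, not verified here).
INDEX2POWER = 10.0 * math.log10(2.0) / 256.0


def counts_to_power(counts):
    """Scale raw integer counts to power values (hypothetical helper)."""
    return [c * INDEX2POWER for c in counts]


def power_to_counts(power):
    """Invert the scaling to recover the integer counts (hypothetical helper)."""
    return [round(p / INDEX2POWER) for p in power]


raw_counts = [-12000, -8000, 0, 256]    # made-up sample values
power = counts_to_power(raw_counts)     # what would be stored as "power"
recovered = power_to_counts(power)      # what would be stored as "counts"
```

Because the factor does not depend on any other variable, either form can be recovered from the other, so the choice mainly affects where the scaling step lives (the parser or compute_Sv).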

I think there are other factors to consider, as we make a decision about this:

> When we consider this, we should revisit putting the details of what this variable contains in a variable attribute.

Agreed. It could go into a comment attribute. BTW, the convention specifies the long_name "Raw backscatter measurements (real part)". I realize "Backscatter power" is more specific and user friendly, but we should probably revisit this.

| Description | Obligation | Comment |
| --- | --- | --- |
| sample_t backscatter_r(ping_time, beam) | M | Real part or amplitude or power of backscatter measurements. Each element in the 2D matrix is a variable length vector (of type sample_t) that contains the samples for that beam and ping time. |
| :long_name = "Raw backscatter measurements (real part)" | | |
| :units = "as appropriate" | | Use units appropriate for the data |

emiliom commented 1 year ago

We decided to:

emiliom commented 1 year ago

I believe we can close this issue, now that #1047 is merged.