OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0

OOI EK80 CW Power Angle, Sv returns all NULL values #1260

Closed oceanzus closed 2 months ago

oceanzus commented 5 months ago

I'm running a modified version of our EK60 code on the new EK80 CW Power Angle data files without issue up to the point where the Sv parameter is being computed.

The issue seems to be in the statement below:

    echopype.calibrate.compute_Sv(ds, env_params=env_params, waveform_mode='CW', encode_mode='power')

where env_params is based on:

    'CE02SHBP_TEST': {
        'long_name': 'Coastal Endurance, Oregon Shelf Cabled Benthic Experiment Package',
        'tilt_correction': 0,
        'colorbar_range': [-90, -50],
        'vertical_range': [0, 80],
        'deployed_depth': 82,
        'depth_offset': 2.0,
        'average_salinity': 33,
        'average_temperature': 10,
        'instrument_orientation': 'up'
    }

and ds is the dataset opened using:

    ds = echopype.open_raw(file, sonar_model='EK80')

Any suggestions on debugging steps and/or solutions that might be in the pipeline for future releases?

leewujung commented 5 months ago

Hey @oceanzus: Could you please provide a notebook as a gist so that we can reproduce the error with the exact files and parameters you use? Thanks.

leewujung commented 4 months ago

@oceanzus : Pinging again to see if you have any updates here.

oceanzus commented 4 months ago

Example files are located here: https://rawdata.oceanobservatories.org/files/CE02SHBP/MJ01C/ZPLSCB101_10.33.13.7/2024/03/04/

The raw files are opened with open_raw and sonar_model="EK80".

The calibrate.compute_Sv call on the opened dataset uses waveform_mode='CW' and encode_mode='power'.

ctuguinay commented 3 months ago

Hey @oceanzus, thanks for sharing the data!

This issue is related to #1287 and #743, both of which I will work on or continue working on soon. Using open_raw on the data you shared, I noticed that the individual ping times for the 3 channels (from the raw EK80 datagrams) were not aligned. I am guessing that this misalignment is due to ping time multiplexing. Because of it, merging the channel values pads them with NaNs, and those NaNs propagate downstream into the NaNs you see from compute_Sv. The output is not entirely NaN, though: both the uncalibrated and the calibrated backscatter data are roughly 1/4 non-NaN and 3/4 NaN.
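To make the padding mechanism concrete, here is a minimal NumPy sketch. It is not echopype's merging code: the channel names, ping times, and values are made up, and it only illustrates how placing sequentially pinging channels on the union of their ping times leaves NaN gaps.

```python
import numpy as np

# Hypothetical ping times (seconds) for 3 channels that ping one after another.
ping_times = {
    "ch1": np.array([0.0, 3.0, 6.0]),
    "ch2": np.array([1.0, 4.0, 7.0]),
    "ch3": np.array([2.0, 5.0, 8.0]),
}

# Toy backscatter: one constant value per channel (1.0, 2.0, 3.0).
values = {
    name: np.full(t.shape, float(i + 1))
    for i, (name, t) in enumerate(ping_times.items())
}

# The merged time axis is the sorted union of all channels' ping times.
common_time = np.unique(np.concatenate(list(ping_times.values())))

# Start from an all-NaN (channel, ping_time) array and fill in only the
# pings each channel actually recorded; everything else stays NaN.
merged = np.full((len(ping_times), common_time.size), np.nan)
for row, name in enumerate(ping_times):
    idx = np.searchsorted(common_time, ping_times[name])
    merged[row, idx] = values[name]

# Fraction of the merged array that is padding.
nan_fraction = np.isnan(merged).mean()
```

In this toy case two thirds of the merged array is NaN. The real files need not show the same ratio.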

I have a gist that briefly expands on this: https://gist.github.com/ctuguinay/2a92f52129b3045147d66fbf2f071ec6

@leewujung My current understanding is that the padded NaNs are necessary and intended for channel merging, but do you think a warning should be raised when padded NaNs exist because of datagram ping time misalignment? It is also tricky to work with the Sv data when every other water column (that is, each column along ping time) is NaN and each channel has a different NaN / non-NaN switching pattern. If this is the intended behavior for datagrams with misaligned ping times, what do you think could be added to make working with this data easier? Or should we leave the post-processing task of cleaning up padded NaNs to the user?

leewujung commented 2 months ago

@ctuguinay : This is not an error; the instrument was intentionally configured to ping sequentially for various reasons. The reason I mentioned the 2-stage approach in https://github.com/OSOceanAcoustics/echopype/issues/1287#issuecomment-2041653875 is precisely so that we can first make the method work for handling the NaNs, and then at the second stage figure out the potential memory and storage issue (though NaN compresses really well, and if we use zarr allocation directly like what we have in open_raw, we can probably circumvent the memory issue).

One thing we can consider is to store data from different channels with sequential pings to different data arrays, but that would break the convenience for slicing. At a higher processing level after Sv at raw ping time resolution is computed, people often try to regrid to bring all observations to the same ping time grid, so it would not be an issue there. So, yeah, this is something to discuss later.
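The regridding step mentioned above can be sketched roughly as follows. This is a NumPy illustration under stated assumptions, not echopype's implementation: the function name regrid_sv, the bin edges, and the toy values are all hypothetical. Averaging is done in the linear domain and converted back to dB, which is the usual convention when averaging Sv, and the NaN padding is skipped.

```python
import numpy as np

def regrid_sv(sv_db, ping_time, bin_edges):
    """Average Sv (dB) per channel into ping-time bins, ignoring NaN padding.

    sv_db: array of shape (channel, ping_time), NaN where a channel did not ping.
    ping_time: 1-D array of ping times matching the second axis of sv_db.
    bin_edges: 1-D array of increasing bin edges on the same time scale.
    """
    sv_linear = 10.0 ** (sv_db / 10.0)            # NaN stays NaN
    bin_index = np.digitize(ping_time, bin_edges) - 1
    n_bins = len(bin_edges) - 1
    out = np.full((sv_db.shape[0], n_bins), np.nan)
    for b in range(n_bins):
        in_bin = bin_index == b
        if not in_bin.any():
            continue
        block = sv_linear[:, in_bin]
        counts = np.sum(~np.isnan(block), axis=1)  # valid pings per channel
        sums = np.nansum(block, axis=1)
        has_data = counts > 0                      # leave empty bins as NaN
        out[has_data, b] = 10.0 * np.log10(sums[has_data] / counts[has_data])
    return out

# Toy input: 3 channels pinging in turn, each with a constant Sv level.
sv_db = np.full((3, 9), np.nan)
for ch, level in enumerate([-70.0, -60.0, -50.0]):
    sv_db[ch, ch::3] = level
ping_time = np.arange(9.0)
bin_edges = np.array([0.0, 3.0, 6.0, 9.0])

regridded = regrid_sv(sv_db, ping_time, bin_edges)
```

With bins wide enough to contain at least one ping from every channel, the regridded array has no NaN columns, so all channels can be sliced on the same ping time grid.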

ctuguinay commented 2 months ago

@oceanzus Here's another gist illustrating how to extract and plot calibrated Sv data with these padded NaNs. It also illustrates the pattern of pinging that the 3 transducers take in this multiplexed system. https://gist.github.com/ctuguinay/128bdd73ae286fcc63f5e71d32041431
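Separately from the gist, the basic idea of pulling out one channel's real pings can be sketched with plain NumPy. This is only an illustration with made-up data and a hypothetical helper name (valid_pings); it assumes an Sv array laid out as (channel, ping_time, range_sample) and drops the ping columns that are entirely NaN for the chosen channel. On an xarray DataArray, dropna along the ping time dimension with how="all" should do the equivalent.

```python
import numpy as np

def valid_pings(sv, ping_time, channel_index):
    """Return the ping times and Sv rows that one channel actually recorded.

    sv: array of shape (channel, ping_time, range_sample) with NaN padding.
    ping_time: 1-D array matching the ping_time axis of sv.
    channel_index: integer index of the channel to extract.
    """
    channel_sv = sv[channel_index]                  # (ping_time, range_sample)
    keep = ~np.all(np.isnan(channel_sv), axis=1)    # pings with any real data
    return ping_time[keep], channel_sv[keep]

# Toy data: 3 channels, 6 pings, 4 range samples, pinging in turn.
sv = np.full((3, 6, 4), np.nan)
for ch in range(3):
    sv[ch, ch::3, :] = -70.0 - ch
ping_time = np.arange(6)

times_ch2, sv_ch2 = valid_pings(sv, ping_time, channel_index=1)
```

Each channel then has its own, shorter ping time axis with no padding, which is easier to plot one channel at a time.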

After talking today, we currently believe that these padded NaNs that you see in the calibrated Sv are not an issue with the code, and are actually intended and necessary for selection functionality. I'll close this issue for now, but let us know if this causes any other downstream issues in your workflow. This interweaving of NaN / non-NaNs is tricky to work with, and I hope the short gist I shared can help you move forward.