OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0
99 stars 76 forks source link

Errors/issues processing sequentially pinged multifrequency broadband EK80 files #743

Closed brandynlucca closed 7 months ago

brandynlucca commented 2 years ago

Hi all. I didn't see this in either of the current open or closed issues, but I am running into a data processing issue with Simrad *.raw files containing broadband multiple frequency data. I followed the threads and chain of issues/pulls in both #308 and #657, but those didn't seem to directly help address the issues I was encountering. I apologize if I missed a thread that does, in fact, address this exact issue. Namely, some of the issues addressed between 0.5.6 and 0.6.0 didn't appear to resolve errors involving changing data values for transmit_power, slope, transmit_duration_nominal, frequency_start, and frequency_end.

Background

OS: Windows 10.0.19043 Python version: 3.7.9 and 3.8.10 Echopype version: 0.5.6 and 0.6.0 Simrad EK80 software version: 2.0.0

ed["Provenance"]

<xarray.Dataset>
Dimensions:           (filenames: 1)
Dimensions without coordinates: filenames
Data variables:
    source_filenames  (filenames) <U57 'C:/Users/Brandyn/Downloads/NYOS2105-D...
Attributes:
    conversion_software_name:     echopype
    conversion_software_version:  0.6.0
    conversion_time:              2022-06-28T18:22:15Z

An aside: I ran Echopype 0.5.6 in Python 3.7.9 and Echopype 0.6.0 in Python 3.8.10. For Python 3.7.x, there is an error in trying to install Echopype 0.6.0 due to certain dependencies requiring Python >= 3.8 (namely xarray). So the recommended Python version in the Installation Guide may need to be amended for Windows. This was detected via:

pip install git+https://github.com/OSOceanAcoustics/echopype.git#egg=echopype

Issue I am trying to call in EK80 raw files that includes four frequencies (38, 70, 120, 200 kHz) where the 38 and 200 kHz come from a dual frequency transducer (38/200 Combi D). All four comprise broadband/complex measurements. The fileset I am currently processing is from May 2021. I haven't run into any significant issues when trying to open the raw files. However, I do run into an issue when trying to compute Sv or attempt really any other processing function. This also includes trying to visualize the echogram. I have been able to successfully run the examples from the provided notebook using EK60 data.

The issue appears to be related to how some of the arrays for certain variables are mapped to the frequency and ping_time coordinates. So the initial Error message I receive is as follows (Note: ultimately, the flagged error is identical in both Echopype 0.5.6 and Echopype 0.6.0):

import echopype as ep
from echopype import open_raw
import numpy as np

file = "C:/Users/Brandyn/Downloads/NYOS2105-D20210525-T213648.raw"
ed = open_raw(file, sonar_model='EK80')
ed_sv = ep.calibrate.compute_Sv(ed, waveform_mode="BB", encode_mode="complex")

Resulting error

  File "<stdin>", line 1, in <module>
  File "C:\Users\Brandyn\AppData\Roaming\Python\Python38\site-packages\echopype\calibrate\api.py", line 193, in compute_Sv
    return _compute_cal(cal_type="Sv", echodata=echodata, **kwargs)
  File "C:\Users\Brandyn\AppData\Roaming\Python\Python38\site-packages\echopype\calibrate\api.py", line 81, in _compute_cal
    cal_ds = cal_obj.compute_Sv(waveform_mode=waveform_mode, encode_mode=encode_mode)
  File "C:\Users\Brandyn\AppData\Roaming\Python\Python38\site-packages\echopype\calibrate\calibrate_ek.py", line 964, in compute_Sv
    return self._compute_cal(
  File "C:\Users\Brandyn\AppData\Roaming\Python\Python38\site-packages\echopype\calibrate\calibrate_ek.py", line 930, in _compute_cal
    ds_cal = self._cal_complex(cal_type=cal_type, waveform_mode=waveform_mode)
  File "C:\Users\Brandyn\AppData\Roaming\Python\Python38\site-packages\echopype\calibrate\calibrate_ek.py", line 684, in _cal_complex
    chirp, _, tau_effective = self.get_transmit_chirp(waveform_mode=waveform_mode)
  File "C:\Users\Brandyn\AppData\Roaming\Python\Python38\site-packages\echopype\calibrate\calibrate_ek.py", line 549, in get_transmit_chirp
    raise TypeError("File contains changing %s!" % p)
TypeError: File contains changing transmit_duration_nominal!

Looking under the hood in beam from my EchoData object (I used the equivalent indexing notation, e.g. ed.beam, for the Echopype 0.5.6 release):

print(ed['Sonar/Beam_group1'])

<xarray.Dataset>
Dimensions:                        (channel: 4, ping_time: 361, beam: 4,
                                    range_sample: 16729)
Coordinates:
  * channel                        (channel) <U36 'WBT 400039-15 ES120-7C_ES'...
  * ping_time                      (ping_time) datetime64[ns] 2021-05-25T21:3...
  * range_sample                   (range_sample) int64 0 1 2 ... 16727 16728
  * beam                           (beam) <U1 '1' '2' '3' '4'
Data variables: (12/21)
    frequency_nominal              (channel) float64 1.2e+05 7e+04 3.8e+04 2e+05
    beam_type                      (channel, ping_time) int32 1 1 1 1 ... 0 0 0
    beamwidth_twoway_alongship     (channel, ping_time, beam) float64 7.0 ......
    beamwidth_twoway_athwartship   (channel, ping_time, beam) float64 7.0 ......
    beam_direction_x               (channel, ping_time, beam) float64 0.0 ......
    beam_direction_y               (channel, ping_time, beam) float64 0.0 ......
    ...                             ...
    frequency_start                (channel, ping_time, beam) float64 nan ......
    frequency_end                  (channel, ping_time, beam) float64 nan ......
    sample_interval                (channel, ping_time) float64 nan ... 1.6e-05
    transmit_power                 (channel, ping_time) float64 nan ... 200.0
    transmit_duration_nominal      (channel, ping_time) float32 nan ... 0.000512
    slope                          (channel, ping_time) float64 nan ... 0.0514
Attributes:
    beam_mode:              vertical
    conversion_equation_t:  type_3
print(ed['Sonar/Beam_group1']['transmit_duration_nominal'])

<xarray.DataArray 'transmit_duration_nominal' (channel: 4, ping_time: 361)>
array([[     nan,      nan, 0.000512, ...,      nan, 0.000512,      nan],
       [     nan, 0.000512,      nan, ..., 0.000512,      nan,      nan],
       [0.000512,      nan,      nan, ...,      nan,      nan, 0.000512],
       [0.000512,      nan,      nan, ...,      nan,      nan, 0.000512]],
      dtype=float32)
Coordinates:
  * channel    (channel) <U36 'WBT 400039-15 ES120-7C_ES' ... 'WBT Mini 26118...
  * ping_time  (ping_time) datetime64[ns] 2021-05-25T21:36:48.192999936 ... 2...
Attributes:
    long_name:  Nominal bandwidth of transmitted pulse
    units:      s
    valid_min:  0.0

I wasn't sure if the padded nan values here were related to #681, but when checking whether there is a different transmit duration included in this array:

ed['Sonar/Beam_group1']['transmit_duration_nominal'].shape
(4, 361)
np.unique(ed['Sonar/Beam_group1']['transmit_duration_nominal'].values)
array([0.000512,      nan], dtype=float32)

It makes sense since pings were sequential so the ping_time coordinate won't include actual data for each frequency. Looking at the first 6 pings:

ed['Sonar/Beam_group1']["transmit_duration_nominal"].values[:, 0:6]

array([[0.000512,      nan,      nan, 0.000512,      nan,      nan],
       [     nan, 0.000512,      nan,      nan, 0.000512,      nan],
       [     nan,      nan, 0.000512,      nan,      nan, 0.000512],
       [0.000512,      nan,      nan, 0.000512,      nan,      nan]],
      dtype=float32)

So that matches what would be expected since the 38/200 pinged simultaneously followed by the 70 and 120 kHz transducers, respectively. So, presumably, this should be the same for other variables mapped with the same coordinates (i.e. frequency and ping_time), which it is (e.g. slope):

ed['Sonar/Beam_group1']["slope"].values[:, 0:6]
array([[0.223214 ,       nan,       nan, 0.223214 ,       nan,       nan],
       [      nan, 0.0710227,       nan,       nan, 0.0710227,       nan],
       [      nan,       nan, 0.0434028,       nan,       nan, 0.0434028],
       [0.051398 ,       nan,       nan, 0.051398 ,       nan,       nan]])

So when I just replace the nan placeholders with the same float32 value, the issue gets passed onto the next variable (so slope in this case):

beam_tdn_copy = ed['Sonar/Beam_group1']['transmit_duration_nominal'].copy()
tdn_vals = np.unique(beam_tdn_copy.values.flatten(), return_index=True)
tdn_vals = tdn_vals[0][tdn_vals[1].argsort()]
tdn_vals = tdn_vals[~np.isnan(tdn_vals)]
ed['Sonar/Beam_group1']['transmit_duration_nominal'].values = np.full_like(beam_tdn_copy, tdn_vals[:, np.newaxis])
print(ed['Sonar/Beam_group1']["transmit_duration_nominal"])

<xarray.DataArray 'transmit_duration_nominal' (channel: 4, ping_time: 361)>
array([[0.000512, 0.000512, 0.000512, ..., 0.000512, 0.000512, 0.000512],
       [0.000512, 0.000512, 0.000512, ..., 0.000512, 0.000512, 0.000512],
       [0.000512, 0.000512, 0.000512, ..., 0.000512, 0.000512, 0.000512],
       [0.000512, 0.000512, 0.000512, ..., 0.000512, 0.000512, 0.000512]],
      dtype=float32)
Coordinates:
  * channel    (channel) <U36 'WBT 400039-15 ES120-7C_ES' ... 'WBT Mini 26118...
  * ping_time  (ping_time) datetime64[ns] 2021-05-25T21:36:48.192999936 ... 2...
Attributes:
    long_name:  Nominal bandwidth of transmitted pulse
    units:      s
    valid_min:  0.0
ed_sv = ep.calibrate.compute_Sv(ed, waveform_mode='BB', encode_mode='complex')
TypeError: File contains changing slope!

Changing slope:

print(ed['Sonar/Beam_group1']["slope"])

<xarray.DataArray 'slope' (channel: 4, ping_time: 361)>
array([[      nan,       nan, 0.0434028, ...,       nan, 0.0434028,
              nan],
       [      nan, 0.0710227,       nan, ..., 0.0710227,       nan,
              nan],
       [0.223214 ,       nan,       nan, ...,       nan,       nan,
        0.223214 ],
       [0.051398 ,       nan,       nan, ...,       nan,       nan,
        0.051398 ]])
Coordinates:
  * channel    (channel) <U36 'WBT 400039-15 ES120-7C_ES' ... 'WBT Mini 26118...
  * ping_time  (ping_time) datetime64[ns] 2021-05-25T21:36:48.192999936 ... 2...
slope_vals = np.unique(ed['Sonar/Beam_group1']["slope"].values.flatten(), return_index=True)
slope_vals = slope_vals[0][slope_vals[1].argsort()]
slope_vals = slope_vals[~np.isnan(slope_vals)]
ed['Sonar/Beam_group1']["slope"].values = np.full_like(ed['Sonar/Beam_group1']["slope"], slope_vals[:, np.newaxis])
print(ed['Sonar/Beam_group1']["slope"])

<xarray.DataArray 'slope' (channel: 4, ping_time: 361)>
array([[0.0434028, 0.0434028, 0.0434028, ..., 0.0434028, 0.0434028,
        0.0434028],
       [0.0710227, 0.0710227, 0.0710227, ..., 0.0710227, 0.0710227,
        0.0710227],
       [0.223214 , 0.223214 , 0.223214 , ..., 0.223214 , 0.223214 ,
        0.223214 ],
       [0.051398 , 0.051398 , 0.051398 , ..., 0.051398 , 0.051398 ,
        0.051398 ]])
Coordinates:
  * channel    (channel) <U36 'WBT 400039-15 ES120-7C_ES' ... 'WBT Mini 26118...
  * ping_time  (ping_time) datetime64[ns] 2021-05-25T21:36:48.192999936 ... 2...

Same error, just for slope now:

ed_sv = ep.calibrate.compute_Sv(ed, waveform_mode='BB', encode_mode='complex')
TypeError: File contains changing transmit_power!

Repeating this by replacing the nan placeholders in transmit_power, frequency_start, and frequency_end (the latter two required a slightly different brute force approach since they apparently have a different dimension than the other data variables listed above) appears to resolve the issue since functions like compute_Sv(...) will now operate as expected:

ed_sv = ep.calibrate.compute_Sv(ed, waveform_mode='BB', encode_mode='complex')
print(ed_sv)
<xarray.Dataset>
Dimensions:                (ping_time: 361, channel: 4, range_sample: 16729,
                            filenames: 1, time3: 1)
Coordinates:
  * ping_time              (ping_time) datetime64[ns] 2021-05-25T21:36:48.192...
  * channel                (channel) <U36 'WBT 400039-15 ES120-7C_ES' ... 'WB...
  * range_sample           (range_sample) int64 0 1 2 3 ... 16726 16727 16728
  * time3                  (time3) datetime64[ns] 2021-05-25T21:36:48.192999936
Dimensions without coordinates: filenames
Data variables: (12/13)
    Sv                     (channel, ping_time, range_sample) float64 nan ......
    echo_range             (channel, ping_time, range_sample) float64 nan ......
    frequency_nominal      (channel) float64 1.2e+05 7e+04 3.8e+04 2e+05
    temperature            (ping_time) float64 10.0 nan nan nan ... nan nan nan
    salinity               (ping_time) float64 35.0 nan nan nan ... nan nan nan
    pressure               (ping_time) float64 100.0 nan nan nan ... nan nan nan
    ...                     ...
    sound_absorption       (ping_time, channel) float64 0.04094 0.02458 ... nan
    sa_correction          object None
    gain_correction        object None
    equivalent_beam_angle  (channel, ping_time) float64 -20.7 -20.7 ... -12.6
    source_filenames       (filenames) <U57 'C:/Users/Brandyn/Downloads/NYOS2...
    water_level            (time3) float64 0.0
Attributes:
    processing_software_name:     echopype
    processing_software_version:  0.6.0
    processing_time:              2022-06-28T18:33:22Z
    processing_function:          calibrate.compute_Sv

But this hack to brute force compute_Sv(...) unfortunately appears to potentially break other features (or this may be just a general 'feature' on how these *.raw files are formatted regardless of the above hack). For instance, I am able to generate the unbinned Sv echograms without much trouble (attached is the related figure for the 70 kHz):

import matplotlib.pyplot as plt 
import echopype.visualize as epviz

gg = epviz.create_echogram(ed_sv, 
                      robust=True, 
                      get_range=True, 
                      range_kwargs=dict(waveform_mode='BB', encode_mode='complex'),
                      vmin=-70, 
                      vmax=-36)
plt.show(gg[1])
plt.close()

Lucca_echogram_test_plot_unbinned_70kHz

I can manually toggle through each of them to pull the channel/frequency I would like. But as soon as I average by bin via compute_MVBS(...), things seem to break for all frequencies except 38 kHz where the echogram is able to be generated.

ed_mvbs = ep.preprocess.compute_MVBS(ed_sv, range_meter_bin=1, ping_time_bin='10S')
gg = epviz.create_echogram(ed_mvbs, 
                      robust=True, 
                      get_range=True, 
                      range_kwargs=dict(waveform_mode='BB', encode_mode='complex'),
                      vmin=-70, 
                      vmax=-36)
plt.show(gg[2])
plt.close()

Lucca_echogram_test_plot_binnedmvbs_70kHz

It's certainly quite possible that I broke something in the process and that is what is triggering the above issue, but I have replicated the exact issue and 'hack' in 10 unique EK *.raw files from other cruises where we also collected broadband measurements with sequential pings. Any guidance would be greatly appreciated!

leewujung commented 2 years ago

@brandynlucca : Thanks for reporting the bugs and the very thorough investigation!! I believe you didn't break anything and it is the code that needs improvements.

The NaN padding is a strategy we use to get data from all channels into a gridded form for slicing, and it is great (!) to know that the current calibration routine breaks with the interleaving NaNs for sequential pings. I quickly looked into the section where things broke and saw a TODO for exactly the problem you saw: https://github.com/OSOceanAcoustics/echopype/blob/12693a72badcf54a18408e694b72e7b0e6d48023/echopype/calibrate/calibrate_ek.py#L528-L529

I will need probably until the weekend to get to this. If you wouldn't mind sharing a small example file that will really help with changing/debugging the implementation. This gives me a push to look into the EK80 cal code too, so thanks! (I think in addition to this issue the current implementation is also pretty slow due to the mechanism used to do pulse compression)

For the MVBS problem, it is a regression bug at v0.6.0 that should be fixed in #736 (but not merged yet). Could you give it a try to see if that resolved the problem?

Also, thanks for pointing out the >=3.8 problem. We'll get that fixed!

brandynlucca commented 2 years ago

@brandynlucca : Thanks for reporting the bugs and the very thorough investigation!! I believe you didn't break anything and it is the code that needs improvements.

The NaN padding is a strategy we use to get data from all channels into a gridded form for slicing, and it is great (!) to know that the current calibration routine breaks with the interleaving NaNs for sequential pings. I quickly looked into the section where things broke and saw a TODO for exactly the problem you saw:

https://github.com/OSOceanAcoustics/echopype/blob/12693a72badcf54a18408e694b72e7b0e6d48023/echopype/calibrate/calibrate_ek.py#L528-L529

I will need probably until the weekend to get to this. If you wouldn't mind sharing a small example file that will really help with changing/debugging the implementation. This gives me a push to look into the EK80 cal code too, so thanks! (I think in addition to this issue the current implementation is also pretty slow due to the mechanism used to do pulse compression)

No worries! I wonder if one plausible alternative would be to generate multidimensional coordinates that maps ping_time to a specific channel? Perhaps that would also allow frequency-specific data to be severable from the overall dataset (e.g. processing just 38/200 kHz backscatter via Echopype for something like dB differencing without requiring external manipulation or needing to process other unused frequency data that will just chew up memory and resources)?

In terms of files, I unfortunately only have 100 MB RAW files, but I can upload on to my Google drive and share it if you'd like!

For the MVBS problem, it is a regression bug at v0.6.0 that should be fixed in #736 (but not merged yet). Could you give it a try to see if that resolved the problem?

I tested the dataflow by installing the updated version included in your repo but that didn't seem to fix my particular issue it seems. So I am able to generate the 38 kHz echograms without issue, but the other three frequencies still aren't bin-averaging correctly. I wonder if this is related to the 38 kHz data occupying both the first non-nan-padded column (i.e. ping 0) and row? I know xarray can be particularly persnickety when it comes to reindexing things, so maybe that could explain why compute_MVBS(...) works for the first frequency channel and just happens to produce something for the other three since they share the same geometries by coincidence? Running down the rabbit hole a bit seems to indicate that this may still be related to how echo_range is being calculated since the issue clears up when brute-forcing the code to resample vertically by range_sample_num and horizontally by ping_time.

All that said, at least from the linked repo with the updated compute_MVBS(...) function I am actually able to manually construct echograms for the other frequencies by extracting their frequency-specific ping_time measurements with most of the same original code. The very unoptimized code below:

Original dataset

sv = 10 ** (ed_sv["Sv"]/10)
print(sv)
<xarray.DataArray 'Sv' (channel: 4, ping_time: 361, range_sample: 16729)>
array([[[           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        [           nan,            nan,            nan, ...,
         3.91095702e-07, 9.14494970e-07, 1.85589463e-07],
        ...,
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        [           nan,            nan,            nan, ...,
         2.49507722e-06, 1.44740865e-06, 2.45585001e-06],
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan]],

       [[           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
...
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan]],

       [[           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        ...,
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan],
        [           nan,            nan,            nan, ...,
                    nan,            nan,            nan]]])
Coordinates:
  * ping_time     (ping_time) datetime64[ns] 2021-05-25T21:36:48.192999936 .....
  * channel       (channel) <U36 'WBT 400039-15 ES120-7C_ES' ... 'WBT Mini 26...
  * range_sample  (range_sample) int64 0 1 2 3 4 ... 16725 16726 16727 16728

I then glued together a bitmap function that subsets out the frequency-specific data's coordinates:

def _frequency_bitmap(sv_data, frequency): 
  #Add frequency as coordinate
  sv_data.coords["frequency"] = (
    ["channel"],
    ed_sv["frequency_nominal"].data
  )
  sv_data = sv_data.swap_dims({"channel": "frequency"})
  #Create bitmap of presence/absence of non-padded data
  bool_array = (~np.isnan(sv_data.sel(frequency=frequency).data)).sum(1)
  #Parse out ping times specific to frequency argument input
  ping_times_f = sv_data["ping_time"][bool_array > 0]
  #Select subset of original dataset
  return sv_data.loc[dict(frequency=frequency, ping_time=ping_times_f)]

So when applied to data to pull a specific frequency, the object ends up looking like:

#Set frequency choice 
frequency_choice = 7.0e+4
test_obj = _frequency_bitmap(sv, frequency_choice)
print(test_obj)
<xarray.DataArray 'Sv' (ping_time: 120, range_meter: 16729)>
array([[nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       ...,
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan],
       [nan, nan, nan, ..., nan, nan, nan]])
Coordinates:
  * ping_time     (ping_time) datetime64[ns] 2021-05-25T21:36:50.172000256 .....
    channel       <U36 'WBT 738184-15 ES70-7CD_ES'
    range_sample  (range_meter) int64 0 1 2 3 4 ... 16725 16726 16727 16728
  * range_meter   (range_meter) float64 0.0 0.0 0.0 0.0 0.0 ... nan nan nan nan
    frequency     float64 7e+04

So with that operationalized, I was able to generate the corresponding MVBS echogram for the other frequencies via:

#####
#Set frequency choice 
frequency_choice = 7.0e+4
#Set range bin (m)
range_meter_bin = 1.0 
#Set ping time bin (S)
ping_bin = "10S"
#####
sv = 10 ** (ed_sv["Sv"] / 10)
freq_ind_mask = np.unique(ds_Sv.frequency_nominal.values.flatten(), return_index=True)[0] == frequency_choice
freq_ind = np.unique(ds_Sv.frequency_nominal.values.flatten(), return_index=True)[1][freq_ind_mask]
sorted_freq_ind = np.argsort(ds_Sv.frequency_nominal)
sv.coords["range_meter"] = (
    ["range_sample"],
    ed_sv["echo_range"].isel(channel=sorted_freq_ind[freq_ind[0]], ping_time=freq_ind[0]).data,
)
sv = sv.swap_dims({"range_sample": "range_meter"})
range_interval = np.arange(0, ed_sv["echo_range"].max() + range_meter_bin, range_meter_bin)
rint = range_interval
pbin = ping_bin
sv_groupby_bins = (
  _frequency_bitmap(sv, frequency_choice)
  .groupby_bins("range_meter", bins=rint, right=False, include_lowest=True)
  .mean()
  .resample(ping_time=pbin, skipna=True)
  .mean()
)
sv_groupby_bins.coords["echo_range"] = (["range_meter_bins"], rint[:-1])
sv_groupby_bins = sv_groupby_bins.swap_dims({"range_meter_bins": "echo_range"})
sv_groupby_bins = sv_groupby_bins.drop_vars("range_meter_bins")
sv_groupby_bins = 10 * np.log10(sv_groupby_bins)
gg = sv_groupby_bins.plot.pcolormesh(x='ping_time', y='echo_range', cmap='jet', vmin=-80, vmax=-44)
plt.ylim(45, 2)
plt.show(gg)
plt.close()

Lucca Echopype_test_echogram_MVBS_single_frequency_70kHz

Hopefully there is a tidbit in there that may at least provide some extra inkling into the source of the issue with this dataset? Again, I can provide a link to the example file if needed.

leewujung commented 2 years ago

No worries! I wonder if one plausible alternative would be to generate multidimensional coordinates that maps ping_time to a specific channel? Perhaps that would also allow frequency-specific data to be severable from the overall dataset (e.g. processing just 38/200 kHz backscatter via Echopype for something like dB differencing without requiring external manipulation or needing to process other unused frequency data that will just chew up memory and resources)?

Yes! this is perhaps the nice use case for using dask.delayed via accessing already existing zarr store (so just slice out the frequencies needed) for distributing the computation. In a way once we are processing large files an intermediate zarr store may always be needed. In this particular case we could slice out just 38/200 and then drop all NaNs for the calculation. Great! I'll give this a try when revising the calibration code, perhaps it'll be some combination with groupby to make it channel-independent and delay-friendly.

In terms of files, I unfortunately only have 100 MB RAW files, but I can upload on to my Google drive and share it if you'd like!

Please do! 100 MB is not a problem for developing. I may be able to reduce the file size via replaying it through EK80 (if you have access to the EK80 software perhaps you could try that too), though I don't know if that may change some detailed info of the file (like software version etc) -- but I guess for using it as a test file in the CI for making sure the functionalities work it would be fine to have those different.

And thanks for providing the code that made things work! I am sure somewhere in there the issue is illuminated. I will look more closely in a couple of days.

brandynlucca commented 2 years ago

Please do! 100 MB is not a problem for developing. I may be able to reduce the file size via replaying it through EK80 (if you have access to the EK80 software perhaps you could try that too), though I don't know if that may change some detailed info of the file (like software version etc) -- but I guess for using it as a test file in the CI for making sure the functionalities work it would be fine to have those different.

Here is a Google Drive link to the *.raw file I was using:

https://drive.google.com/file/d/1c1d0rUgBa9N-2UfZOM7_rnCSvuhb0g0H/view?usp=sharing

Let me know if there are any issues accessing it!

leewujung commented 2 years ago

This issue might be related to https://github.com/OSOceanAcoustics/echopype/issues/720#issuecomment-1165568926

leewujung commented 2 years ago

@brandynlucca : we fixed a few regression bugs from v0.6.0 in the most recent release (v0.6.1 this morning), but I haven't had a chance to fix the part related to the NaN issues during BB calibration. Stay tuned -- I'll ping when there's PR for that.

petejan commented 1 year ago

I have a similar problem with WBAT-mini data (EK80 format), in CW mode,

>>> file='data/MultiFrequency-4min-38N200W-continuous38WN-200W-38N-SOFS-8-Phase0-D20190401-T000014-0.raw'
>>> ed = open_raw(file, sonar_model='EK80')
/Users/pete/DWM/echo/venv/lib/python3.8/site-packages/xarray/coding/times.py:228: RuntimeWarning: invalid value encountered in cast
  flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
/Users/pete/DWM/echo/venv/lib/python3.8/site-packages/xarray/coding/times.py:228: RuntimeWarning: invalid value encountered in cast
  flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
/Users/pete/DWM/echo/venv/lib/python3.8/site-packages/xarray/coding/times.py:228: RuntimeWarning: invalid value encountered in cast
  flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
>>> ed_sv = ep.calibrate.compute_Sv(ed, waveform_mode="CW", encode_mode="complex")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/pete/DWM/echo/echopype/echopype/calibrate/api.py", line 205, in compute_Sv
    return _compute_cal(cal_type="Sv", echodata=echodata, **kwargs)
  File "/Users/pete/DWM/echo/echopype/echopype/calibrate/api.py", line 89, in _compute_cal
    cal_ds = cal_obj.compute_Sv(waveform_mode=waveform_mode, encode_mode=encode_mode)
  File "/Users/pete/DWM/echo/echopype/echopype/calibrate/calibrate_ek.py", line 997, in compute_Sv
    return self._compute_cal(
  File "/Users/pete/DWM/echo/echopype/echopype/calibrate/calibrate_ek.py", line 963, in _compute_cal
    ds_cal = self._cal_complex(cal_type=cal_type, waveform_mode=waveform_mode)
  File "/Users/pete/DWM/echo/echopype/echopype/calibrate/calibrate_ek.py", line 693, in _cal_complex
    chirp, _, tau_effective = self.get_transmit_chirp(waveform_mode=waveform_mode)
  File "/Users/pete/DWM/echo/echopype/echopype/calibrate/calibrate_ek.py", line 554, in get_transmit_chirp
    raise TypeError("File contains changing %s!" % p)
TypeError: File contains changing transmit_duration_nominal!
>>> print(ed['Sonar/Beam_group1'])
<xarray.Dataset>
Dimensions:                        (channel: 3, ping_time: 4, beam: 4,
                                    range_sample: 5750)
Coordinates:
  * channel                        (channel) <U29 'EKA 266958-0F ES38D' ... '...
  * ping_time                      (ping_time) datetime64[ns] 2019-04-01T00:0...
  * range_sample                   (range_sample) int64 0 1 2 ... 5747 5748 5749
  * beam                           (beam) <U1 '1' '2' '3' '4'
Data variables: (12/19)
    frequency_nominal              (channel) float64 3.8e+04 3.8e+04 2e+05
    beam_type                      (channel, ping_time) int64 1 1 1 1 ... 0 0 0
    beamwidth_twoway_alongship     (channel, ping_time, beam) float64 7.0 ......
    beamwidth_twoway_athwartship   (channel, ping_time, beam) float64 7.0 ......
    beam_direction_x               (channel, ping_time, beam) float64 nan ......
    beam_direction_y               (channel, ping_time, beam) float64 nan ......
    ...                             ...
    backscatter_r                  (channel, ping_time, range_sample, beam) float32 ...
    backscatter_i                  (channel, ping_time, range_sample, beam) float32 ...
    sample_interval                (channel, ping_time) float64 nan ... nan
    transmit_power                 (channel, ping_time) float64 nan ... nan
    transmit_duration_nominal      (channel, ping_time) float32 nan ... nan
    slope                          (channel, ping_time) float64 nan ... nan
Attributes:
    beam_mode:              vertical
    conversion_equation_t:  type_3
>>> 

Would an option be to have a channel option to calibrate.compute_Sv to process a single channel to Sv? to Combine them after they are put onto the same grid with compute_MVBS.

Attached 2 example data files MultiFrequency-WBAT-mini.zip

Get a different error with the current dev branch,

ds_Sv = ep.calibrate.compute_Sv(ed, waveform_mode='CW', encode_mode='complex')
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<input>", line 1, in <module>
  File "/Users/pete/DWM/echo/echopype/echopype/calibrate/api.py", line 204, in compute_Sv
    return _compute_cal(cal_type="Sv", echodata=echodata, **kwargs)
  File "/Users/pete/DWM/echo/echopype/echopype/calibrate/api.py", line 58, in _compute_cal
    cal_ds = cal_obj.compute_Sv()
  File "/Users/pete/DWM/echo/echopype/echopype/calibrate/calibrate_ek.py", line 490, in compute_Sv
    return self._compute_cal(cal_type="Sv")
  File "/Users/pete/DWM/echo/echopype/echopype/calibrate/calibrate_ek.py", line 474, in _compute_cal
    ds_cal = self._cal_complex_samples(cal_type=cal_type, complex_ed_group=self.ed_group)
  File "/Users/pete/DWM/echo/echopype/echopype/calibrate/calibrate_ek.py", line 355, in _cal_complex_samples
    tx_coeff = get_filter_coeff(vend)
  File "/Users/pete/DWM/echo/echopype/echopype/calibrate/ek80_complex.py", line 158, in get_filter_coeff
    coeff[ch_id]["wbt_fil"] = get_vend_filter_EK80(vend, ch_id, "WBT", "coeff")
  File "/Users/pete/DWM/echo/echopype/echopype/calibrate/ek80_complex.py", line 130, in get_vend_filter_EK80
    v = vend.attrs[f"{channel_id} {filter_name} filter_r"] + 1j * np.array(
KeyError: 'EKA 266958-0F ES38D WBT filter_r'
print(ep.__version__)
0.6.4.dev86+g4ee4844
ctuguinay commented 7 months ago

Closed with the merging of #1302