OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0

Errors when converting NWFSC Hake Survey Data from 2017 and 2021 #1374

Open ctuguinay opened 2 months ago

ctuguinay commented 2 months ago

Some errors I found while converting all of the 2017 and 2021 files using the latest Echopype main branch:

Error converting noaa-wcsd-pds/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170912-T194552.raw with Exception: Short read while getting trailing raw file datagram size for check 4 != 0 @ (19315440L, 32080)
Error converting noaa-wcsd-pds/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170718-T121359.raw with Exception: Short read while getting trailing raw file datagram size for check 4 != 0 @ (1311056L, 540)
Error converting noaa-wcsd-pds/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170819-T060438.raw with Exception: Short read while getting dgram size 4 != 0 @ (1449574268L, 1458)
Error converting noaa-wcsd-pds/data/raw/Bell_M._Shimada/SH1707/EK60/Summer2017-D20170807-T171736.raw with Exception: The DType <class 'numpy.dtypes.DateTime64DType'> could not be promoted by <class 'numpy.dtypes.Float64DType'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtypes.DateTime64DType'>, <class 'numpy.dtypes.Float64DType'>)
Error converting noaa-wcsd-pds/data/raw/Bell_M._Shimada/SH2106/EK80/Hake-D20210913-T130612.raw with Exception: cannot reindex or align along dimension 'ping_time' because the (pandas) index has duplicate values
Error converting noaa-wcsd-pds/data/raw/Bell_M._Shimada/SH2106/EK80/Hake-D20210913-T225435.raw with Exception: cannot reindex or align along dimension 'ping_time' because the (pandas) index has duplicate values

ctuguinay commented 2 months ago

The short read errors are probably caused by the EK software abruptly ending the file write. Perhaps there's a way to retrieve the data that was written properly instead of losing the whole file to this error.
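As a rough illustration of what "retrieve what was written properly" could look like, here is a minimal sketch that walks a byte stream of length-framed datagrams and stops at the first incomplete one instead of raising. It assumes the usual Simrad framing (a 4-byte little-endian size, the payload, then the same size repeated). The function name `read_complete_datagrams` and the helper `frame` are hypothetical and are not part of Echopype's parser.

```python
import io
import struct


def read_complete_datagrams(stream):
    """Yield payloads of fully written datagrams; stop at a truncated tail."""
    while True:
        head = stream.read(4)
        if len(head) < 4:
            return  # end of file, or short read on the leading size
        (size,) = struct.unpack("<l", head)
        if size <= 0:
            return  # nonsensical size, treat as corrupt tail
        payload = stream.read(size)
        if len(payload) < size:
            return  # payload was cut off mid-write
        tail = stream.read(4)
        if len(tail) < 4 or struct.unpack("<l", tail)[0] != size:
            return  # trailing size missing or inconsistent
        yield payload


def frame(payload):
    """Wrap a payload with leading and trailing size fields (test helper)."""
    size = struct.pack("<l", len(payload))
    return size + payload + size


# Two complete datagrams followed by one whose trailing size was never written.
good = frame(b"CON0" + b"\x00" * 8) + frame(b"RAW0" + b"\x00" * 8)
truncated = good + struct.pack("<l", 12) + b"RAW0" + b"\x00" * 8
recovered = list(read_complete_datagrams(io.BytesIO(truncated)))
```

Here `recovered` holds only the two complete datagrams, so everything before the abrupt end of the file would still be available for conversion.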

ctuguinay commented 2 months ago

The "cannot reindex" error stems from the fact that ping_time contains duplicate values:

image

I think a simple drop_duplicates on ping_time in the set groups stage would fix this problem: https://docs.xarray.dev/en/stable/generated/xarray.Dataset.drop_duplicates.html
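A small sketch of the idea on a toy dataset (not actual Echopype output; the variable name `backscatter_r` and the timestamps are made up for illustration). It uses `xarray.Dataset.drop_duplicates` with `keep="first"`, which is an assumption about which of the duplicated pings should win:

```python
import numpy as np
import xarray as xr

# Toy ping_time coordinate with one repeated timestamp.
ping_time = np.array(
    [
        "2021-09-13T13:06:12",
        "2021-09-13T13:06:13",
        "2021-09-13T13:06:13",  # duplicate
        "2021-09-13T13:06:14",
    ],
    dtype="datetime64[ns]",
)
ds = xr.Dataset(
    {"backscatter_r": ("ping_time", np.arange(4.0))},
    coords={"ping_time": ping_time},
)

# Keep the first occurrence of each ping_time value and drop the rest.
deduped = ds.drop_duplicates(dim="ping_time", keep="first")
```

After this, the `ping_time` index is unique, so reindexing and alignment along that dimension should no longer raise. Whether silently dropping a ping is acceptable is a separate question.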

ctuguinay commented 2 months ago

For the "The DType <class 'numpy.dtypes.DateTime64DType'> could not be promoted by <class 'numpy.dtypes.Float64DType'>. This means that no common DType exists for the given inputs. For example they cannot be stored in a single array unless the dtype is `object`. The full list of DTypes is: (<class 'numpy.dtypes.DateTime64DType'>, <class 'numpy.dtypes.Float64DType'>)" exception, we have the following:

image

where we are missing channel-specific environmental variable information.

The set groups stage then errors out here:

image

because the last ds_env is empty:

image

Broadcasting onto the empty dataset in ds_env would allow it to be merged:

image

Edit: Another, simpler way to solve this is to remove from the sorted channels any channel where the parser power is empty.
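Roughly what I mean, as a sketch on plain Python data. The dictionary `ping_data_power` and the channel names are hypothetical stand-ins for the parser output, not the actual Echopype attributes:

```python
# Hypothetical stand-in for the parser output:
# channel id -> list of per-ping power samples.
ping_data_power = {
    "GPT 18 kHz": [[-70.1, -68.4], [-71.0, -69.2]],
    "GPT 38 kHz": [[-65.3, -64.8], [-66.1, -65.0]],
    "GPT 120 kHz": [],  # channel is configured but recorded no pings
}

sorted_channels = sorted(ping_data_power)

# Drop channels with no power data before building per-channel datasets,
# so no empty dataset ever reaches the merge step.
channels_with_data = [
    ch for ch in sorted_channels if len(ping_data_power[ch]) > 0
]
```

With the empty channel filtered out up front, the per-channel environment datasets would all have a ping_time axis and the merge should not hit the dtype promotion error.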