OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0

[Data conversion] `open_raw` giving error when trying to open ES80 data #1279

Closed dsmossman closed 5 months ago

dsmossman commented 6 months ago

General description of problem

I am trying to use echopype to examine and process some ES80 data (which contains a combination of narrowband and wideband data), but I am running into an error when trying to import it using open_raw.

Echopype version

echopype v0.8.3

How did you install Echopype (e.g., conda or pip)

conda

What is your operating system

Windows 10

Minimal code example

Python code to reproduce the error:

```python
import echopype as ep

file = 'C:/Users/dmossman/Downloads/C26-D20230708-T104705.raw'
ep.open_raw(raw_file=file, sonar_model="ES80")
```

Error message printouts

```shell
Traceback (most recent call last):
  File "C:\Users\dmossman\AppData\Local\Programs\Python\Python312\Lib\pathlib.py", line 555, in drive
    return self._drv
           ^^^^^^^^^
AttributeError: 'NodePath' object has no attribute '_drv'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2023.3.4\plugins\python-ce\helpers\pydev\pydevconsole.py", line 364, in runcode
    coro = func()
           ^^^^^^
  File "<input>", line 1, in <module>
  File "C:\Users\dmossman\Box\Glider Data\SIMRAD_ES80\ES80_Python_Processing\.venv\Lib\site-packages\echopype\utils\prov.py", line 237, in inner
    dataobj = func(*args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\dmossman\Box\Glider Data\SIMRAD_ES80\ES80_Python_Processing\.venv\Lib\site-packages\echopype\convert\api.py", line 491, in open_raw
    tree = DataTree.from_dict(tree_dict, name="root")
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\dmossman\Box\Glider Data\SIMRAD_ES80\ES80_Python_Processing\.venv\Lib\site-packages\datatree\datatree.py", line 360, in from_dict
    node_name = NodePath(path).name
                ^^^^^^^^^^^^^^
  File "C:\Users\dmossman\Box\Glider Data\SIMRAD_ES80\ES80_Python_Processing\.venv\Lib\site-packages\datatree\treenode.py", line 34, in __new__
    if obj.drive:
       ^^^^^^^^^
  File "C:\Users\dmossman\AppData\Local\Programs\Python\Python312\Lib\pathlib.py", line 557, in drive
    self._load_parts()
  File "C:\Users\dmossman\AppData\Local\Programs\Python\Python312\Lib\pathlib.py", line 408, in _load_parts
    paths = self._raw_paths
            ^^^^^^^^^^^^^^^
AttributeError: 'NodePath' object has no attribute '_raw_paths'
```

Example data

Here is a link to an example .raw file from the ES80 echosounder: C26-D20230708-T104705.raw

Related existing issues or PRs

I have not seen any related issues/PRs.

Troubleshooting steps

Stepping through the functions, the error appears to arise when open_raw tries to make a DataTree for the future EchoData object. I can't really tell much more than that.

leewujung commented 6 months ago

Hey @dsmossman: Thanks for reporting this! Which version of datatree do you have? In the requirements we pinned it at a pretty old version, xarray-datatree==0.0.6, so I wonder if this is a conflict with a newer version (the latest is 0.0.14).

dsmossman commented 6 months ago

Thank you for getting back to me so quickly! Just double checked and I do have version 0.0.6 of xarray-datatree. To be safe, I copied the contents of requirements.txt into my environment file and recreated the environment, and it is still giving me the same error.
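For anyone checking the same thing, the installed versions can be read from inside the active environment itself rather than from the environment file. This is a minimal sketch using the standard-library `importlib.metadata`; the helper name `installed_versions` is made up for illustration, and the distribution names are assumed to match what pip or conda installed:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(names):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not present in the environment running this code
            found[name] = None
    return found


if __name__ == "__main__":
    report = installed_versions(["echopype", "xarray-datatree", "xarray"])
    for name, ver in report.items():
        print(f"{name}: {ver or 'not installed'}")
```

Running this with the same interpreter that raises the error shows what that interpreter actually imports, which may differ from what the environment file requests.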

I probably should have mentioned this in the initial post, but in addition to the errors, echopype also gives me some warnings before it stops executing:

C:\Users\dmossman\mambaforge\envs\ES80_Python_Processing\Lib\site-packages\echopype\convert\parse_base.py:623: ComplexWarning: Casting complex values to real discards the imaginary part
  out_array[mask] = np.concatenate(data_list).reshape(-1)  # reshape in case data > 1D
C:\Users\dmossman\mambaforge\envs\ES80_Python_Processing\Lib\site-packages\echopype\utils\coding.py:87: UserWarning: Times can't be serialized faithfully to int64 with requested units 'seconds since 1900-01-01T00:00:00+00:00'. Resolution of 'microseconds' needed. Serializing times to floating point instead. Set encoding['dtype'] to integer dtype to serialize to int64. Set encoding['dtype'] to floating point dtype to silence this warning.
  encoded_data, _, _ = coding.times.encode_cf_datetime(
C:\Users\dmossman\mambaforge\envs\ES80_Python_Processing\Lib\site-packages\echopype\utils\coding.py:87: UserWarning: Times can't be serialized faithfully to int64 with requested units 'seconds since 1900-01-01T00:00:00+00:00'. Resolution of 'microseconds' needed. Serializing times to floating point instead. Set encoding['dtype'] to integer dtype to serialize to int64. Set encoding['dtype'] to floating point dtype to silence this warning.
  encoded_data, _, _ = coding.times.encode_cf_datetime(
C:\Users\dmossman\mambaforge\envs\ES80_Python_Processing\Lib\site-packages\echopype\utils\coding.py:87: UserWarning: Times can't be serialized faithfully to int64 with requested units 'seconds since 1900-01-01T00:00:00+00:00'. Resolution of 'microseconds' needed. Serializing times to floating point instead. Set encoding['dtype'] to integer dtype to serialize to int64. Set encoding['dtype'] to floating point dtype to silence this warning.
  encoded_data, _, _ = coding.times.encode_cf_datetime(
C:\Users\dmossman\mambaforge\envs\ES80_Python_Processing\Lib\site-packages\echopype\utils\coding.py:87: UserWarning: Times can't be serialized faithfully to int64 with requested units 'seconds since 1900-01-01T00:00:00+00:00'. Resolution of 'microseconds' needed. Serializing times to floating point instead. Set encoding['dtype'] to integer dtype to serialize to int64. Set encoding['dtype'] to floating point dtype to silence this warning.
  encoded_data, _, _ = coding.times.encode_cf_datetime(
C:\Users\dmossman\mambaforge\envs\ES80_Python_Processing\Lib\site-packages\echopype\utils\coding.py:87: UserWarning: Times can't be serialized faithfully to int64 with requested units 'seconds since 1900-01-01T00:00:00+00:00'. Resolution of 'microseconds' needed. Serializing times to floating point instead. Set encoding['dtype'] to integer dtype to serialize to int64. Set encoding['dtype'] to floating point dtype to silence this warning.
  encoded_data, _, _ = coding.times.encode_cf_datetime(
C:\Users\dmossman\mambaforge\envs\ES80_Python_Processing\Lib\site-packages\echopype\utils\coding.py:87: UserWarning: Times can't be serialized faithfully to int64 with requested units 'seconds since 1900-01-01T00:00:00+00:00'. Resolution of 'microseconds' needed. Serializing times to floating point instead. Set encoding['dtype'] to integer dtype to serialize to int64. Set encoding['dtype'] to floating point dtype to silence this warning.
  encoded_data, _, _ = coding.times.encode_cf_datetime(
C:\Users\dmossman\mambaforge\envs\ES80_Python_Processing\Lib\site-packages\echopype\utils\coding.py:87: UserWarning: Times can't be serialized faithfully to int64 with requested units 'seconds since 1900-01-01T00:00:00+00:00'. Resolution of 'nanoseconds' needed. Serializing times to floating point instead. Set encoding['dtype'] to integer dtype to serialize to int64. Set encoding['dtype'] to floating point dtype to silence this warning.
  encoded_data, _, _ = coding.times.encode_cf_datetime(

I was not sure if they were relevant to the error, since they didn't seem related, but maybe they are?
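Warnings, unlike the traceback, do not stop execution by themselves. If the repeated time-serialization warning makes the output hard to read, it can be filtered with the standard `warnings` module while leaving every other warning visible. The sketch below is only an illustration: `call_quietly` is a hypothetical helper, and the message pattern is copied from the start of the warning text above.

```python
import warnings

# Start of the UserWarning text shown above. `message` is treated as a regular
# expression that must match the beginning of the warning message.
TIME_WARNING = r"Times can't be serialized faithfully"


def call_quietly(func, *args, **kwargs):
    """Call func(*args, **kwargs), hiding only the time-serialization UserWarning."""
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message=TIME_WARNING, category=UserWarning)
        return func(*args, **kwargs)


# Intended use (requires echopype and a raw file, so it is left as a comment):
# ed = call_quietly(ep.open_raw, raw_file=file, sonar_model="ES80")
```

The filter only changes what is printed; it has no effect on the AttributeError.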

ctuguinay commented 5 months ago

I haven't looked at the actual error yet, but the CF datetime warning is related to #1290.

ctuguinay commented 5 months ago

@dsmossman This is a strange error... I wasn't able to replicate it on Windows or Ubuntu; open_raw ran fine on both. Do you get the same two errors when you try to open_raw a non-ES80 raw file, such as an EK60 raw file? In case you need it, here's an example notebook for downloading EK60 data from NOAA's S3 bucket: Example Notebook.

Also, which version of pathlib ships with your Python 3.12 installation? I couldn't find the line that raised that pathlib error in my own pathlib file.

dsmossman commented 5 months ago

@ctuguinay I think it's the pathlib version that comes with Python 3.12.2; here is a screenshot:

image

Downloading a .raw file from the archive and trying to run it within my ES80 environment gives the same error. If I follow the steps to create a new environment as outlined in the echopype Tour notebook, I get the same UserWarning I copied above, but open_raw works for both EK60 and ES80 files.

The only thing I can think of is some sort of virtual environment issue. My colleague (Lori Garzio) and I both created the environment with the .yml file through the terminal. Maybe something went wrong there? I will investigate further and keep you posted.
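The paths in the outputs above point at more than one environment: the traceback shows a `.venv` folder using the standard library from a `Python312` installation, while the warnings come from a `mambaforge\envs` folder. Confirming which interpreter actually runs the code may therefore help. A small sketch using only the standard library (`interpreter_report` is a hypothetical helper name):

```python
import pathlib
import sys


def interpreter_report():
    """Collect facts that identify which Python installation is running."""
    return {
        "python_version": ".".join(str(n) for n in sys.version_info[:3]),
        "executable": sys.executable,
        "prefix": sys.prefix,
        # True inside a venv-style virtual environment; conda environments
        # typically report False here because they are full installations.
        "in_venv": sys.prefix != sys.base_prefix,
        # pathlib ships with the interpreter, so its location shows which
        # standard library is in use.
        "pathlib_file": pathlib.__file__,
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(f"{key}: {value}")
```

Since pathlib is part of the standard library, its version is simply the Python version reported here.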

ctuguinay commented 5 months ago

@dsmossman Hi, any updates on this? I wonder if there are conflicting libraries in your virtual env 🤔. The binder environment in the echopype examples is rather small, so it carries less risk of conflicting libraries.

Also, the time serialization user warning should be fixed with the merging of #1299.

dsmossman commented 5 months ago

@ctuguinay yes, I finally figured it out! My first try at an environment was using a 3.12 release of Python; when I switched to a 3.10 release (as specified in the .yml file for the EK60 example stuff), the error went away. I also tested release 3.11 of Python and it works as well. I have no idea what they did in the 3.12 release of Python that broke pathlib for me, but I'm glad to have found a solution. Thanks for your help!

ctuguinay commented 5 months ago

Ah okay, that makes sense. Thanks for clearing that up! Our current CI tests are for Python 3.9, 3.10, and 3.11, but it's good to know that there are possible issues users may experience if they create an env using 3.12. When we start testing for 3.12, we'll keep this issue in mind. I'll close this issue now.
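For readers who land here with the same traceback, one plausible explanation can be read off it, although nobody in this thread confirmed it: the failing line accesses `obj.drive` inside `NodePath.__new__`, and in Python 3.12 the pathlib classes store their path segments in `__init__`, which runs after `__new__`. The sketch below reproduces that pattern with a made-up class, `EagerPath`; it is a stand-in for illustration, not the datatree source.

```python
import sys
from pathlib import PurePosixPath


class EagerPath(PurePosixPath):
    """Hypothetical path subclass that validates inside __new__."""

    def __new__(cls, *args):
        obj = super().__new__(cls, *args)
        # On Python 3.12+ the path segments are only stored in __init__,
        # so this attribute access happens before the object is populated.
        if obj.drive:
            raise ValueError("paths with a drive are not allowed")
        return obj


def eager_check_works():
    """Return True if EagerPath can be built, False if it fails with AttributeError."""
    try:
        EagerPath("/Sonar/Beam_group1")
    except AttributeError:
        return False
    return True


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    print(f"Python {major}.{minor}: check inside __new__ works = {eager_check_works()}")
```

On Python 3.11 and earlier the construction succeeds, which is consistent with the error disappearing after switching to 3.10 or 3.11.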