OSOceanAcoustics / echopype

Enabling interoperability and scalability in ocean sonar data analysis
https://echopype.readthedocs.io/
Apache License 2.0
94 stars 73 forks source link

xarray warnings when EK60/80 `Platform` `time1` contains a `NaT` value #1153

Closed emiliom closed 12 months ago

emiliom commented 12 months ago

While running the echopype-examples notebooks, I noticed three warnings that probably have no known impacts, but can be of concern to users who see them:

  1. ... site-packages/xarray/coding/times.py:254: RuntimeWarning: invalid value encountered in cast flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
  2. ... site-packages/xarray/coding/times.py:618: RuntimeWarning: invalid value encountered in cast int_num = np.asarray(num, dtype=np.int64)
  3. ... site-packages/xarray/core/duck_array_ops.py:188: RuntimeWarning: invalid value encountered in cast return data.astype(dtype, **kwargs)

In notebook 3 (OOI data), they're happening with open_raw and to_zarr (all 3 warnings, but especially 1 & 2) with every one of the 19 raw files that are handled, and with open_converted (warning 1 only) with only 3 of the converted files. In notebook 2 (hake data) it's harder to isolate, but I'm seeing only warning 3, and they only happen 3 times, which I assume means 3 files out of the total 170 raw files that are handled. In notebook 1, handles only 5 raw files, none of them occur.

This PR shows the notebooks with their warnings.

After some digging with a couple of the OOI files, it's clear that the source of all 3 warnings is the presence of a Platform time1 coordinate that has a NaT value (actually, it is length one and its value is NaT). It looks to me like xarray's behavior has changed (ie, a previous version was not emitting these warnings), b/c I'm testing the single file that's examined initially in echopype-examples notebook 3. On the current version online, Platform time1 does have a single NaT value, but the warning was not emitted. See here. Now, the open_raw statement that creates the EchoData object emits the warning (see PR #37); to_zarr and to_netcdf on that file emit the other warnings.

Note that all these warnings were observed with EK60 files. They, or at least some of them, likely occur with EK80 too. For AZFP, at least warnings 1 and 2 probably don't occur (see below).

emiliom commented 12 months ago

I'd like to propose a solution. With EK's, effectively, when lat & lon data are not present (as with the OOI data), a Platform time1 coordinate is created that has a single NaT value. That's a choice made in the past, implicitly or explicitly. In contrast, with AZFP, which rarely, if ever, have valid lat & lon data that would result in valid Platform time1 values, the value that's added to time1 is the first timestamp value (currently taken from time2).

SO: how about doing the same thing as with AZFP?? That would get rid of the NaT's, and therefore the warnings. And it'd bring consistency across instruments.

leewujung commented 12 months ago

SO: how about doing the same thing as with AZFP?? That would get rid of the NaT's, and therefore the warnings. And it'd bring consistency across instruments.

To make sure I understand, you meant that the coordinate time1 will have a single element with value being the first ping_time of the EK dataset, but will just have NaN in the actual data variables?

emiliom commented 12 months ago

Correct. The actual data variables already have NaN.

emiliom commented 12 months ago

I'll submit a PR very soon that implements the approach I described above. As it turns out, it addresses the two warnings (warnings 1 & 2) that involve xarray/coding/times.py, but not warning 3, which involves xarray/core/duck_array_ops.py. Fortunately warnings 1 & 2 are the most common ones!

Warning 3 (at least with a test OOI file) is triggered by utils/coding.py:sanitize_dtypes on open_raw, here:

https://github.com/OSOceanAcoustics/echopype/blob/5abca87743ffeb3f5fba0a40f7989ef54c9d1d68/echopype/utils/coding.py#L69-L70

But I haven't gotten further in identifying the exact variable(s) and condition that lead to it.

emiliom commented 12 months ago

The core of this issue was addressed in PR #1154. I'll close this issue and open a new, narrower one focused on the remaining warning.