Unidata / MetPy

MetPy is a collection of tools in Python for reading, visualizing and performing calculations with weather data.
https://unidata.github.io/MetPy/
BSD 3-Clause "New" or "Revised" License

Timezone Aware Datetime with Python>=3.12 #3298

Open kgoebber opened 11 months ago

kgoebber commented 11 months ago

What went wrong?

In Python 3.12, datetime.utcnow() is deprecated and emits a DeprecationWarning. The recommended replacement is to move to timezone-aware datetime objects with

from datetime import datetime, UTC

date = datetime.now(UTC)
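The UTC constant used above is an alias for datetime.timezone.utc that was only added in Python 3.11. A minimal sketch of the same pattern that also runs on older interpreters, assuming that compatibility matters to the caller:

```python
# Obtain a UTC tzinfo on Python versions both before and after 3.11.
try:
    from datetime import UTC  # alias for timezone.utc, added in Python 3.11
except ImportError:  # Python < 3.11
    from datetime import timezone
    UTC = timezone.utc

from datetime import datetime

now = datetime.now(UTC)  # timezone-aware replacement for datetime.utcnow()
print(now.tzinfo)        # UTC
print(now.utcoffset())   # 0:00:00
```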

The problem is that the data we bring in, whether surface, upper-air, or gridded, has traditionally not been timezone-aware. There are many places where we set a datetime, either with utcnow() or manually (e.g., datetime(2017, 3, 8, 12)), and those values are/were timezone-naive. Trouble arises when comparing timezone-aware and timezone-naive values, because that comparison cannot be done (Python raises a TypeError). This is most likely to come up when subsetting data for a specific time or time window.
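The comparison problem can be seen with the standard library alone. This sketch uses the same example date as above and shows that equality between a naive and an aware datetime quietly returns False, while ordering comparisons (the kind used to select a time window) raise TypeError:

```python
from datetime import datetime, timezone

naive = datetime(2017, 3, 8, 12)                       # no tzinfo attached
aware = datetime(2017, 3, 8, 12, tzinfo=timezone.utc)  # same wall-clock time, explicitly UTC

# Equality between naive and aware datetimes does not raise; it just returns False.
equal = naive == aware
print(equal)  # False

# Ordering comparisons, which time-window subsetting relies on, raise TypeError.
try:
    naive < aware
    ordering_error = None
except TypeError as err:
    ordering_error = str(err)
print(ordering_error)  # can't compare offset-naive and offset-aware datetimes
```

The silent False from == is arguably the more dangerous of the two, since an exact-time lookup can simply find nothing instead of failing loudly.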

This issue tracks the need to address this; the challenge is deciding how to handle the change holistically. Some changes may belong in MetPy, while others lie upstream in other packages or may call for code examples showing users how to get the proper object type.

Operating System

MacOS

Version

1.5.1

Python Version

3.12

Code to Reproduce

from datetime import datetime, UTC
import xarray as xr
from metpy.units import units

date = datetime(2023, 11, 30, 12, tzinfo=UTC)

# Get GFS data for contouring
ds = xr.open_dataset('https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/'
                     f'Global_onedeg_ana/GFS_Global_onedeg_ana_{date:%Y%m%d_%H%M}.grib2')

ds.Geopotential_height_isobaric.metpy.sel(time=date, vertical=850*units.hPa)

Errors, Traceback, and Logs

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File ~/miniconda3/envs/main/lib/python3.11/site-packages/pandas/core/indexes/datetimes.py:579, in DatetimeIndex._disallow_mismatched_indexing(self, key)
    577 try:
    578     # GH#36148
--> 579     self._data._assert_tzawareness_compat(key)
    580 except TypeError as err:

File ~/miniconda3/envs/main/lib/python3.11/site-packages/pandas/core/arrays/datetimes.py:770, in DatetimeArray._assert_tzawareness_compat(self, other)
    769     if other_tz is not None:
--> 770         raise TypeError(
    771             "Cannot compare tz-naive and tz-aware datetime-like objects."
    772         )
    773 elif other_tz is None:

TypeError: Cannot compare tz-naive and tz-aware datetime-like objects.

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
File ~/miniconda3/envs/main/lib/python3.11/site-packages/xarray/core/indexes.py:769, in PandasIndex.sel(self, labels, method, tolerance)
    768 try:
--> 769     indexer = self.index.get_loc(label_value)
    770 except KeyError as e:

File ~/miniconda3/envs/main/lib/python3.11/site-packages/pandas/core/indexes/datetimes.py:599, in DatetimeIndex.get_loc(self, key)
    597 if isinstance(key, self._data._recognized_scalars):
    598     # needed to localize naive datetimes
--> 599     self._disallow_mismatched_indexing(key)
    600     key = Timestamp(key)

File ~/miniconda3/envs/main/lib/python3.11/site-packages/pandas/core/indexes/datetimes.py:581, in DatetimeIndex._disallow_mismatched_indexing(self, key)
    580 except TypeError as err:
--> 581     raise KeyError(key) from err

KeyError: datetime.datetime(2023, 11, 30, 12, 0, tzinfo=datetime.timezone.utc)

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[56], line 11
      7 # Get GFS data for contouring
      8 ds = xr.open_dataset('https://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/'
      9                      f'Global_onedeg_ana/GFS_Global_onedeg_ana_{date:%Y%m%d_%H%M}.grib2')
---> 11 ds.Geopotential_height_isobaric.metpy.sel(time=date, vertical=850*units.hPa)

File ~/miniconda3/envs/main/lib/python3.11/site-packages/metpy/xarray.py:644, in MetPyDataArrayAccessor.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
    642 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, 'sel')
    643 indexers = _reassign_quantity_indexer(self._data_array, indexers)
--> 644 return self._data_array.sel(indexers, method=method, tolerance=tolerance, drop=drop)

File ~/miniconda3/envs/main/lib/python3.11/site-packages/xarray/core/dataarray.py:1582, in DataArray.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   1472 def sel(
   1473     self: T_DataArray,
   1474     indexers: Mapping[Any, Any] | None = None,
   (...)
   1478     **indexers_kwargs: Any,
   1479 ) -> T_DataArray:
   1480     """Return a new DataArray whose data is given by selecting index
   1481     labels along the specified dimension(s).
   1482 
   (...)
   1580     Dimensions without coordinates: points
   1581     """
-> 1582     ds = self._to_temp_dataset().sel(
   1583         indexers=indexers,
   1584         drop=drop,
   1585         method=method,
   1586         tolerance=tolerance,
   1587         **indexers_kwargs,
   1588     )
   1589     return self._from_temp_dataset(ds)

File ~/miniconda3/envs/main/lib/python3.11/site-packages/xarray/core/dataset.py:3020, in Dataset.sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   2959 """Returns a new dataset with each array indexed by tick labels
   2960 along the specified dimension(s).
   2961 
   (...)
   3017 DataArray.sel
   3018 """
   3019 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
-> 3020 query_results = map_index_queries(
   3021     self, indexers=indexers, method=method, tolerance=tolerance
   3022 )
   3024 if drop:
   3025     no_scalar_variables = {}

File ~/miniconda3/envs/main/lib/python3.11/site-packages/xarray/core/indexing.py:190, in map_index_queries(obj, indexers, method, tolerance, **indexers_kwargs)
    188         results.append(IndexSelResult(labels))
    189     else:
--> 190         results.append(index.sel(labels, **options))
    192 merged = merge_sel_results(results)
    194 # drop dimension coordinates found in dimension indexers
    195 # (also drop multi-index if any)
    196 # (.sel() already ensures alignment)

File ~/miniconda3/envs/main/lib/python3.11/site-packages/xarray/core/indexes.py:771, in PandasIndex.sel(self, labels, method, tolerance)
    769                 indexer = self.index.get_loc(label_value)
    770             except KeyError as e:
--> 771                 raise KeyError(
    772                     f"not all values found in index {coord_name!r}. "
    773                     "Try setting the `method` keyword argument (example: method='nearest')."
    774                 ) from e
    776 elif label_array.dtype.kind == "b":
    777     indexer = label_array

KeyError: "not all values found in index 'time'. Try setting the `method` keyword argument (example: method='nearest')."
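Until this is sorted out, one possible user-side workaround is to convert the aware datetime to a naive UTC value before selecting. This is a minimal sketch: to_naive_utc is a hypothetical helper, not part of MetPy, and it assumes the dataset's time coordinate is naive but represents UTC, as is typical for GFS output.

```python
from datetime import datetime, timedelta, timezone

def to_naive_utc(dt):
    """Hypothetical helper: return ``dt`` as a naive datetime expressed in UTC.

    Aware datetimes are converted to UTC first, then their tzinfo is dropped.
    Naive datetimes are assumed to already be in UTC and are returned unchanged.
    """
    if dt.utcoffset() is None:  # naive (or a tzinfo that reports no offset)
        return dt
    return dt.astimezone(timezone.utc).replace(tzinfo=None)

# 12 UTC written in a fixed UTC-6 offset (e.g. US Central Standard Time)
cst = timezone(timedelta(hours=-6))
local = datetime(2023, 11, 30, 6, tzinfo=cst)
print(to_naive_utc(local))  # 2023-11-30 12:00:00
```

With that helper, the selection above would become ds.Geopotential_height_isobaric.metpy.sel(time=to_naive_utc(date), vertical=850*units.hPa). Converting with astimezone() before dropping tzinfo matters: calling replace(tzinfo=None) directly on a non-UTC datetime would keep the local wall-clock time and select the wrong hour.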

dopplershift commented 11 months ago

So the whole situation is kind of a mess. Pandas apparently does support timezones, but tz-aware values get stored under their own dtype (DatetimeTZDtype); numpy's datetime64 is defined to be naive; and xarray is caught in the middle. There is probably a role we could play in contributing upstream, but I personally do not have a great handle on all the potential pitfalls at play. I only know enough to know that dates, times, and timezones are a huge pain.
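To illustrate the pandas side of this, here is a small sketch, assuming pandas is installed; the index values are made up to mirror the GFS analysis times in the report. A naive DatetimeIndex carries no tz, tz_localize('UTC') switches it to pandas' DatetimeTZDtype, and an aware index can then be searched with an aware datetime:

```python
from datetime import datetime, timezone

import pandas as pd

# A tz-naive index, like the decoded time coordinate in the traceback above.
naive_idx = pd.DatetimeIndex(['2023-11-30 00:00', '2023-11-30 06:00', '2023-11-30 12:00'])
print(naive_idx.tz)  # None

# Localizing attaches UTC and changes the dtype to pandas' tz-aware DatetimeTZDtype.
utc_idx = naive_idx.tz_localize('UTC')
print(isinstance(utc_idx.dtype, pd.DatetimeTZDtype))  # True

# An aware key works against the aware index.
target = datetime(2023, 11, 30, 12, tzinfo=timezone.utc)
print(utc_idx.get_loc(target))  # 2

# tz_convert(None) converts to UTC and drops the timezone, giving back a naive index.
print(utc_idx.tz_convert(None).equals(naive_idx))  # True
```

Whether such a tz-aware index survives a round trip through xarray and netCDF is a separate question, which is exactly the part of the ecosystem that is unclear here.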

I'm not even sure how you would open a netCDF file and have its time axis set to UTC, though it seems like that might be possible.

Given that, for MetPy I'm coming around to the idea that while we should address the warnings and avoid deprecated methods (#3255), we should keep using tz-naive datetime instances internally. So e.g.:

datetime.utcnow()

becomes

datetime.now(timezone.utc).replace(tzinfo=None)
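A minimal sketch of that drop-in replacement, wrapped in a hypothetical helper named utcnow_naive (not an existing MetPy function). Calling replace(tzinfo=None) is safe here only because datetime.now(timezone.utc) is already expressed in UTC, so the result matches what utcnow() used to return and still compares cleanly against other naive datetimes:

```python
from datetime import datetime, timezone

def utcnow_naive():
    """Hypothetical helper: naive UTC 'now', matching the deprecated datetime.utcnow()."""
    return datetime.now(timezone.utc).replace(tzinfo=None)

now = utcnow_naive()
print(now.tzinfo)                      # None
print(now > datetime(2017, 3, 8, 12))  # True: naive vs. naive compares fine
```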

It feels ugly to drop timezone information (which really is nice to have for robust labelling, etc.), but it would preserve the existing behavior and avoid a bunch of headaches for our users until the rest of the ecosystem is ready to deal with it.

dopplershift commented 10 months ago

So for completeness, #3255 implements this using a sensible mix:

Is there anything more we need to do in MetPy itself about this?