I went through the updates and collected some changes that may be relevant to STARE.
Dependency Updates
    Name             Version
    ----             -------
    astropy          5.2.2
    cartopy          0.21.1
    gdal             3.6.3
    geopandas        0.12.2
    geopandas-base   0.12.2
    geos             3.11.2
    h5py             3.8.0
    hdf4             4.2.15
    hdf5             1.12.2
    hdfeos2          2.20
    matplotlib       3.7.1
    numpy            1.24.2
    pandas           2.0.0
    proj             9.1.1
    pygeos           0.14
    pyhdf            0.10.5
    pyproj           3.5.0
    pyshp            2.3.1
    pytest           7.3.1
    python           3.11.3
    shapely          2.0.1

STARE Installs

    pystare          0.8.12
    staremaster      0.0.4
    starepandas      0.6.6
Notes:
GeoPandas, PyGEOS and Shapely 2.0
GeoPandas has deprecated support for the PyGEOS backend in favor of Shapely 2.0 (which has merged with PyGEOS).
Control this with geopandas.options.use_pygeos = True/False, or by setting the environment variable USE_PYGEOS=1/0.
PyGEOS was merged with Shapely in December 2021 and has been released as part of Shapely 2.0.
Migrating to Shapely 2.0: This is a major release with a refactor of the internals with considerable performance improvements and with several breaking changes.
Geometry objects have become immutable.
In-place changes to coordinates are no longer allowed.
Assigning custom attributes is no longer allowed.
Multi-part geometries (MultiPoint, MultiLineString, MultiPolygon and GeometryCollection) are no longer list-like 'sequences' (length, iterable, indexable).
So for a MultiPoint object mp you can no longer use operations such as for part in mp:, mp[1], len(mp) or list(mp).
Instead, use the geoms property of mp. For example, for part in mp.geoms:, mp.geoms[1], len(mp.geoms) or list(mp.geoms).
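The new access pattern can be sketched as follows (assumes Shapely 2.0 is installed):

```python
from shapely.geometry import MultiPoint

mp = MultiPoint([(0, 0), (1, 1), (2, 2)])

# Shapely 2.0: access the parts through the .geoms property
parts = list(mp.geoms)   # list(mp) raises TypeError in 2.0
second = mp.geoms[1]     # mp[1] no longer works
n = len(mp.geoms)        # len(mp) no longer works
```

Code that iterates over multi-part geometries is the most common thing to break when moving STARE-related code to Shapely 2.0.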
Interoperability with NumPy
Shapely provides an interface to access coordinates as NumPy arrays.
For example, given line = LineString(...), use line_coords = np.array(line.coords). (Note that np.asarray(line) no longer yields coordinates in Shapely 2.0: the geometry array interface was removed, so geometries behave as scalars in NumPy arrays.)
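A minimal sketch, assuming Shapely 2.0 and NumPy:

```python
import numpy as np
from shapely.geometry import LineString

line = LineString([(0, 0), (1, 1), (2, 0)])

# Coordinates come out as an (N, 2) float array
line_coords = np.array(line.coords)
```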
Consistent creation of empty geometries
Shapely now consistently gives an empty geometry object of the correct type, instead of using an empty GeometryCollection as a generic empty geometry object.
Deprecated Functionality
The empty() method on a geometry object is deprecated.
The shapely.ops.cascaded_union function is deprecated. Use shapely.ops.unary_union instead.
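The replacement is a drop-in rename; a sketch assuming Shapely 2.0:

```python
from shapely.geometry import Point
from shapely.ops import unary_union

# Three overlapping discs; unary_union replaces the deprecated cascaded_union
discs = [Point(x, 0).buffer(0.75) for x in range(3)]
merged = unary_union(discs)
```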
Pandas 2.0 adds support for Apache Arrow as an optional backend (columnar memory format) as an alternative to NumPy.
Arrow allows for vast improvements when operating on string columns compared with NumPy.
Previously, columns with strings were cast as the object dtype, as required by NumPy.
Now, you can use dtype="string", dtype=pd.StringDtype() or .astype("string") to create a string-based column.
A PyArrow-backed column can be requested specifically by casting to or specifying a column's dtype as f"{dtype}[pyarrow]", e.g. "int64[pyarrow]" for an integer column.
Alternatively, a PyArrow dtype can be created with dtype = pandas.ArrowDtype(pyarrow.int64()).
Representation of "Missing values" (None)
Previously, pandas used NumPy NaN to represent missing values, but because NaN is an np.float64, any numeric column with missing values was converted to np.float64.
With Arrow, missing values can be represented with a Python None, which preserves the column's data type.
Index can now hold numpy numeric dtypes.
This allows operations to create indexes with lower bit sizes (e.g. 16-bit indexes).
Index set operations Index.union(), Index.intersection(), Index.difference(), and Index.symmetric_difference() now support sort=True.
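Both points can be illustrated briefly (assumes pandas 2.0 and NumPy):

```python
import numpy as np
import pandas as pd

# Lower-bit-width numeric indexes are now preserved instead of upcast to int64
idx = pd.Index([1, 2, 3], dtype=np.int16)

# Index set operations now accept sort=True
left = pd.Index([3, 1, 2])
right = pd.Index([2, 4])
union = left.union(right, sort=True)
```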
Copy-on-Write (CoW).
This is a way to deal with inconsistencies in pandas indexing operations.
A 'copy' of a DataFrame means that modifications to the parent or child DataFrame (the copy) are not shared.
A 'view' of a DataFrame means that modifications affect both the parent and child DataFrames.
Previously, some pandas operations returned a copy, while others returned a view.
This led to unwanted and difficult to detect side effects.
With CoW, a child DataFrame/Series always behaves as a view (i.e. no extra memory usage, a lazy copy) until either the parent or the child is modified, at which point the child is converted to a copy (deferred memory use).
This ensures that pandas DataFrames/Series can only be modified directly, rather than inheriting changes via a view dependency.
Thus pandas now issues warnings/errors for in-place updates through a view dependency (e.g. chained assignment).
Deferring the copy provides a significant performance improvement compared to copying eagerly.
Thus, accessing a single column of a DataFrame as a Series (e.g. df["col"]) now always returns a new object.
Copy-on-Write can be enabled through pd.set_option("mode.copy_on_write", True) or pd.options.mode.copy_on_write = True
The inplace and copy keywords will eventually be deprecated and then removed.
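A minimal CoW sketch, assuming pandas 2.0:

```python
import pandas as pd

pd.set_option("mode.copy_on_write", True)

df = pd.DataFrame({"a": [1, 2, 3]})
s = df["a"]      # a lazy copy: no data duplicated yet
s.iloc[0] = 99   # the modification triggers the actual copy

# Under CoW the parent df is untouched by the write to s
```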
Non-nanosecond resolution in Timestamps.
date_range() and timedelta_range() now support a unit keyword ("s", "ms", "us", or "ns") to specify the desired resolution of the output index.
DatetimeIndex.as_unit() and TimedeltaIndex.as_unit() convert to different resolutions ("s", "ms", "us", or "ns").
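For example (assuming pandas 2.0):

```python
import pandas as pd

# Request second resolution directly from date_range
idx = pd.date_range("2020-01-01", periods=3, freq="D", unit="s")

# Convert an existing index to millisecond resolution
idx_ms = idx.as_unit("ms")
```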
DataFrame.to_json() now supports a mode keyword with supported inputs 'w' and 'a'.
Backwards incompatible API changes.
Construction with datetime64 or timedelta64 dtype with unsupported resolution.
Previously, a Series or DataFrame constructed with a "datetime64" or "timedelta64" dtype at an unsupported resolution (i.e. anything other than "ns", say dtype="datetime64[s]") was silently coerced to a nanosecond dtype (datetime64[ns]) anyway.
Now dtype="datetime64[s]" works as expected.
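A quick check of the new behavior (pandas 2.0):

```python
import pandas as pd

# pandas 2.0 keeps the requested resolution instead of coercing to [ns]
s = pd.Series([pd.Timestamp("2020-01-01")], dtype="datetime64[s]")
```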
UTC and fixed-offset timezones default to standard-library tzinfo objects
Previously, the default tzinfo object used to represent UTC was pytz.UTC.
Now pandas defaults to datetime.timezone.utc.
Similarly, for timezones representing fixed UTC offsets, pandas now uses datetime.timezone objects instead of pytz.FixedOffset objects.
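For illustration (assuming pandas 2.0):

```python
from datetime import timezone

import pandas as pd

# The tzinfo is now the standard-library object rather than pytz.UTC
ts = pd.Timestamp("2023-04-01 12:00", tz="UTC")
```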
Empty DataFrames/Series will now default to have a RangeIndex
In the past, to_datetime() guessed the format for each element independently.
Now parsing will use a consistent format, determined by the first non-NA value (unless the user specifies a format, in which case that is used).
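For example, the format inferred from the first element is applied to the rest (pandas 2.0):

```python
import pandas as pd

# "%d-%m-%Y" is inferred from "31-12-2019" (month 31 is impossible),
# so the second element parses day-first as 2020-02-01, not 2020-01-02
parsed = pd.to_datetime(["31-12-2019", "01-02-2020"])
```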
Pandas uses SQLAlchemy, which has also undergone a major update (version 2.0+). As a consequence, pandas SQL IO code written against the old SQLAlchemy syntax, particularly DataFrame.to_sql and pd.read_sql (via pd.read_sql_query and pd.read_sql_table), is no longer compatible with the new SQLAlchemy syntax.
The upgrade to SQLAlchemy 2.0+ syntax is not backwards compatible.
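One way to sidestep the SQLAlchemy syntax changes entirely is to pass a plain DBAPI connection, which pandas supports directly for sqlite3. A sketch; the table and column names (granules, sid) are illustrative only:

```python
import sqlite3

import pandas as pd

# A raw sqlite3 connection avoids SQLAlchemy altogether
con = sqlite3.connect(":memory:")
pd.DataFrame({"sid": [1, 2], "name": ["A", "B"]}).to_sql("granules", con, index=False)
out = pd.read_sql_query("SELECT sid, name FROM granules", con)
con.close()
```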
When installing pandas using pip, sets of optional dependencies can also be installed by specifying extras.
pip install "pandas[performance, aws]>=2.0.0"
The available extras, found in the installation guide, are [all, performance, computation, fss, aws, gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql, sql-other, html, xml, plot, output_formatting, clipboard, compression, test].
Pandas 2.0 also increases minimum versions for some optional dependencies, e.g. pytest (dev) 7.0.0, python-dateutil 2.8.2, matplotlib 3.6.1, xarray 0.21.0.