Esri / arcgis-python-api

Documentation and samples for ArcGIS API for Python
https://developers.arcgis.com/python/
Apache License 2.0
1.88k stars 1.1k forks source link

Cannot spatially join self-intersecting Polygons with Shapely #1433

Closed GeeFernando closed 1 year ago

GeeFernando commented 1 year ago

Describe the bug Not all layers are perfect 😅 Especially, those layers that are drawn quickly due to time pressures (e.g. US wild-fire polygons). I've found it impossible to spatially join these layers to State boundaries or Census tracts using the ArcGIS Online Notebook's Standard runtime (and Spatially Enabled Data-frames). But the same code works fine in the Advanced runtime or on a machine that has ArcPy installed.

For this example I created a self-intersecting polygon layer (to mimic a typical wild-fire layer) - https://arcgis.com/home/item.html?id=ddf439207b444843aeb29969276f2728

To Reproduce Steps to reproduce the behavior:

from arcgis.gis import GIS
gis = GIS("home")

import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor

us_States = gis.content.get("8c2d6d7df8fa4142b0a1211c8dd66903").layers[0].query(as_df = True, out_sr = 102005)

self_Intersecting_Polygons = gis.content.get("ddf439207b444843aeb29969276f2728").layers[0].query(as_df = True, out_sr = 102005)

state_intersecting = us_States.spatial.join(self_Intersecting_Polygons, how='inner', op='intersects')
state_intersecting

error:

---------------------------------------------------------------------------
PredicateError                            Traceback (most recent call last)
File /opt/conda/lib/python3.9/site-packages/shapely/predicates.py:15, in BinaryPredicate.__call__(self, this, other, *args)
     14 try:
---> 15     return self.fn(this._geom, other._geom, *args)
     16 except PredicateError as err:
     17     # Dig deeper into causes of errors.

File /opt/conda/lib/python3.9/site-packages/shapely/geos.py:609, in errcheck_predicate(result, func, argtuple)
    608 if result == 2:
--> 609     raise PredicateError("Failed to evaluate %s" % repr(func))
    610 return result

PredicateError: Failed to evaluate <_FuncPtr object at 0x7feffc292400>

During handling of the above exception, another exception occurred:

TopologicalError                          Traceback (most recent call last)
Input In [19], in <cell line: 11>()
      7 us_States = gis.content.get("8c2d6d7df8fa4142b0a1211c8dd66903").layers[0].query(as_df = True, out_sr = 102005)
      9 self_Intersecting_Polygons = gis.content.get("ddf439207b444843aeb29969276f2728").layers[0].query(as_df = True, out_sr = 102005)
---> 11 state_intersecting = us_States.spatial.join(self_Intersecting_Polygons, how='inner', op='intersects')
     12 state_intersecting

File /opt/conda/lib/python3.9/site-packages/arcgis/features/geo/_accessor.py:1716, in GeoAccessor.join(self, right_df, how, op, left_tag, right_tag)
   1703 predicate_d = {
   1704     "intersects": find_intersects,
   1705     "contains": find_contains,
   1706     "within": find_contains,
   1707 }
   1709 check_predicates = np.vectorize(predicate_d[op])
   1711 result = pd.DataFrame(
   1712     np.column_stack(
   1713         [
   1714             l_idx,
   1715             r_idx,
-> 1716             check_predicates(
   1717                 left_df[self.name].apply(lambda x: x)[l_idx],
   1718                 right_df[right_df.spatial._name][r_idx],
   1719             ),
   1720         ]
   1721     )
   1722 )
   1724 result.columns = ["_key_left", "_key_right", "match_bool"]
   1725 result = pd.DataFrame(result[result["match_bool"] == 1]).drop(
   1726     "match_bool", axis=1
   1727 )

File /opt/conda/lib/python3.9/site-packages/numpy/lib/function_base.py:2304, in vectorize.__call__(self, *args, **kwargs)
   2301     vargs = [args[_i] for _i in inds]
   2302     vargs.extend([kwargs[_n] for _n in names])
-> 2304 return self._vectorize_call(func=func, args=vargs)

File /opt/conda/lib/python3.9/site-packages/numpy/lib/function_base.py:2387, in vectorize._vectorize_call(self, func, args)
   2384 # Convert args to object arrays first
   2385 inputs = [asanyarray(a, dtype=object) for a in args]
-> 2387 outputs = ufunc(*inputs)
   2389 if ufunc.nout == 1:
   2390     res = asanyarray(outputs, dtype=otypes[0])

File /opt/conda/lib/python3.9/site-packages/arcgis/features/geo/_accessor.py:1698, in GeoAccessor.join.<locals>.find_intersects(a1, a2)
   1697 def find_intersects(a1, a2):
-> 1698     return a1.disjoint(a2) == False

File /opt/conda/lib/python3.9/site-packages/arcgis/geometry/_types.py:2144, in Geometry.disjoint(self, second_geometry)
   2142     if isinstance(second_geometry, Geometry):
   2143         second_geometry = second_geometry.as_shapely
-> 2144     return self.as_shapely.disjoint(second_geometry)
   2145 return None

File /opt/conda/lib/python3.9/site-packages/shapely/geometry/base.py:776, in BaseGeometry.disjoint(self, other)
    774 def disjoint(self, other):
    775     """Returns True if geometries are disjoint, else False"""
--> 776     return bool(self.impl['disjoint'](self, other))

File /opt/conda/lib/python3.9/site-packages/shapely/predicates.py:18, in BinaryPredicate.__call__(self, this, other, *args)
     15     return self.fn(this._geom, other._geom, *args)
     16 except PredicateError as err:
     17     # Dig deeper into causes of errors.
---> 18     self._check_topology(err, this, other)

File /opt/conda/lib/python3.9/site-packages/shapely/topology.py:38, in Delegating._check_topology(self, err, *geoms)
     36 for geom in geoms:
     37     if not geom.is_valid:
---> 38         raise TopologicalError(
     39             "The operation '%s' could not be performed. "
     40             "Likely cause is invalidity of the geometry %s" % (
     41                 self.fn.__name__, repr(geom)))
     42 raise err

TopologicalError: The operation 'GEOSDisjoint_r' could not be performed. Likely cause is invalidity of the geometry <shapely.geometry.multipolygon.MultiPolygon object at 0x7fef75e5be50>

The error states it is due to Multipolygons. But, I can confirm it's actually due to self intersecting features (and not multipolygns). For example - the following script works in the Standard runtime - despite states such as California being mulipolygons.

from arcgis.gis import GIS
gis = GIS("home")

import pandas as pd
from arcgis.features import GeoAccessor, GeoSeriesAccessor

us_States = gis.content.get("8c2d6d7df8fa4142b0a1211c8dd66903").layers[0].query(as_df = True, out_sr = 102005)

time_Zones = gis.content.get("312cebfea2624e108e234220b04460b8").layers[0].query(as_df = True, out_sr = 102005)

state_Timezones = us_States.spatial.join(time_Zones, how='inner', op='intersects')
state_Timezones

My assumption is it is something to do with whether arcpy or shapely is being used for Spatially Joining. It'll be great to get a fix for Mac users and AGOL Standard Runtime users.

Thanks so much, Gee

Expected behavior Same output as the Advanced runtime (ArcPy).

Platform (please complete the following information):

Additional context This bug occurs in AGOL Standard runtime and MacOS with the above package versions.

nanaeaubry commented 1 year ago

Hello we will take a look at this

GeeFernando commented 1 year ago

Hello we will take a look at this

Thanks so much 🙏

nanaeaubry commented 1 year ago

@GeeFernando We have been looking at this and it seems to be an issue with Shapely itself. I don't think it is adept to handling self-intersecting polygons. When done through arcpy there is no issue since that package handles this case.

We will look at what can be done on our end and post any updates here.

GeeFernando commented 1 year ago

@GeeFernando We have been looking at this and it seems to be an issue with Shapely itself. I don't think it is adept to handling self-intersecting polygons. When done through arcpy there is no issue since that package handles this case.

We will look at what can be done on our end and post any updates here.

@nanaeaubry Thanks so much for looking into this 🙏

Also - is there a way to identify which features are self intersecting? If I'm able to do that, I could potentially fix those features (if I have edit access to them). I already tried https://developers.arcgis.com/python/api-reference/arcgis.features.toc.html#arcgis.features.GeoSeriesAccessor.is_valid and https://developers.arcgis.com/python/api-reference/arcgis.features.html#arcgis.features.GeoAccessor.validate - but self-intersecting are determined to be valid when using these functions.

Thanks again for the help 🙏

nanaeaubry commented 1 year ago

You can pass a geometry filter to a feature layer query to find where the intersections occur.

Take a look at this, there is some example code below the table: https://developers.arcgis.com/python/api-reference/arcgis.geometry.filters.html#intersects

Otherwise you can try within : https://developers.arcgis.com/python/api-reference/arcgis.features.html#arcgis.features.GeoSeriesAccessor.within

Hope this can help 👍🏻 I am going to close the issue since we cannot fix but if you have more feel free to post.