CartoDB / cartoframes

CARTO Python package for data scientists
BSD 3-Clause "New" or "Revised" License
250 stars 63 forks source link

GeoSeries.touches(other) and GeoSeries.intersects(other) unexpected behavior #449

Closed MichaelSpichiger closed 6 years ago

MichaelSpichiger commented 6 years ago

So using these two datasets, we ran into some unexpected behavior of the intersect and touches functions.

assigned_bins = cc.read('assigned_hexbins')
unassigned_bins = cc.read('unassigned_hexbins')

assigned_hexbins.csv.zip unassigned_hexbins.csv.zip

We converted the dataframes into geodataframes

unassigned_bins =cc.query('SELECT *, ST_AsText(the_geom) as geom FROM mspichigercarto.unassigned_hexbins').reset_index()
assigned_bins =cc.query('SELECT *, ST_AsText(the_geom) as geom FROM mspichigercarto.assigned_hexbins').reset_index()

import shapely.wkt

geometry = unassigned_bins['geom'].map(shapely.wkt.loads)
unassigned_bins = unassigned_bins.drop('geom', axis=1)
crs = {'init': 'epsg:4326'}
unassigned_bins_geo = gpd.GeoDataFrame(unassigned_bins, crs=crs, geometry=geometry)`
import shapely.wkt

geometry = assigned_bins['geom'].map(shapely.wkt.loads)
assigned_bins = assigned_bins.drop('geom', axis=1)
crs = {'init': 'epsg:4326'}
assigned_bins_geo = gpd.GeoDataFrame(assigned_bins, crs=crs, geometry=geometry)

Then we tried running touches and intersects on them, but instead of returning expected values it would return false for everything and none of the geometries intersected.

assigned_bins_geo.touches(unassigned_bins_geo)

We then ran this query in cartoframes

st_touch_query = '''
SELECT assigned_hexbins.*,
      unassigned_hexbins.cartodb_id as index_unassigned
FROM assigned_hexbins, unassigned_hexbins
WHERE ST_Touches(assigned_hexbins.the_geom, unassigned_hexbins.the_geom)
'''

according to this query, the hexbins are touching so touches and intersects are not working as expected.

andy-esch commented 6 years ago

Hey @MichaelSpichiger thanks for finding this.

Could you do a couple of things to debug:

  1. Download the data as a shapefiles or GeoJSONs directly and load it with GeoPandas' .read_file method (not using cartoframes)
  2. Try out the touches code to see if you get the same as above.

If you get the same result with that process, it's not a cartoframes bug. Looking forward to seeing the results :)

ljwolf commented 6 years ago

This might be due to coordinate drift from moving from wkt to geometries instead of straight from the wkb. Try to examine whether your anticipated outcome is obtained by parsing the wkb directly.

When I swap to defining the geometry column using the wkb in the the_geom column:


import binascii as ba
import shapely.wkb as wkb
import geopandas as gpd
unassigned['geometry'] = unassigned.the_geom.apply(lambda string: wkb.loads(ba.unhexlify(string))
assigned['geometry'] = assigned.the_geom.apply(lambda string: wkb.loads(ba.unhexlify(string))

unassigned = gpd.GeoDataFrame(unassigned)
assigned = gpd.GeoDataFrame(assigned)

unassigned.touches(assigned).any() # has trues

intersection = gpd.tools.sjoin(unassigned, assigned, op='intersects', how='left')
intersection.shape # (has 293 rows)
michellemho commented 6 years ago

Thanks Levi!

Do you know the difference between unassigned.intersects(assigned) and gpd.tools.sjoin(unassigned, assigned, op='intersects', how='left')?

I get different results for each...

michellemho commented 6 years ago

I wonder if it has something to do with: https://github.com/geopandas/geopandas/issues/720?

ljwolf commented 6 years ago

good eye, looks like it.

I figured it was doing some kind of index matching, since unassigned.touches(assigned) is the same shape as assigned.touches(unassighed), but I think the proper operation @MichaelSpichiger intends from their SQL is the sjoin, not a table touches.

andy-esch commented 6 years ago

Hey all, how's this going? Overall it looks like it doesn't have to do with cartoframes

michellemho commented 6 years ago

Yes, it doesn't seem to be related to cartoframes. It has to do with Geopandas intersection methods.

On Tue, Jun 19, 2018 at 9:58 AM Andy Eschbacher notifications@github.com wrote:

Hey all, how's this going? Overall it looks like it doesn't have to do with cartoframes

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/CartoDB/cartoframes/issues/449#issuecomment-398408594, or mute the thread https://github.com/notifications/unsubscribe-auth/AL-ECYlWt9REZNTieNA-Du3YqaFA4stkks5t-QOTgaJpZM4UjUwr .

-- Michelle Ho Data Scientist mho@carto.com

ljwolf commented 6 years ago

Also with coordinate drift in reading/writing wkt vs. wkb.

andy-esch commented 6 years ago

Closing this in favor of the geopandas one