CartoDB / observatory-extension

BSD 3-Clause "New" or "Revised" License

OBS_GetMeasure() fails on too many input geometries #160

Status: Closed (closed by talos 8 years ago)

talos commented 8 years ago

OBS_GetMeasure frequently fails on certain input geometries. We need to figure out performant ways to fix these geoms before we work with them.

talos commented 8 years ago

@saleiva @makella if you find a failure for "Total Population" on a dataset, please link to it here -- if possible, including the error message, so I can track down which geometry is causing the issue.

I'll turn these into test cases so the function behaves better with all geometries.

makella commented 8 years ago

@talos link to the polygons i was trying to attach total_pop to: https://team.carto.com/u/mamataakella/tables/tree_canopy_assessment_2013_land_use_copy

if it happens again, i'll post the link to the data here, no problem.

talos commented 8 years ago

Gist with a bad geometry, courtesy @javisantana : https://gist.github.com/talos/9644ce0db00941b327195d10410f7014

An example error courtesy @saleiva, with fragment of bad geometry:

7 "cdb_dataservices_client._obs_getmeasure(7): [cartodb_user_a6f0d0fe-4f4f-4217-8150-22b0010fe409_db] REMOTE ERROR: plpy.Error: There was an error trying to use OBS_GetMeasure: cdb_dataservices_server._obs_getmeasure(7): [obs_2016_07_18_1d86d3f2ec] REMOTE ERROR: Error performing intersection: TopologyException: found non-noded intersection between LINESTRING (-77.1134 38.9255, -77.1134 38.9255) and LINESTRING (-77.1134 38.9255, -77.1134 38.9255) at -77.113423945991215 38.925486167887797"

And problem dataset courtesy @saleiva : https://team.carto.com/u/saleiva/dataset/dc_postal_codes
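
To help track down which geometry is causing a failure, the coordinates reported at the end of the TopologyException message can be pulled out programmatically. The following is a minimal sketch, not part of the extension: the helper name `topology_error_point` is hypothetical, and it assumes the message ends with `at <x> <y>` in plain decimal notation, as in the example error above.

```python
import re

# Matches the coordinate pair that follows " at " in a TopologyException
# message, e.g. "... at -77.1134 38.9255". Assumes plain decimal numbers
# (no scientific notation).
_POINT_RE = re.compile(
    r"TopologyException:.*?\bat\s+(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)"
)


def topology_error_point(message):
    """Return (x, y) of the reported intersection, or None if not found.

    Hypothetical helper for illustration only.
    """
    match = _POINT_RE.search(message)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# Fragment of the error message quoted in this thread.
example = (
    "Error performing intersection: TopologyException: found non-noded "
    "intersection between LINESTRING (-77.1134 38.9255, -77.1134 38.9255) "
    "and LINESTRING (-77.1134 38.9255, -77.1134 38.9255) "
    "at -77.113423945991215 38.925486167887797"
)
point = topology_error_point(example)
```

The resulting point could then be compared against each row's geometry (for instance with PostGIS `ST_Intersects` or `ST_DWithin`) to narrow down the offending `cartodb_id`.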

talos commented 8 years ago

In mamata's dataset, cartodb_id 3000 is the finicky one. In saleiva's, it's 132.

talos commented 8 years ago

We're still having issues with these apart from the selected boundary size. For example, these Florida zips should be measured using block groups, but that doesn't work without a larger buffer than the one we're currently using (0.00001).

ftp://ftp.fgdl.org/pub/state/zipbnd_2012.zip

A buffer of 0.0001 worked in testing, which should not significantly change the counts. More precision should only be necessary for parcel-quality or higher data.
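
For a rough sense of scale, the two buffer sizes can be converted to ground distance. This sketch assumes the buffer is expressed in degrees of longitude/latitude (the thread does not state the units) and uses a spherical Earth approximation; the latitude of 28 degrees north is just an illustrative value for Florida, and `degrees_to_meters` is a hypothetical helper.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius, spherical approximation


def degrees_to_meters(deg, latitude_deg):
    """Return (north_south_m, east_west_m) covered by `deg` degrees.

    Hypothetical helper: a degree of longitude shrinks with the cosine
    of the latitude, while a degree of latitude stays roughly constant.
    """
    meters_per_degree = math.pi * EARTH_RADIUS_M / 180.0
    north_south = deg * meters_per_degree
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# Compare the current buffer with the larger one that worked in testing.
results = {}
for buffer_deg in (0.00001, 0.0001):
    results[buffer_deg] = degrees_to_meters(buffer_deg, 28.0)
    ns, ew = results[buffer_deg]
    print(f"{buffer_deg}: ~{ns:.1f} m north-south, ~{ew:.1f} m east-west")
```

Under these assumptions the current buffer is on the order of 1 m and the larger one on the order of 10 m, which is small relative to block groups and zips but could matter for parcel-quality data.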