degauss-org / census_block_group

A docker container for assigning census block group id to geocoded addresses.
https://degauss.org/census_block_group
GNU General Public License v3.0

geometry evaluation error #16

Closed by andrew-vancil 2 years ago

andrew-vancil commented 2 years ago

Error generated when matching to 2000 census tracts:

finding containing geography for each point...
Error in CPL_geos_op2(op, x, y) :
  Evaluation error: TopologyException: Input geom 1 is invalid: Ring Self-intersection at or near point 1050692.1045394999 1916457.7976013292 at 1050692.1045394999 1916457.7976013292.
Calls: <Anonymous> ... geos_op2_df -> geos_op2_geom -> st_sfc -> CPL_geos_op2
Execution halted

The first (lat, lon) in the file is (39.226238, -84.555914)

cole-brokamp commented 2 years ago

Seems to be the same as https://github.com/degauss-org/st_census_tract/issues/8

We can take the same approach and rebuild the block group files with st_make_valid.

cole-brokamp commented 2 years ago

@erikarasnick When we make the fix, let's also add this example lat/lon to the file so that we can see it fail and then see it succeed after we rebuild the tract files.

erikarasnick commented 2 years ago

Strangely, I am not getting the error for that lat/lon. Are there more digits for that lat/lon in your data, @andrew-vancil?

andrew-vancil commented 2 years ago

Hmm, I'm not sure of the best way to pin down the offending coordinates. I thought that the "input geom 1 is invalid" suggested the first lat/lon, but I guess I misread that. Is there a way to find the bad one? Unfortunately I've got >12,000 rows.

erikarasnick commented 2 years ago

In the past I have split the file in half recursively until I narrowed it down. But maybe this would be a good place to use mappp?
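The halving approach can be sketched as follows. This is a minimal illustration, written as a loop rather than recursion, and it assumes that a single row is enough to trigger the error. The `fails` function is a hypothetical stand-in for "run the container on this subset and check whether it errors", and the coordinates are made up.

```python
def find_failing_row(rows, fails):
    """Narrow `rows` down to a single row for which `fails` is true.

    Assumes fails(rows) is True and that one row alone causes the failure.
    Returns None if a failing chunk splits into two passing halves.
    """
    while len(rows) > 1:
        mid = len(rows) // 2
        left, right = rows[:mid], rows[mid:]
        if fails(left):
            rows = left
        elif fails(right):
            rows = right
        else:
            return None  # neither half reproduces the failure
    return rows[0]

# Hypothetical stand-in for running the container on a subset of the CSV.
def fails(subset):
    return any(row == (39.0, -84.0) for row in subset)  # made-up bad row

points = [(38.0 + i, -84.0) for i in range(8)]
print(find_failing_row(points, fails))
```

With about 12,000 rows this needs at most two runs per halving step, so roughly 28 runs in total, provided the single-row assumption holds.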

andrew-vancil commented 2 years ago

Well, I thought I had narrowed it down to this point (39.635348, -83.542276). But I tried running the container on the dataset without that point and still got the same error. While splitting the dataset, I also saw some odd behavior: a portion of the data would give me the error, I would split that portion in half, and then neither half would give me an error. I've made sure that I'm not losing a row when splitting, so I'm not sure why that's happening. But I do know that the coordinates here for sure give the error.
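When a failing portion splits into two passing halves, plain halving gets stuck. One generic alternative is delta debugging (ddmin), which also tries removing chunks and keeps any smaller subset that still fails. The sketch below is an assumption about how one might shrink the file, not something used in this thread. `fails` is again a hypothetical stand-in for running the container, and the example rows are made up to mimic a failure that needs two rows together. Whether that is the actual cause here is unknown.

```python
def ddmin(rows, fails):
    """Shrink `rows` to a smaller list that still makes `fails` return True."""
    n = 2  # number of chunks to split into
    while len(rows) >= 2:
        size = -(-len(rows) // n)  # ceiling division
        chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
        reduced = False
        for i, chunk in enumerate(chunks):
            rest = [r for j, c in enumerate(chunks) if j != i for r in c]
            if fails(chunk):   # one chunk alone reproduces the error
                rows, n, reduced = chunk, 2, True
                break
            if fails(rest):    # error survives dropping this chunk
                rows, n, reduced = rest, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(rows):
                break          # already at single-row granularity
            n = min(n * 2, len(rows))
    return rows

# Hypothetical failure that only appears when two specific rows are both present.
def fails(subset):
    return "a" in subset and "b" in subset

rows = ["a", "x1", "x2", "x3", "b", "x4", "x5", "x6"]
print(ddmin(rows, fails))
```

For this made-up input the result is `['a', 'b']`, a subset from which no single row can be dropped without losing the failure. The reduced list could then be written out as a small CSV for reproduction.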

cole-brokamp commented 2 years ago

@andrew-vancil can you send us the smallest possible CSV file that creates the error?