GreenInfo-Network / nyc-crash-mapper-etl-script

Extract, Transform, and Load script for fetching new data from the NYC Open Data Portal's vehicle collision data and loading into the NYC Crash Mapper table on CARTO.
3 stars 0 forks source link

Filter out badly geocoded data after updating crashes table in Carto #2

Closed clhenrick closed 7 years ago

clhenrick commented 7 years ago

It appears new NYC Open Data contains lat lons way outside of NYC in the Atlantic Ocean and Null Island. These should be filtered out via an SQL UPDATE query after new data has been inserted.

Something like:

UPDATE crashes_all_prod
SET geom = null
WHERE cartodb_id IN (
  SELECT
    a.cartodb_id
  FROM
    crashes_all_prod AS a LEFT JOIN
    nyc_borough AS b ON
    ST_Intersects(b.the_geom, a.the_geom)
  WHERE b.cartodb_id IS NULL
  AND a.the_geom IS NOT NULL
) _
clhenrick commented 7 years ago

Actual query was:

UPDATE crashes_all_prod
SET the_geom = NULL 
WHERE cartodb_id IN
(
    WITH box AS (
        SELECT ST_SetSRID(ST_Extent(the_geom), 4326)::geometry as the_geom,
        666 as cartodb_id
        FROM nyc_borough
    )
    SELECT c.cartodb_id
    FROM crashes_all_prod AS c 
    LEFT JOIN
    box AS a ON
    ST_Intersects(c.the_geom, a.the_geom)
    WHERE a.cartodb_id IS NULL
    AND c.the_geom IS NOT NULL
)
clhenrick commented 7 years ago

Seems to be fixed, waiting on a few more data updates to confirm.