biglocalnews / usps

US Addresses Sampler
MIT License
11 stars 0 forks source link

not valid JSON returned from search #18

Closed yinleon closed 1 year ago

yinleon commented 1 year ago

JSON parsing error for certain queries. Is this because there are no addresses there?

This also occurs in Chicago, which I am sure does have addresses.

Screenshot 2023-03-15 at 3 14 05 PM (1)
yinleon commented 1 year ago

likely related to https://github.com/biglocalnews/usps/issues/14

yinleon commented 1 year ago

Aaron used a random city generator to test cities and he recorded which cities seem to throwing this error usps test 3-15.xlsx

yinleon commented 1 year ago

Spotted in Eugene and Portland Oregon today!

dphiffer commented 1 year ago

I was able to get this running locally and replicate the issue, here's what I'm seeing in the docker console scroll:

usps-db-1   | 2023-04-10 20:25:43.844 UTC [23858] ERROR:  function usps_sample(geometry, unknown, unknown) does not exist at character 440
usps-db-1   | 2023-04-10 20:25:43.844 UTC [23858] HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
usps-db-1   | 2023-04-10 20:25:43.844 UTC [23858] STATEMENT:  
usps-db-1   |               WITH bounds AS (
usps-db-1   |                   SELECT the_geom g FROM tiger.place WHERE gid = $1
usps-db-1   |               )
usps-db-1   |               SELECT
usps-db-1   |                   s.unit,
usps-db-1   |                   s.number,
usps-db-1   |                   s.street,
usps-db-1   |                   s.city,
usps-db-1   |                   s.district,
usps-db-1   |                   s.region,
usps-db-1   |                   s.postcode,
usps-db-1   |                   s.lon,
usps-db-1   |                   s.lat,
usps-db-1   |                   s.tract_id,
usps-db-1   |                   s.blkgrpce
usps-db-1   |               FROM bounds b, usps_sample(b.g, $2, $3) s
usps-db-1   |           
usps-api-1  | INFO:     172.19.0.4:59330 - "POST /sample HTTP/1.0" 500 Internal Server Error
dphiffer commented 1 year ago

I'm not sure why the function created by sample-fn.sql didn't stick around (after running init_addr_data.sh OR to get my Oregon Census data setup) but after running cat sample-fn.sql | psql from the docker console again I got a (maybe) more helpful error message:

usps-db-1   | 2023-04-10 20:38:58.927 UTC [27854] ERROR:  syntax error at or near "a" at character 158
usps-db-1   | 2023-04-10 20:38:58.927 UTC [27854] QUERY:  WITH sample AS ( SELECT random(), a.hash, a.point, a.unit, a.number, a.street, a.city, a.district, a.region, a.postcode, a.tract_id, a.blkgrpce FROM oa_"or" a  WHERE a.tract_id = $1) INSERT INTO local_addrs(r, hash, point, unit, number, street, city, district, region, postcode, tract_id, blkgrpce) SELECT s.* FROM sample s WHERE ST_Contains($2, s.point)
usps-db-1   | 2023-04-10 20:38:58.927 UTC [27854] CONTEXT:  PL/pgSQL function usps_sample(geometry,integer,double precision) line 36 at EXECUTE
usps-db-1   | 2023-04-10 20:38:58.927 UTC [27854] STATEMENT:  
usps-db-1   |               WITH bounds AS (
usps-db-1   |                   SELECT the_geom g FROM tiger.place WHERE gid = $1
usps-db-1   |               )
usps-db-1   |               SELECT
usps-db-1   |                   s.unit,
usps-db-1   |                   s.number,
usps-db-1   |                   s.street,
usps-db-1   |                   s.city,
usps-db-1   |                   s.district,
usps-db-1   |                   s.region,
usps-db-1   |                   s.postcode,
usps-db-1   |                   s.lon,
usps-db-1   |                   s.lat,
usps-db-1   |                   s.tract_id,
usps-db-1   |                   s.blkgrpce
usps-db-1   |               FROM bounds b, usps_sample(b.g, $2, $3) s
usps-db-1   |           
jnu commented 1 year ago

@dphiffer thanks for the stack, that was super helpful, I hadn't had time to reproduce locally.

Just made a fix / deployed it for addresses in Indiana & Oregon, but the Arizona issue appears to be something different. Keeping the issue open, I'll try to look into that some more.

jnu commented 1 year ago

I added Sentry (#26) to our production deployment and was able to capture the stack trace for Glendale, Arizona. It's unrelated to the Oregon/Indiana issue:

(sqlalchemy.dialects.postgresql.asyncpg.InternalServerError) <class 'asyncpg.exceptions.InternalServerError'>: lwgeom_intersection_prec: GEOS Error: TopologyException: Input geom 0 is invalid: Holes are nested at -112.409468 33.529843
[SQL: 
            WITH bounds AS (
                SELECT the_geom g FROM tiger.place WHERE gid = %s
            )
            SELECT
                s.unit,
                s.number,
                s.street,
                s.city,
                s.district,
                s.region,
                s.postcode,
                s.lon,
                s.lat,
                s.tract_id,
                s.blkgrpce
            FROM bounds b, usps_sample(b.g, %s, %s) s
        ]
[parameters: (2290, 200, -1.0)]
(Background on this error at: https://sqlalche.me/e/14/2j85)

That sounds more like the geometry is invalid for some reason. I will look more into that.

jnu commented 1 year ago

The root cause is that TIGER doesn't provide valid geometries in some cases. Looks like there are very few in the place table, which is the most likely spot for them since places can have very complicated geometries.

addresses=# with validation as (select gid, name, ST_IsValidReason(the_geom) reason from place) select gid, name, reason from validation where reason <> 'Valid Geometry'
;
  gid  |     name      |                 reason
-------+---------------+----------------------------------------
  2290 | Glendale      | Holes are nested[-112.40957 33.529845]
  8701 | Joliet        | Holes are nested[-88.074056 41.510616]
 26151 | Knoxville     | Holes are nested[-84.082463 35.893129]
 19737 | Wesley Chapel | Holes are nested[-80.682658 34.989184]
(4 rows)

I believe we can fix this fairly easily in the ingest scripts to repair the geometries if necessary. I can run queries on the server to find / repair the invalid geometries today.

sentry-io[bot] commented 1 year ago

Sentry issue: USPS-API-1