NYCPlanning / db-safegraph

SafeGraph ETL
4 stars 1 forks source link

daily_zip_poivisits_by_sector and daily_zip_poivisits_by_subsector - Boroughs have the same zip code #87

Open AmandaDoyle opened 3 years ago

AmandaDoyle commented 3 years ago

ZIP code 10001 is in both MN and BX. The subsector file contains other MN ZIP codes with BX records, such as 10023. How are we calculating ZIP codes?

SPTKL commented 3 years ago

zip codes are provided by safegraph, not calculated

mgraber commented 3 years ago

Query to see how big of a problem this is:

WITH daily_visits AS(
SELECT safegraph_place_id, poi_cbg, postal_code, date_add('day', row_number() over(partition by safegraph_place_id), date_start) AS date_current, CAST(visits AS SMALLINT) as visits
FROM (
  SELECT
     safegraph_place_id,
     poi_cbg,
     postal_code,
     CAST(SUBSTR(date_range_start, 1, 10) AS DATE) as date_start,
     CAST(SUBSTR(date_range_end, 1, 10) AS DATE) as date_end,
     cast(json_parse(visits_by_day) as array<varchar>) as a
  FROM safegraph.weekly_patterns
  WHERE SUBSTR(poi_cbg,1,5) IN ('36085','36081','36061','36047','36005')
  AND CAST('2020-10-01' AS DATE) < dt
  AND CAST('2020-11-01' AS DATE) > dt
) b
CROSS JOIN UNNEST(a) as t(visits)
),

boro_counts AS (
SELECT postal_code, COUNT(DISTINCT SUBSTR(poi_cbg,1,5)) as boro_count
FROM daily_visits
GROUP BY postal_code
ORDER BY boro_count DESC)

SELECT
boro_count,
COUNT(postal_code) as num_zipcodes
FROM boro_counts
GROUP BY boro_count

For the month of Oct 2020:

Number of boroughs associated with a zipcode Number of zipcodes
1 231
2 36
3 0
4 22
5 2