ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/
5 stars 3 forks source link

Add a proximity to stadium feature #482

Closed Damonamajor closed 2 weeks ago

Damonamajor commented 1 month ago

The built in proximity query that we had been using (proximity to cemetery, etc) kept producing the following error:

AMBIGUOUS_NAME: line 89:25: Column 'xy.x_3435' is ambiguous. You may need to manually clean the data at location 's3://ccao-dbt-athena-dev-us-east-1/results/tables/47cac154-5955-4ed2-873e-55a3d15ea8ab' before retrying. Athena will not delete data in your account.

My assumption is because the test is querying itself. I tested renaming the columns as well as removing the pins which we are trying to join to (NOT IN), but the error remained the same. Because of that, I built a SQL query on it's own which should work just fine. I did a decent amount of QCing with it. Let me know if I should go back to work with the "standard" structure.

wrridgeway commented 3 weeks ago

@Damonamajor could you compare your output to this query:

-- Define stadium names and locations
WITH stadiums AS (
    SELECT
        'Soldier Field' AS stadium_name,
        ST_POINT(1179603, 1893122) AS stadium_xy
    UNION ALL
    SELECT
        'Wintrust Arena' AS stadium_name,
        ST_POINT(1178259, 1889955) AS stadium_xy
    UNION ALL
    SELECT
        'Guaranteed Rate Field' AS stadium_name,
        ST_POINT(1174703, 1881406) AS stadium_xy
    UNION ALL
    SELECT
        'Wrigley Field' AS stadium_name,
        ST_POINT(1168634, 1924426) AS stadium_xy
    UNION ALL
    SELECT
        'United Center' AS stadium_name,
        ST_POINT(1163584, 1899780) AS stadium_xy
    UNION ALL
    SELECT
        'UIC Pavilion' AS stadium_name,
        ST_POINT(1168816, 1897726) AS stadium_xy
),

-- Calculate distance between every distinct parcel and each stadium
xy_to_stadium_dist AS (
    SELECT DISTINCT
        parcel.pin10,
        parcel.x_3435,
        parcel.y_3435,
        parcel.year,
        stadiums.stadium_name,
        ST_DISTANCE(
            ST_POINT(parcel.x_3435, parcel.y_3435), stadiums.stadium_xy
        ) AS stadium_dist_ft
    FROM spatial.parcel
    CROSS JOIN stadiums
),

-- Rank distance to stadium within parcel
min_stadium_dist AS (
    SELECT
        pin10,
        x_3435,
        y_3435,
        year,
        stadium_name,
        stadium_dist_ft,
        ROW_NUMBER()
            OVER (PARTITION BY pin10, year ORDER BY stadium_dist_ft)
            AS rnk
    FROM xy_to_stadium_dist
)

-- Choose closest stadium per parcel
SELECT
    pin10,
    stadium_name,
    stadium_dist_ft AS min_stadium_dist_ft,
    year
FROM min_stadium_dist
WHERE rnk = 1