ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/

Check OSM tags/definitions for major/secondary roads #466

Open dfsnow opened 1 month ago

dfsnow commented 1 month ago

Valuations has noted that for many parts of Chicago, abutting a major road (Ashland, Irving Park, etc.) is a negative amenity and is likely to lower sale prices.

Currently, we define major roads via the OSM way tags primary, motorway, and trunk. These tags capture stuff like major highways, LSD (Lake Shore Drive), etc. Secondary roads use the OSM tag secondary and exclude anything that is already a major road.
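A minimal sketch of the classification rule described above, to make the exclusion step concrete. It assumes the tags are values of the OSM `highway` key and that roads are matched by name; the real pipeline may match on OSM way ID or geometry instead, and `split_roads` is a hypothetical helper, not our actual code.

```python
from typing import Iterable

# Assumption: road class comes from the OSM `highway` key on each way.
MAJOR_TAGS = {"motorway", "trunk", "primary"}
SECONDARY_TAGS = {"secondary"}


def split_roads(ways: Iterable[tuple[str, str]]) -> tuple[set[str], set[str]]:
    """Split (name, highway_tag) pairs into major and secondary road names.

    Hypothetical helper: roads are matched by name here for illustration only.
    """
    ways = list(ways)
    major = {name for name, tag in ways if tag in MAJOR_TAGS}
    # Secondary roads exclude anything already classified as major.
    secondary = {
        name for name, tag in ways if tag in SECONDARY_TAGS and name not in major
    }
    return major, secondary
```

Note that a road carrying both a primary and a secondary segment ends up only in the major set, which is one way a road like Ashland can disappear from the secondary table.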

We should review the roads resulting from the tags and make sure that they align with Valuations' intuition about which roads result in disamenity.

Damonamajor commented 3 weeks ago

@dfsnow @wagnerlmichael

Because of how the function is coded, primary and secondary roads are "fixed" by their earliest coding in the OpenStreetMap data. While OSM data is reliable in some ways, its underlying tagging may not be held to the highest standards. This came up during desk review, where Ashland was not being shown as a secondary street. In the underlying data, North Ashland is considered a "Major Road" even though OSM classified it as a "Secondary Road" in 2023. This can be seen with the following AWS query. The same pattern shows up for a number of other roads, for example Milwaukee and Roosevelt, all of which I'd consider secondary roads.

SELECT * FROM "proximity"."dist_pin_to_major_road" WHERE pin10 = '1418211012'
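To find other roads like North Ashland in bulk rather than one PIN at a time, one option is to compare each road's earliest tagged class with its most recent one. This is a hedged sketch: `history` is a hypothetical road name to {year: OSM tag} mapping that would have to be built from the yearly raw spatial files, and the major-tag set mirrors the rule in the issue description.

```python
from typing import Optional

# Assumption: OSM `highway` values; only these three count as "major".
MAJOR_TAGS = frozenset({"motorway", "trunk", "primary"})


def road_class(tag: Optional[str]) -> str:
    """Map a raw OSM highway tag to the class used for proximity features."""
    if tag in MAJOR_TAGS:
        return "major"
    if tag == "secondary":
        return "secondary"
    return "other"


def find_reclassified(
    history: dict[str, dict[int, Optional[str]]]
) -> dict[str, tuple[str, str]]:
    """Return roads whose earliest tagged class differs from their latest one.

    `history` is a hypothetical mapping of road name -> {year: OSM tag};
    years with no tag (None) are skipped.
    """
    changed = {}
    for name, by_year in history.items():
        tagged = [(y, t) for y, t in sorted(by_year.items()) if t is not None]
        if not tagged:
            continue
        first, latest = road_class(tagged[0][1]), road_class(tagged[-1][1])
        if first != latest:
            changed[name] = (first, latest)
    return changed
```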

To test this, I downloaded the raw 2023 spatial parquet file (input) from S3 and compared it to the output of the following query.

SELECT DISTINCT nearest_secondary_road_name FROM "proximity"."dist_pin_to_secondary_road" WHERE year = '2023'

I looked at the diff between them, but it is quite difficult to dissect. For example, "W 143rd St" becomes "143 street". However, stripping the directional prefixes to make the names comparable would also obscure the finding in my first paragraph.
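One way to make the diff readable is to normalize names before comparing, with the directional prefix as an explicit switch so both views can be checked. A minimal sketch, assuming the two inputs are plain sets of road names; the abbreviation tables and the `normalize`/`diff_names` helpers are hypothetical and would need extending for real data.

```python
import re

# Hypothetical lookup tables; extend as needed for the real data.
DIRECTIONS = {"n": "north", "s": "south", "e": "east", "w": "west"}
SUFFIXES = {"st": "street", "ave": "avenue", "av": "avenue", "rd": "road",
            "blvd": "boulevard", "dr": "drive", "pkwy": "parkway"}


def normalize(name: str, keep_direction: bool = True) -> str:
    """Canonicalize a street name so equivalent spellings compare equal."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    # Expand an abbreviated leading direction: "w" -> "west".
    if tokens and tokens[0] in DIRECTIONS:
        tokens[0] = DIRECTIONS[tokens[0]]
    # Optionally drop the direction entirely.
    if tokens and not keep_direction and tokens[0] in DIRECTIONS.values():
        tokens = tokens[1:]
    # Strip ordinal suffixes: "143rd" -> "143".
    tokens = [re.sub(r"^(\d+)(st|nd|rd|th)$", r"\1", t) for t in tokens]
    # Expand street-type abbreviations: "st" -> "street".
    tokens = [SUFFIXES.get(t, t) for t in tokens]
    return " ".join(tokens)


def diff_names(raw: set[str], output: set[str], keep_direction: bool = True):
    """Return (only in raw, only in output) after normalization."""
    raw_n = {normalize(n, keep_direction) for n in raw}
    out_n = {normalize(n, keep_direction) for n in output}
    return raw_n - out_n, out_n - raw_n
```

Running the diff once with `keep_direction=True` and once with `keep_direction=False` separates pure formatting differences from cases where the direction itself matters, rather than forcing a single choice.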

dfsnow commented 2 weeks ago

If I recall correctly we did the "pinning" of the road type on purpose, such that roads effectively keep their first available OSM type. This was to prevent them from flipping back and forth due to OSM tag changes. I believe @wagnerlmichael implemented this, or maybe @wrridgeway.
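A minimal sketch of the pinning behavior described above, for reference while reviewing. This is not the actual pipeline code; the `pin_road_types` helper and the road name to {year: tag} structure are assumptions for illustration.

```python
from typing import Optional


def pin_road_types(
    history: dict[str, dict[int, Optional[str]]]
) -> dict[str, dict[int, Optional[str]]]:
    """Carry each road's first available OSM tag forward to every later year.

    Hypothetical structure: road name -> {year: tag or None}. Years before the
    first non-null tag stay None; later tag changes are ignored, so a road
    cannot flip between classes from one year to the next.
    """
    pinned: dict[str, dict[int, Optional[str]]] = {}
    for name, by_year in history.items():
        first_tag: Optional[str] = None
        out: dict[int, Optional[str]] = {}
        for year in sorted(by_year):
            if first_tag is None and by_year[year] is not None:
                first_tag = by_year[year]
            out[year] = first_tag
        pinned[name] = out
    return pinned
```

The trade-off is stability versus freshness: pinning prevents flip-flopping from noisy OSM edits, but it also ignores legitimate reclassifications such as North Ashland's 2023 secondary tag.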

Either way, what we need to investigate here is whether the tags make sense and truly capture the "negative amenity" roads. For example, probably not many people want to live along Ashland or Irving Park if they can avoid it. As a result, those roads should be included in one of the spatial.*_road databases. You should use your own local knowledge of busy roads and check that the ones you know are represented in our road databases. Possibly reach out to JM in Valuations as well.
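The local-knowledge check could be scripted roughly as follows. This is a hedged sketch: the list of known busy roads would come from staff or Valuations, the major and secondary name sets would be pulled from the spatial.*_road tables (their column names are not assumed here), and `missing_busy_roads` is a hypothetical helper.

```python
from typing import Iterable


def missing_busy_roads(
    known: Iterable[str], major: set[str], secondary: set[str]
) -> list[str]:
    """List known busy roads that match no name in either road table.

    Matching is a case-insensitive substring test, so "Ashland" matches
    "North Ashland Avenue". This is deliberately loose and can produce
    false matches for short names.
    """
    covered = [name.lower() for name in major | secondary]
    return sorted(
        road for road in known
        if not any(road.lower() in name for name in covered)
    )
```

Anything this returns is a candidate for manual review, either as a missing road or as a name-matching problem.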

I'll continue to think about how to do this more concretely.

wrridgeway commented 2 weeks ago

I haven't touched the road stuff, keep me OUT of this.

wagnerlmichael commented 2 weeks ago

> If I recall correctly we did the "pinning" of the road type on purpose, such that roads effectively keep their first available OSM type. This was to prevent them from flipping back and forth due to OSM tag changes. I believe @wagnerlmichael implemented this, or maybe @wrridgeway.

This was me, and what Dan recalled is correct.