Create IDOT features

Damonamajor commented 1 month ago

This PR takes the output of 617 and transforms it into proximity features. Basic steps:

Identifies the proximity characteristics for each road type. For example, nearest_highway_lanes is the number of lanes on the nearest highway. It also records the distance to the nearest highway, using the standard dist_ft suffix.
Uses a macro to find the smallest of those distance values. From that minimum, it creates features such as nearest_road_lanes, nearest_road_dist_ft, etc.
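The nearest-road step above can be sketched as follows. This is a minimal sketch, not the macro from this PR: it uses Python to generate the SQL that a macro like this might emit, with one CASE WHEN per characteristic keyed off the smallest distance. The road types, characteristic names, and the helper `nearest_case_sql` are hypothetical.

```python
def nearest_case_sql(road_types, characteristics, dist_suffix="dist_ft"):
    """Hypothetical helper: build SQL select expressions that pick, for each
    characteristic, the value belonging to the road type with the smallest
    nearest_<type>_<dist_suffix> column."""
    dist_cols = [f"nearest_{rt}_{dist_suffix}" for rt in road_types]
    # Smallest distance across all road types for this parcel.
    least = f"LEAST({', '.join(dist_cols)})"
    exprs = [f"{least} AS nearest_road_{dist_suffix}"]
    for char in characteristics:
        # One WHEN per road type; the first type whose distance equals the
        # minimum wins, so ties resolve in road_types order.
        whens = "\n".join(
            f"    WHEN {least} = nearest_{rt}_{dist_suffix} THEN nearest_{rt}_{char}"
            for rt in road_types
        )
        exprs.append(f"CASE\n{whens}\nEND AS nearest_road_{char}")
    return ",\n".join(exprs)


if __name__ == "__main__":
    # Hypothetical road types and characteristics, for illustration only.
    print(nearest_case_sql(["highway", "arterial", "collector"], ["lanes", "daily_traffic"]))
```

Generating the expressions means adding a road type or characteristic is a one-line change rather than another hand-written CASE block. How LEAST treats NULLs differs between SQL engines (some ignore NULL arguments, others return NULL if any argument is NULL), so that is worth checking against the engine the model runs on.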

Ongoing questions:

The local roads query still times out when run for all characteristics. Is this something we want to investigate? Currently we only identify local_road features where there is a valid average daily traffic value.
-- Local roads could use a structure like the one in dist_pin_to_pin, but the macro as currently structured doesn't seem to support it. Is this a priority to fix?
-- One way to subset the data and speed up the process would be to include only roads and parcels in the same township. Is that a structure we would want?
Building on this, would we want to separate out major and secondary roads as we did before? Something like highways / freeways / collectors / arterials / local makes sense to me. At minimum, it is probably necessary to remove local roads from this query. If a parcel abuts both a secondary road and a local road, we would probably want to make sure the secondary road is accounted for even if it is 5 ft farther away.
Do we want to keep the nearest feature as a macro, since it could be reused (with a little tuning)?
-- I also experimented with it, and the easiest way to do it without a macro seemed to be a CASE WHEN for each new variable. In that sense, the macro does make sense.
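The township-subsetting idea from the first question above can be sketched as a brute-force nearest-road search that only compares parcels with roads in the same township. This is a minimal sketch under stated assumptions: planar coordinates in feet, roads as straight segments, and hypothetical input tuples and function names. It is not the dist_pin_to_pin structure or this PR's SQL.

```python
import math
from collections import defaultdict


def point_segment_dist(px, py, ax, ay, bx, by):
    """Planar distance from point P to segment AB (same units as inputs)."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project P onto AB, then clamp the projection to the segment's endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_road_by_township(parcels, roads):
    """Hypothetical inputs:
    parcels: list of (pin, township, x, y)
    roads:   list of (road_id, township, ax, ay, bx, by)
    Each parcel is only compared with roads in its own township, so the work
    is the sum of n_parcels * n_roads per township rather than across the
    whole county. Returns {pin: (road_id, dist) or None}."""
    roads_by_twp = defaultdict(list)
    for road in roads:
        roads_by_twp[road[1]].append(road)

    result = {}
    for pin, twp, x, y in parcels:
        best = None
        for road_id, _, ax, ay, bx, by in roads_by_twp.get(twp, []):
            d = point_segment_dist(x, y, ax, ay, bx, by)
            if best is None or d < best[1]:
                best = (road_id, d)
        result[pin] = best  # None when the township has no roads
    return result
```

The trade-off is at township boundaries: a parcel near a border can be matched to a farther road in its own township when a closer road sits just across the line. Buffering each township, or also searching adjacent townships, could be one way to limit that.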

Damonamajor commented 5 days ago

@wrridgeway

Questions:

At the moment, highways are coded as highway_roads. This doesn't make sense linguistically, but it keeps the columns in the same structure and makes road columns easy to find, for example prox_nearest_highway_road_dist_ft. Thoughts?
The features in the added model in model.vw_pin_shared_input.sql are named like nearest_collector_road_lanes. Do we want to keep them in the proximity chunk, since they measure proximity to the nearest road, or move them to another chunk (maybe environment, or their own chunk)? For comparison, from the environmental chunk:
-- Environmental and access data
vwlf.env_flood_fema_sfha AS loc_env_flood_fema_sfha,
The initial dataset was named traffic. It makes sense to rename it to roads now, but the workflow for AWS uploads is wonky. Do we want to do the rename in a separate PR?
A lot of highways have 1 lane. Since these are mostly service lanes on the side of the road, do we want to filter to highways with more than 1 lane?

wrridgeway commented 4 days ago

> At the moment highways are coded as highway_roads. This doesn't make sense linguistically, but does make sense when keeping columns in the same structure / if someone wants to find road columns. For example prox_nearest_highway_road_dist_ft. Thoughts?

I'd actually prefer a road prefix rather than a suffix for all the roads, highways included.
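Assuming "road prefix" means putting road_ before the road type, the two naming conventions could be compared with a small helper like this. The function name and the style flag are hypothetical, for illustration only.

```python
def road_col(road_type, feature, style="prefix"):
    """Hypothetical helper that builds a proximity column name.
    style="suffix" -> prox_nearest_highway_road_dist_ft (current naming)
    style="prefix" -> prox_nearest_road_highway_dist_ft (road prefix)"""
    if style == "suffix":
        return f"prox_nearest_{road_type}_road_{feature}"
    if style == "prefix":
        return f"prox_nearest_road_{road_type}_{feature}"
    raise ValueError(f"unknown style: {style}")


if __name__ == "__main__":
    cols = [road_col(rt, "dist_ft") for rt in ("highway", "collector", "local")]
    # With the prefix form, one startswith() filter finds every road column.
    print([c for c in cols if c.startswith("prox_nearest_road_")])
```

One practical upside of the prefix form is that every per-type road column shares the prox_nearest_road_ prefix with the overall nearest-road columns, so a single prefix filter picks them all up, highways included.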

> The features in the added model in model.vw_pin_shared_input.sql are coded as nearest_collector_road_lanes. Do we want to keep them in proximity, since it is proximity to nearest road or have them in another chunk (maybe environment or it's own chunk)?

I'm fine with it as is.

> The initial dataset was named as traffic. It makes sense to rename this to roads now, but the workflow is wonky with aws uploads? Do we want to rename this in a separate PR?

I realize it's a pain in the ass, but getting the names right before we merge things into master is preferable. I'm happy to run anything if you need me to.

> A lot of highways have 1 lane. Do we want to just filter to highways with more than 1 lane since these are mostly service lanes on the side of the road?

Let's leave it for now. Once we're done here, we can open a new PR to investigate this kind of additional processing.

ccao-data / data-architecture

Create IDOT features #620