ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/

Create corner lot indicator #1

Closed by wrridgeway 4 months ago

wrridgeway commented 1 year ago

Overview

Assumption: corner lots tend to be more valuable than non-corner lots due to their size, increased accessibility (for commercial), and prominent location.

Given this assumption, we should add a corner lot indicator variable to all AVMs, as well as to commercial valuation spreadsheets/methods. We can impute a corner lot variable using a variety of GIS methods, including:

We should use a combination of methods to output a single indicator (0 or 1) for each PIN in the county. N at Mansueto and J at CSDS will likely have good ideas on this.

Task

Create an R script that takes property locations (lat, lon) or shapes (parcel file) as input, then outputs a corner lot indicator (1 or 0) for each PIN. Start with a prototype script using local data. The final script should pull raw data from S3, write its output back to S3, and run as a Glue job. Save the script to aws-glue/jobs/location-corners.R. Use the branch attached to this issue to save your work and submit it for review.
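To make the expected input and output concrete, here is a minimal sketch in plain Python (the task itself calls for R). It assumes a list of street intersection coordinates is already available, and it flags a parcel when its centroid lies within a threshold distance of any intersection. The function names, the 30 m threshold, the placeholder PINs, and the proximity rule itself are all hypothetical; this is not the method that was ultimately implemented.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


def corner_indicator(parcels, intersections, max_dist_m=30.0):
    """Return {pin: 1 or 0}. A parcel gets 1 when its centroid is within
    max_dist_m of any intersection (hypothetical rule, for illustration)."""
    result = {}
    for pin, (lat, lon) in parcels.items():
        near = any(
            haversine_m(lat, lon, ilat, ilon) <= max_dist_m
            for ilat, ilon in intersections
        )
        result[pin] = 1 if near else 0
    return result


# Made-up coordinates and placeholder PINs, not real parcels
intersections = [(41.8800, -87.6300)]
parcels = {
    "PIN_A": (41.8801, -87.6301),  # roughly 14 m from the intersection
    "PIN_B": (41.8810, -87.6300),  # roughly 111 m from the intersection
}
indicator = corner_indicator(parcels, intersections)
print(indicator)
```

A centroid-distance rule like this will misclassify large or irregular parcels, which is one reason to work from parcel shapes rather than points where they are available.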

@miaoqi, figuring out how to identify these lots algorithmically will be a large part of this task. There is existing literature in the space in various geography and spatial data science journals. I can also put you in touch with folks at UChicago who can provide help if needed. Let's aim to have a running script by Feb 15th.

Some R libraries you'll probably need to get started:

wrridgeway commented 1 year ago

Per Nico, quantify the nodal degree of the OSM street network. A node with degree 3 or higher is a corner.

wrridgeway commented 1 year ago

Hi, here are some additional helpful resources:

If you are working in R, I would highly recommend using sfnetworks and tidygraph.

Both of these have many useful functions. sfnetworks is particularly useful for converting street linestring geometries to a network data structure. tidygraph has many functions that are interoperable with sfnetworks, including nodal degree measures (see this: https://tidygraph.data-imaginist.com/reference/local_graph.html). For instance, street intersections with nodal degree 4+ are "True Corners"; however, from a network analysis standpoint you may want to use 3+, which is usually considered "high access" from a connectivity standpoint.
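As a rough illustration of the degree thresholds described above, here is a library-free Python sketch that counts how many street segments meet at each endpoint and compares the result against a cutoff. It assumes segments have already been split at intersections and that shared endpoints have identical coordinates; the function names and toy coordinates are made up for this example. In practice, sfnetworks/tidygraph (or OSMnx/NetworkX) would handle this.

```python
from collections import defaultdict


def node_degrees(segments):
    """Count the number of segment ends touching each node.

    Each segment is a (start, end) pair of hashable coordinates."""
    degree = defaultdict(int)
    for start, end in segments:
        degree[start] += 1
        degree[end] += 1
    return dict(degree)


def intersection_nodes(segments, min_degree=3):
    """Return the set of nodes whose degree is at least min_degree."""
    return {
        node for node, deg in node_degrees(segments).items()
        if deg >= min_degree
    }


# Toy network: four-way crossing at (0, 0), T-junction at (2, 0),
# and a plain mid-block node at (1, 0).
segments = [
    ((0, 0), (-1, 0)),
    ((0, 0), (0, 1)),
    ((0, 0), (0, -1)),
    ((0, 0), (1, 0)),
    ((1, 0), (2, 0)),
    ((2, 0), (3, 0)),
    ((2, 0), (2, 1)),
]

high_access = intersection_nodes(segments, min_degree=3)   # 3+ rule
true_corners = intersection_nodes(segments, min_degree=4)  # 4+ rule
print(sorted(high_access), sorted(true_corners))
```

With the 3+ cutoff the T-junction counts as an intersection; with the 4+ cutoff only the four-way crossing does.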

If you're using Python, you could check out the packages OSMnx, momepy, and NetworkX 3.0.

I really like momepy and here are some potentially useful functions:

wrridgeway commented 1 year ago

Also, I'm not sure how useful this is, but here is a workflow I developed using sfnetworks and tidygraph if you're in need of an end-to-end example.

https://github.com/mansueto-institute/settlement-networks/blob/main/prod/network-workflow.R#L119

wrridgeway commented 1 year ago

Update (2023-06-22)

This issue is roughly 50% complete. The code for algorithmically detecting corners is largely finished and the corner indicators have been exported for all but 3 townships. However, the code is not especially efficient and struggles to finish on the larger towns. Below is an outline of the remaining tasks as they currently stand. See above for context and background on this issue.

Tasks

Review Workflow

While the corner detection algorithm is generally pretty accurate, it breaks down in certain situations and for certain parcel shapes. Since this is an attribute that will be used for modeling, we need to develop a one-time manual review process for the indicator. The easiest way to do this is likely by visually inspecting parcel shapefiles that are colored using the corner lot indicator. We can then manually edit the corner lot attribute for incorrect corners.

I recommend that we "chunk" the work into townships, then hand each town to analysts for review. Adding a basemap for this, in addition to the parcel shapes, is likely to be helpful. Using a collaborative web mapping service such as Felt or Placemark may also be helpful. We may also be able to leverage Nearmap.

The output of this review workflow should be a finalized CSV (per town) with two columns: PIN10 and corner_indicator.
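A minimal sketch of producing that file, assuming the algorithm's flags and the analysts' manual corrections are both available as PIN10-to-indicator mappings. The function name and the placeholder PIN10 values are hypothetical; only the two column names come from the description above.

```python
import csv
import io


def write_reviewed_csv(algorithm_flags, manual_overrides, handle):
    """Merge manual corrections over the algorithm output and write
    a two-column CSV (PIN10, corner_indicator) to a file-like handle."""
    final = {**algorithm_flags, **manual_overrides}  # overrides win
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["PIN10", "corner_indicator"])
    for pin10 in sorted(final):
        writer.writerow([pin10, final[pin10]])
    return final


# Placeholder PIN10 values, not real parcels
algorithm_flags = {"0000000001": 1, "0000000002": 0, "0000000003": 1}
manual_overrides = {"0000000003": 0}  # analyst marked this one as not a corner

buffer = io.StringIO()
final_flags = write_reviewed_csv(algorithm_flags, manual_overrides, buffer)
print(buffer.getvalue())
```

For a real township, the `io.StringIO` buffer would be swapped for a file opened with `newline=""`.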

dfsnow commented 1 year ago

@Damonamajor You'll want to switch to using this issue. I would port branch 108-create-corner-lot-indicator to this repo and continue working here. You should only have to change the git remote.

Damonamajor commented 1 year ago

Context

In August, a dataset for West was provided to a staff analyst to assess the accuracy of the corner lot function. The analyst returned an edited shapefile that identified the function's errors. The following table summarizes his findings; the largest source of error was parcels incorrectly identified as corner lots. Overall, the function was ~98% accurate.

| Definition | QGIS Code | Count | Percent |
|---|---|---|---|
| Parcels which were correctly labeled as not a corner lot | 0 | 94792 | 88% |
| Parcels which were correctly labeled as a corner lot | 1 | 11202 | 10% |
| Parcels which were labeled as not a corner lot but are deemed to be a corner lot | 2 | 99 | 0% |
| Parcels which were / were not a corner lot, and you are unsure of | 3 | 244 | 0% |
| Parcels which were labeled as a corner lot, but are deemed to be not a corner lot | 4 | 1020 | 1% |
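For reference, the headline accuracy can be reproduced from the counts in the review table with a few lines of Python. The precision and recall figures are an extra breakdown that is not in the original review. Two assumptions are baked in: "unsure" parcels (code 3) count as not correct in the accuracy figure, and they are left out of precision and recall entirely.

```python
# Counts by QGIS code, copied from the analyst review table
counts = {0: 94792, 1: 11202, 2: 99, 3: 244, 4: 1020}

total = sum(counts.values())
correct = counts[0] + counts[1]

# Share of all reviewed parcels that the function labeled correctly
accuracy = correct / total

# Of parcels flagged as corners, the share that really are corners
precision = counts[1] / (counts[1] + counts[4])

# Of parcels that really are corners, the share the function flagged
recall = counts[1] / (counts[1] + counts[2])

print(f"total={total} accuracy={accuracy:.3f} "
      f"precision={precision:.3f} recall={recall:.3f}")
```

The gap between precision and recall reflects the same pattern noted above: false positives (code 4) outnumber missed corners (code 2) by roughly ten to one.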

Takeaways from Analyst Review

Follow-up Steps

| Class | Count |
|---|---|
| 100s | 76 |
| 200s (excluding Condos) | 96 |
| 299 | 53 |
| 300s | 38 |
| 400s | 0 |
| 500s | 281 |
| 600s | 16 |
| 700s | 4 |
| 800s | 8 |
| Exempt | 621 |
| Railroad | 141 |

Suggestions

Things Left Unaccounted For

Tunnels / Bridges

Changes to OpenStreetMap will produce small changes whenever the function is re-run.
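One way to keep an eye on that run-to-run drift is to diff the indicator between two runs. The sketch below assumes each run is available as a PIN-to-indicator mapping; the function name and the placeholder PINs are hypothetical.

```python
def changed_pins(previous, current):
    """Return {pin: (old, new)} for every PIN whose indicator differs
    between two runs. None marks a PIN missing from one of the runs."""
    changes = {}
    for pin in sorted(previous.keys() | current.keys()):
        old, new = previous.get(pin), current.get(pin)
        if old != new:
            changes[pin] = (old, new)
    return changes


# Placeholder PINs, not real parcels
run_before = {"PIN_A": 1, "PIN_B": 0, "PIN_C": 1}
run_after = {"PIN_A": 1, "PIN_B": 1, "PIN_D": 0}

drift = changed_pins(run_before, run_after)
print(drift)
```

PINs that flip between runs are natural candidates for another pass of manual review.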

@dfsnow @ccao-jardine