ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/

Combine MS building footprint data with LIDAR to estimate building sqft #6

Open wrridgeway opened 1 year ago

wrridgeway commented 1 year ago

Microsoft publishes US building footprint data that contains GeoJSON polygons of (almost) every building in Cook County. ISGS publishes LiDAR-based estimates of height, including building height.

Condos currently lack building or unit square footage, but we may be able to combine these two data sources to produce an estimate of building size or even a rough unit sqft.

wrridgeway commented 1 year ago

Outline of approach:

  1. Download building footprint data
  2. Spatially join Condo PIN locations (lat/lon) to building footprints
    • We initially used the Microsoft footprints for all of Cook County. Because of concerns about their accuracy, we then used the City of Chicago footprints to test the method further.
  3. Create a 5 m polygon buffer around each condo footprint for later use.
  4. Obtain relevant LIDAR data.
    • Because the LIDAR data for all of Cook County is very large (> 300GB), we process tiles one at a time. The data comes in ~5,000 tiles that cover the county, of which ~1,000 intersect the condo buffer layer.
    • Download the single shapefile that contains the location of each LIDAR tile and intersect it with the buffered condo footprints. We only process the tiles known to intersect at least one buffer, which is much more efficient.
  5. Obtain height estimates from LIDAR data:
    • For each LIDAR tile known to intersect at least one footprint, select only the LIDAR points classified as building or ground. Take a 10 percent sample of the LIDAR points to speed up the spatial intersection (there are ~20M points per tile).
    • Select only building points. Intersect the single LIDAR tile with the footprints layer. Calculate the maximum, 95th percentile, and mean elevation of each building.
    • Select only ground points. Intersect the LIDAR tile with the footprints buffer. Calculate the average ground elevation as the mean elevation of all ground points within the buffer.
    • Calculate total building height as the maximum (or 95th percentile) building elevation minus the average ground elevation.
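
As a rough illustration of step 5, the per-building height calculation could look like the sketch below. This is not the code behind the report: the function names, the `(elevation, classification)` input layout, and the nearest-rank percentile are all assumptions, and the sketch presumes points have already been clipped to the footprint (building points) and to the 5 m buffer (ground points). Codes 2 and 6 are the standard ASPRS LAS classes for ground and building.

```python
import math
import random
from statistics import mean

# ASPRS LAS standard point classification codes
GROUND = 2
BUILDING = 6


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0-100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def estimate_height(points, sample_frac=0.10, seed=0):
    """Estimate building height from (elevation, classification) pairs.

    Hypothetical helper: `points` is assumed to be pre-clipped, with
    building points inside the footprint and ground points inside the
    buffer. Heights come out in the same units as the elevations.
    """
    rng = random.Random(seed)
    # Random sample to cut down the number of points processed
    sampled = [p for p in points if rng.random() < sample_frac]
    building = [z for z, cls in sampled if cls == BUILDING]
    ground = [z for z, cls in sampled if cls == GROUND]
    if not building or not ground:
        return None  # not enough points to estimate a height
    ground_mean = mean(ground)
    return {
        "max_height": max(building) - ground_mean,
        "p95_height": percentile(building, 95) - ground_mean,
        "mean_height": mean(building) - ground_mean,
    }
```

Returning all three statistics keeps the choice between max, 95th percentile, and mean open until the results can be compared.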

wrridgeway commented 1 year ago

Current issues with approach:

  1. The spatial join between condo centroids (lat/lons) and building footprints is imperfect. For example, in a "U-shaped" condo building, the centroid may fall outside the footprint. We could instead assign footprints to PINs by finding the building footprint whose centroid is closest to the centroid of the PIN (parcel), or by spatially joining on parcels.
  2. Geometries of Microsoft footprints are incorrect (esp. in the Loop/downtown or where buildings are close together). This results in large errors in square footage.
    • Potential fixes: Use City of Chicago footprints for Chicago, and supplement with Microsoft data outside of City.
  3. Even with correct footprints (from City of Chicago), some results for taller buildings are quite unrealistic.
    • The main way we have attempted to verify the results is by comparing estimated building height with the number of stories in the City of Chicago data. It is unclear how accurate the STORIES field is in that data. For example, some buildings in the Loop that get a high estimated number of stories from the LIDAR data have a low value in the STORIES field. It is possible that the LIDAR data is correct and the City data is incorrect.
    • Some footprints include an attached garage or other low-lying part, so using the average height gives an artificially low result. We switched to the 95th percentile height instead.
  4. The method appears to work well for most 2-5 story condo buildings, giving a more reasonable average height per story of ~10-12 feet when compared to the City of Chicago data.
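
The nearest-centroid fallback floated in item 1 might be sketched roughly as follows. The function name, the dictionary of footprint centroids, and the optional `max_dist` cutoff are hypothetical, and the sketch assumes coordinates have already been projected to planar units so that Euclidean distance is meaningful (raw lat/lon would not be).

```python
import math


def nearest_footprint(pin_xy, footprint_centroids, max_dist=None):
    """Return the id of the footprint whose centroid is closest to pin_xy.

    Hypothetical helper: `footprint_centroids` maps footprint id to an
    (x, y) centroid in a projected CRS. Returns None when there are no
    candidates or the closest one is farther away than `max_dist`.
    """
    best_id, best_dist = None, math.inf
    for fp_id, (x, y) in footprint_centroids.items():
        dist = math.hypot(x - pin_xy[0], y - pin_xy[1])
        if dist < best_dist:
            best_id, best_dist = fp_id, dist
    if max_dist is not None and best_dist > max_dist:
        return None
    return best_id


# Usage: a PIN that falls in the courtyard of a U-shaped building
centroids = {"bldg_a": (0.0, 0.0), "bldg_b": (10.0, 0.0)}
match = nearest_footprint((7.0, 1.0), centroids, max_dist=50.0)
```

A distance cutoff guards against matching a PIN to a footprint on a different block when the correct building is missing from the footprint layer.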

Recommendation: Include max height, average height, and footprint square footage as separate features, without attempting to calculate total internal square footage. This sidesteps the issues with the method detailed above, and any relationships between these features should still be captured by the model.
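
For the footprint square footage feature, a minimal sketch of the area calculation is below. It assumes the footprint's outer ring has already been reprojected from lat/lon to a planar CRS in meters and has no holes; the function name and the input layout are illustrative only. The area comes from the shoelace formula.

```python
SQFT_PER_SQM = 10.763910416709722  # (1 / 0.3048) ** 2


def footprint_sqft(ring):
    """Area in square feet of a simple polygon ring given in meters.

    Hypothetical helper: `ring` is a list of (x, y) vertices in a
    projected CRS with meter units. The ring may be open or closed
    (first vertex repeated at the end); holes are not handled.
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1  # shoelace cross term
    return abs(total) / 2.0 * SQFT_PER_SQM


# Usage: a 10 m x 10 m footprint
square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
area = footprint_sqft(square)
```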

See reports/lidar_bldg_size_report.html on branch 12-building-footprint-data for a summary of the results.

wrridgeway commented 1 year ago

Potential uses of building footprints data within model beyond sqft estimates: