Closed britt-allen closed 11 months ago
Thank you for chatting today, @britt-allen!
When intersecting with building footprints, there are two criteria under which part of a footprint should be dropped:
Stacked parcels may occur in the provided dataset, and are flagged with a boolean field, P_Stacked. These are areas where duplicate parcel geometry represents distinct units within the same structure or complex. If memory serves, stacked cases with >10 parcels have been collapsed into a single cumulative parcel, favoring residential codes and the total number of units. Stacked cases with <=10 parcels have been retained for unit-specific attribution.
For the purposes of this project, no parcels should be dropped from the dataset. In stacked situations, if the lower unit has 500 sqft of footprint area, the upper (visually duplicate) unit should also receive 500 sqft.
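To make that expectation concrete, here is a minimal pandas sketch of propagating footprint area across stacked duplicates. The `stack_id` grouping key and the sample values are made up for illustration; the real dataset flags stacked cases with the boolean P_Stacked field.

```python
import pandas as pd

# Made-up sample: two stacked units sharing stack_id 1, plus a normal parcel.
# The upper stacked unit initially received no footprint area.
parcels = pd.DataFrame({
    "stack_id": [1, 1, 2],
    "footprint_area_sqft": [500.0, 0.0, 800.0],
})

# Give every unit in a stack the largest area observed in that stack,
# so the upper duplicate receives the same 500 sqft as the lower unit.
parcels["footprint_area_sqft"] = (
    parcels.groupby("stack_id")["footprint_area_sqft"].transform("max")
)
```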
Summing intersecting footprint sqft over stacked parcels may result in > 100% of original footprint area. This complicates comparison using the 10% threshold. In these situations, I recommend calculating percent membership using the sum of intersected footprint areas, rather than the originally calculated footprint area.
Because this model is intended for use in human population estimation, favor should be given where USETYPE <= 112, which denotes residential use. Because our estimates are regularly updated with jurisdiction-specific permitting data, it is still helpful to calculate footprint area for all parcels, including non-residential features, which may yet receive residential correction down the line.
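For illustration, flagging (rather than dropping) the residential parcels might look like the following sketch. Only the USETYPE <= 112 rule comes from the dataset notes; the sample rows are made up.

```python
import pandas as pd

# Made-up sample parcels; USETYPE <= 112 denotes residential use.
parcels = pd.DataFrame({"parcel_id": ["a", "b"], "USETYPE": [101, 300]})

# Flag residential parcels instead of filtering, since non-residential
# features may still receive residential correction later.
parcels["is_residential"] = parcels["USETYPE"] <= 112
```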
Please let me know if I can provide further clarification!
Handing off to @ian-r-rose as the programming work is a bit more advanced than time allows me to learn and implement.
Hi @fennisreed, I'm working on refining some of the criteria you listed above, and wanted to run a couple of things by you:
> <10% of a footprint is contained in the area UNLESS the footprint is being split into >=5 parcels (for dense urban areas, where sometimes footprint areas <10% are accurate because the remotely sensed footprint spans a whole city block).
Makes sense! In some of my testing I'm finding that 20% seems to work a little better (at least according to my intuition about whether/how things should be split). In any event, I intend to make this number configurable. It also depends on what metric you use for the relative comparison, about which more below.
> Summing intersecting footprint sqft over stacked parcels may result in > 100% of original footprint area. This complicates comparison using the 10% threshold. In these situations, I recommend calculating percent membership using the sum of intersected footprint areas, rather than the originally calculated footprint area.
I'm going to suggest a tweaked version of this that may be a little simpler to reason about: what if instead we calculate the percent as a fraction of the maximum intersected area? So the largest intersection of a footprint and parcel would always be 1.0, and everything smaller would be a fraction of that. We would then discard all footprint fragments that are less than X% of the largest intersection.
What I like about this is that it is a bit more robust to things like fully overlapping parcels (whether they are errors of inclusion or things like condos). But it's quite easy to test both approaches: the difference between them is a one line change.
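For concreteness, here is a sketch of the two metrics side by side; the column names and intersection areas are made up for illustration, and the real implementation lives in the gist.

```python
import pandas as pd

# One footprint split across three parcels (made-up areas).
pieces = pd.DataFrame({
    "footprint_id": [1, 1, 1],
    "parcel_id": ["a", "b", "c"],
    "intersection_area": [900.0, 90.0, 10.0],
})
grouped = pieces.groupby("footprint_id")["intersection_area"]

# Metric 1: fraction of the summed intersected area (robust when stacked
# parcels make the intersections sum to more than the footprint area).
pieces["pct_of_sum"] = pieces["intersection_area"] / grouped.transform("sum")

# Metric 2: fraction of the largest single intersection (largest is 1.0).
pieces["pct_of_max"] = pieces["intersection_area"] / grouped.transform("max")

# Switching the cutoff comparison between the metrics is a one-line change.
keep = pieces[pieces["pct_of_max"] >= 0.1]
```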
@fennisreed I have some sample code with an initial implementation of the algorithms here: https://gist.github.com/ian-r-rose/d94547ba282be59398b4474327839bef
The intention is that these functions be put in a Python module that could be imported into an ArcGIS notebook. If that is not possible, I suppose they could be copy/pasted around. But the actual usage of the algorithm is fairly brief, and the cutoff parameters, etc. are customizable:
```python
import geopandas

# Read the data!
footprints = geopandas.read_parquet(
    "s3://dof-demographics-dev-us-west-2-public/global_ml_building_footprints/parquet/county_fips_003.parquet"
)
parcels = geopandas.read_file(
    "Artificial Parcel Network 2021.gdb", layer="T_003_Parcels"
)

# Do the spatial join and handle footprints that intersect multiple parcels.
# assign_footprints_to_parcels is defined in the gist linked above.
footprints_with_parcels = assign_footprints_to_parcels(
    footprints,
    parcels,
)
```
It's probably best discussed in a higher bandwidth setting, but a few things I'm noticing about it:
@fennisreed you also expressed interest in being able to output a CSV with the parcel index and the total building footprint area for that parcel. Here is a snippet that can be tacked onto the above `footprints_with_parcels` calculation to do that:
```python
footprint_area_per_parcel = (
    footprints_with_parcels
    # Project to EPSG:2226 (a foot-based CRS) so areas come out in square feet.
    .assign(footprint_area_sqft=footprints_with_parcels.to_crs(epsg=2226).geometry.area)
    .groupby("Parcel_Ind")
    .agg({"footprint_area_sqft": "sum"})
)
footprint_area_per_parcel.to_csv("footprint_area_per_parcel.csv")
```
This computes the sum of the footprint areas for each parcel (note the choice of EPSG:2226 to get Alpine County areas in feet).
TODOs from meeting on 11/8
- [x] @fennisreed to validate the CSV output snippet above
CSV export snippet works great - thank you for this, Ian!
Reviewing the larger parcel x footprint reconciliation script, everything seems to work as planned. The majority of circumstances were well represented with appropriate parameter tuning, save for dense urban areas like San Francisco.
We discussed instances where multifamily footprints are losing portions for failing to meet the two thresholds, and have proposed a new parameter: footprints intersecting >5 parcels will ignore the two subsequent thresholds. Examples of where this behavior could be valuable include Alameda (near footprint 379303 at 37.768149 -122.285325) and San Francisco (near footprint 56031 at 37.768149 -122.285325).
@fennisreed I've updated my gist to include a new parameter `count_cutoff`, which doesn't throw out any intersections if more than `count_cutoff` intersections occur. Let me know if this helps with any of the multifamily building cases you had seen!
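A rough sketch of how such a `count_cutoff` parameter can gate the threshold filtering. The column names and the percent-of-max metric here are illustrative assumptions; the real implementation is in the gist.

```python
import pandas as pd

def filter_fragments(pieces, area_cutoff=0.1, count_cutoff=5):
    """Drop small footprint fragments, unless the footprint intersects
    more than count_cutoff parcels (e.g. dense urban blocks)."""
    grouped = pieces.groupby("footprint_id")["intersection_area"]
    pct_of_max = pieces["intersection_area"] / grouped.transform("max")
    n_intersections = grouped.transform("count")
    # Keep a fragment if it clears the area threshold OR the footprint
    # is split across many parcels.
    return pieces[(pct_of_max >= area_cutoff) | (n_intersections > count_cutoff)]
```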
Can this be safely marked done following yesterday's improvement and discussion between you and Fennis? @ian-r-rose
The count cutoff feature is working as intended, aside from the addition of new geometry types in the primary Geometry field. Export of the resulting features works great for Polygon and MultiPolygon features. This resolves all the multifamily instances we reviewed during our previous call!
An additional geometry type, GeometryCollection, is now produced in some instances; it cannot be exported to an ESRI Shapefile or geodatabase and requires further transformation. A great example of this is in Alameda County at block_geoid 060014507432048. Is there a way to easily convert these features to Polygon or MultiPolygon?
Thank you!
Hi @fennisreed, sorry for the slow reply here, and thanks for the nice reproducible case!
I took a look at the block you pointed out in the Alameda County dataset, and it seems that the issue is coming from the parcels dataset. There are a few parcels that are LineStrings, MultiLineStrings, and GeometryCollections. Presuming that there are no valid 1-dimensional parcels, I think we can extend the parcel cleaning step with the following snippet, which does three things:
```python
from shapely.ops import unary_union

# Ensure the parcel geometries are valid
parcels = parcels.assign(geometry=parcels.geometry.make_valid())

# Drop any LineString or MultiLineString geometries as impossible parcels
parcels = parcels[~parcels.geom_type.isin(("LineString", "MultiLineString"))]

# Fix up any GeometryCollections to be polygon-only
fixed = (
    parcels[parcels.geom_type == "GeometryCollection"]
    .geometry
    .apply(
        lambda s: unary_union(
            [g for g in s.geoms if g.geom_type in ("Polygon", "MultiPolygon")]
        )
    )
)
parcels.loc[fixed.index, "geometry"] = fixed
```
This should be done before joining with the footprints dataset.
Does this resolve the issue? If so, I'll update the gist with the new preprocessing step.
Hi @ian-r-rose ,
Last week I was able to validate the latest portion of this script. Of the counties run, this seems to resolve any outstanding issues with geometry collections, small lines, or other sliver features brought in with the parcel dataset.
This script is a huge improvement to my process and will add considerable nuance when combining parcel and footprint data.
Thank you so much for this!
Thanks for the update @fennisreed! Closing as complete
ParcelQuest data is private, so we can't share any derived data products via our S3 bucket. Instead, we are generating some local utility functions that allow us to do the following:
- Take our public building footprints datasets hosted in S3 and do a spatial join on any subset of parcel shapes.
- Write some heuristics for footprints which intersect more than one parcel (specific numbers are made up; we should make them tune-able and set good defaults in collaboration with @fennisreed):
  - If a footprint is ~90% in a parcel, just assign it to that parcel and drop any others that might intersect with it.
  - If a footprint has ~30-70% intersections with a parcel, split the footprint and assign the sub-footprints to the relevant parcels.
- The resulting footprints-with-parcels dataset should be able to easily generate the following "final" data files:
  - A table with Parcel Index and building footprint area (i.e., the corrected area of the footprint as processed above)
  - A spatial file (shapefile, parquet, or otherwise) with Parcels as primary geometry, footprint area, and the geometry of all footprints that might exist on the parcel, aggregated/ST_UNION'd
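As a rough sketch of those heuristics, the per-intersection decision could look like the following; the thresholds and helper name are illustrative placeholders, not the tuned defaults.

```python
def classify_fragment(pct, assign_cutoff=0.9, split_floor=0.3):
    """Decide what to do with one footprint/parcel intersection, given the
    fraction (pct) of the footprint's area that falls in the parcel."""
    if pct >= assign_cutoff:
        return "assign_whole"  # ~90%+ in one parcel: assign the whole footprint
    if pct >= split_floor:
        return "split"         # ~30-70%: split the footprint at the boundary
    return "drop"              # small sliver: discard
```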