Closed britt-allen closed 11 months ago
Recapping what we learned during our discovery calls with @fennisreed. Fennis, if I have misunderstood or missed something, can you clarify for me, please? cc: @ian-r-rose
cc: @ian-r-rose @fennisreed
Current process for flagging data and using in analysis
- Fennis intersects Parcels with BF data (BF = Building Footprint)
- Fennis uses a 90% threshold to determine the retention of a BF in a parcel (meaning if 90% or more of a BF is in a parcel we make the intersection, if not we abandon the intersection?) This is currently a manual/visual process - no calculation used
- If nothing has the bulk (e.g. 20%, 20%, 20%, 40%), we want to split it (meaning if no BF takes up the majority of a parcel, 90%+, we split the parcel up with each corresponding BF?)
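Since the 90% check is currently manual/visual, here is a minimal sketch of how it could be computed instead. It uses shapely directly (the same geometry engine geopandas relies on). The function name `assign_footprint`, the parcel IDs, and the toy rectangles are all hypothetical, and it assumes the geometries are already valid and in a projected CRS so that areas are meaningful.

```python
from shapely.geometry import box


def assign_footprint(footprint, parcels, threshold=0.9):
    """Return (parcel_id, shares) for one building footprint.

    shares maps each overlapping parcel to the fraction of the footprint's
    area that falls inside it. parcel_id is None when no parcel reaches
    the threshold, i.e. the footprint is a candidate for splitting.
    """
    shares = {}
    for parcel_id, parcel_geom in parcels.items():
        share = footprint.intersection(parcel_geom).area / footprint.area
        if share > 0:
            shares[parcel_id] = share
    best = max(shares, key=shares.get, default=None)
    if best is not None and shares[best] >= threshold:
        return best, shares
    return None, shares


# Hypothetical toy data: two adjacent square parcels.
parcels = {"P1": box(0, 0, 10, 10), "P2": box(10, 0, 20, 10)}
mostly_in_p1 = box(1, 1, 10.5, 5)  # about 95% of its area lies in P1
straddling = box(8, 1, 12, 5)      # half in P1, half in P2

kept_parcel, kept_shares = assign_footprint(mostly_in_p1, parcels)
split_parcel, split_shares = assign_footprint(straddling, parcels)
print(kept_parcel, kept_shares)
print(split_parcel, split_shares)
```

Returning the full `shares` mapping alongside the decision keeps the information needed to split a footprint later, rather than only recording a yes/no.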
The final BF joined with Parcel table would have X dimensions (columns). These are:
- Parcel index (Is this something we'd create - e.g. uuid or seq4 - or do parcels already get indexed?)
- Building footprint area (This is just the geometry, correct?)
- Inverse euclidean distance raster (Distance between other building footprints - this number gives a view into housing unit density)
- Residential parcels (this column would be a flag for if a parcel is
How does Cadastral-based Expert Dasymetric System (CEDS) fit into all of this? Would this be a technique we use to add three columns to the final dataset for parcel-level population, city, and county estimates?
Hi Britt! A few clarifications:
Current Process for flagging specific footprints as bad
- First step is usually to scope to county
- Area_geo is not always filled; this is something @fennisreed adds manually (What is Area_geo? How do you arrive at this number? Is it the intersection of a BF with a county?)
- Fennis has no preference with SI or imperial units - just make sure we document
- @ian-r-rose Ian made mention of a Make Valid function to address bad geometry
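For reference, one candidate for the Make Valid step is shapely's `make_valid` (available in `shapely.validation` since Shapely 1.8). Whether this is the function Ian had in mind is an assumption. The sketch below repairs a self-intersecting "bowtie" polygon before taking its area; the helper name `repaired_area` is hypothetical.

```python
from shapely.geometry import Polygon
from shapely.validation import make_valid


def repaired_area(geom):
    """Repair the geometry if needed, then return its area."""
    if not geom.is_valid:
        geom = make_valid(geom)
    return geom.area


# A self-intersecting ring: the edges cross at (1, 1).
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])
repaired = make_valid(bowtie)

print(bowtie.is_valid, repaired.is_valid)
print(repaired.geom_type, repaired_area(bowtie))
```

An invalid geometry can report a misleading area, which is why the repair comes first. Note that `make_valid` may change the geometry type (here a single polygon becomes a multi-part one), so downstream code should not assume one polygon per footprint.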
Data cleaning:
- Fennis recommends running any geo repair functions before doing area calcs
- Sorts by largest area and manually looks through the top 100 to see what is a building, deleting what is not. Does this county by county, going finer-grained for Los Angeles, Riverside, and San Bernardino counties
- Explore geohashing for adding a primary key/id option
- Likes that geodatabases use less storage than shapefiles and are consistent across the tools they are using. Geodatabase for the DOF deliverable, plus an open format for public use
In the case when a footprint touches two counties, we’ll start with assigning it to the one where the majority of the footprint is. Do we want to capture additional information about overlapping footprints?
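A rough sketch of the majority rule, assuming valid geometries in a projected CRS. It codes each footprint to the county holding the largest share of its area and also reports whether more than one county is touched, which is one simple way to capture the overlap. The function name `code_to_county`, the county names, and the rectangles are hypothetical.

```python
from shapely.geometry import box


def code_to_county(footprint, counties):
    """Return (county, touches_multiple) for one footprint.

    county is the one containing the largest share of the footprint's
    area, or None if the footprint overlaps no county at all.
    """
    overlaps = {}
    for name, geom in counties.items():
        area = footprint.intersection(geom).area
        if area > 0:
            overlaps[name] = area
    if not overlaps:
        return None, False
    county = max(overlaps, key=overlaps.get)
    return county, len(overlaps) > 1


# Hypothetical toy data: two adjacent counties sharing the line x = 10.
counties = {"County A": box(0, 0, 10, 10), "County B": box(10, 0, 20, 10)}
on_border = box(9, 0, 10.5, 2)  # two thirds of its area in County A
interior = box(2, 2, 4, 4)      # entirely inside County A

border_result = code_to_county(on_border, counties)
interior_result = code_to_county(interior, counties)
print(border_result, interior_result)
```

If more detail about overlapping footprints turns out to be useful, the `overlaps` mapping could be stored as well instead of being reduced to a boolean.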
Error column
Geometric correction, overlapping xx, … (Still unsure what this column represents and the logic to get there, not the code, just the thinking that would inform the code)
County (A field for the county is preferred. Each footprint should be coded to a single county.)
Intersection flag (In addition to the county field, a single status to denote a footprint intersects multiple counties would be useful.)
If there is additional metadata we get from the geometric correction tool, this would be a good place to store it. (what tool? ESRI?)
Are we missing other columns that should exist in a table of BF joined with county data?
Area_Geo is an artefact from an earlier version of the footprints. Ignore this field, and proceed with our own calculated area for each footprint.
Thanks, @fennisreed. To clarify: is it helpful to have a separate pre-computed column for area? If I were doing this with geopandas, I would not include an area column in the table, and just compute it on the fly from the geometries as needed. But there may be workflow-related reasons to have it that I'm not aware of.
Parcels are already indexed! An updated footprint area will be required. Two indexes for the building footprint would be ideal: one representing the original footprint available from the public dataset, and a secondary key for those footprints that have been further segmented.
I think you talked about this already, but how are parcels indexed? Is it by assessor ID of some sort, or an opaque UUID of some sort? Are the IDs guaranteed to be stable between different versions of the dataset?
The main need for footprint area is for the Parcel Summary tables for internal use, where a sum of area per parcel is much easier to process ahead of time, rather than attaching it to CEDS or the random forest. It can take a fair amount of time to calculate, so just doing the whole state once per update has been a manageable way to handle it in the past.
I think we could forgo an area calculation for the public product.
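As an illustration of the pre-computed Parcel Summary described above, here is a minimal sketch that sums footprint area per parcel in a single pass. The record layout, parcel IDs, and area values are hypothetical, and the areas are assumed to have been calculated once per update.

```python
from collections import defaultdict

# Hypothetical (parcel_id, footprint_area) records from one statewide run.
footprint_areas = [
    ("P1", 120.0),
    ("P1", 80.0),
    ("P2", 45.5),
]


def summarize_area_by_parcel(records):
    """Sum footprint area for each parcel ID."""
    totals = defaultdict(float)
    for parcel_id, area in records:
        totals[parcel_id] += area
    return dict(totals)


parcel_summary = summarize_area_by_parcel(footprint_areas)
print(parcel_summary)
```

With geopandas or pandas the same summary would be a group-by on the parcel ID; the point is only that the expensive area calculation happens once and the summary is cheap to rebuild from it.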
All parcels have received a DOF-specific index from the FME GOID tool, resulting in unique identifiers like 473F433D6CC29746265F525D92394292. We rely on APN and County to join back to ParcelQuest data, as the same APN can be present in multiple counties. Otherwise, ParcelQuest does not distribute a primary key.
The DOF-applied IDs have been the same since 2021, and have been used in a good number of deliverables and scripts, so I'd prefer to retain the same IDs if possible.
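To illustrate why the join back to ParcelQuest needs both APN and County, here is a small sketch using a composite key. All field names and values (`dof_id`, `apn`, `county`, `use_code`, the APN string, the placeholder IDs) are hypothetical and only stand in for the real schemas.

```python
# Hypothetical DOF parcel records: the same APN appears in two counties.
dof_parcels = [
    {"dof_id": "ID-A", "apn": "001-010-01", "county": "Riverside"},
    {"dof_id": "ID-B", "apn": "001-010-01", "county": "San Bernardino"},
]

# Hypothetical ParcelQuest records, which carry no primary key of their own.
parcelquest = [
    {"apn": "001-010-01", "county": "Riverside", "use_code": "R1"},
    {"apn": "001-010-01", "county": "San Bernardino", "use_code": "C2"},
]

# Index ParcelQuest rows by the composite (APN, county) key.
pq_lookup = {(row["apn"], row["county"]): row for row in parcelquest}

# Attach ParcelQuest attributes to each DOF parcel via the composite key.
joined = [
    {**parcel, **pq_lookup[(parcel["apn"], parcel["county"])]}
    for parcel in dof_parcels
]

for row in joined:
    print(row["dof_id"], row["county"], row["use_code"])
```

Joining on APN alone would match each DOF parcel to rows from both counties, so the county has to be part of the key. The DOF ID itself is carried through unchanged, which is consistent with keeping the existing IDs stable.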