GreenInfo-Network / seattle-building-dashboard

Energy benchmarking for Seattle
https://greeninfo-network.github.io/seattle-building-dashboard/
ISC License
1 stars 0 forks source link

Footprint Review Analysis - Overlap analysis (SOW Task 2.2) #100

Open danrademacher opened 1 month ago

danrademacher commented 1 month ago

Task 2.2 Footprint Review Analysis - Overlap and square footage analysis

  • Description: Using GIS, identify all instances of building footprints that overlap and present to OSE for recommended resolution. Implement necessary corrections.
  • Description: Using GIS, calculate a ratio of building gross square footage to footprint area for all properties. Properties with the highest and lowest ratios should be prioritized for additional review by OSE. OSE will determine additional corrections required.

So this is desktop GIS work to assess where buildings are overlapping and therefore presumed wrong, and also looking for outliers on the ratio of gross square footage from OSE data vs footprint area. For the second item, we will need to join the building energy data to footprints, since (a) footprint area can be calculated only by footprint but (b) gross square footage comes in the building energy data as reported_gross_floor_area

┆Issue is synchronized with this Asana task ┆Due Date: 2024-06-14

danrademacher commented 3 weeks ago

For the overlap analysis, we need to do the following:

  1. Calculate the percent overlap for any shape that overlaps another shape.
  2. Triage anything obvious at the upper end -- eg, multiple copies of the same building
  3. Provide a list of overlaps to Mike with notes on actions taken or recommended, knowing that some we might not know what to do with (eg a convention center reporting as three different buildings all with the same footprint)

~One critical wrinkle here is that footprints that overlap in the footprint table only cause problems in the applicaiton if they also overlap in time~

~I wonder if we should write a Postgres query in CARTO that joins the years of data available to the footprints, with a separate outline for every year a building is used. then we do multiple overlap assessments within each year. That's a PITA but it seems like a better diagnostic.~

Footprints: https://cityenergy-seattle.carto.com/dataset/seattle_building_outlines_2022 Table: https://cityenergy-seattle.carto.com/dataset/seattle_buildings_2022_update_mr_20240425_v21

danrademacher commented 3 weeks ago

This query seems right:

SELECT public.seattle_building_outlines_2022.*, seattle_buildings_2022_update_mr_20240425_v21.year FROM public.seattle_building_outlines_2022 join seattle_buildings_2022_update_mr_20240425_v21 ON buildingid = id order by buildingid, year

The tedious part will be running the overlap for each year alone, then reporting out the building IDs that error out in each year, then dedupe that into a single list of problem buildings.

danrademacher commented 3 weeks ago

I think we should do this before we do #99

danrademacher commented 3 weeks ago

Latest footprints from CARTO are here: P:\proj_p_s\Seattle Building Dashboard\2024_update

danrademacher commented 3 weeks ago

The ratio of gross square footage to footprint area task:

  1. From footprints, we'd need to calculate the area in Square Feet.
  2. Then join in the reported_gross_floor_area from the data at https://cityenergy-seattle.carto.com/dataset/seattle_buildings_2022_update.
  3. Then calculate the ratio reported_gross_floor_area/footprint_area
danrademacher commented 3 weeks ago

For the overlap, in talking it through with Joseph, we realized that the year/overlap piece is a red herring since building prints are pulled only by ID and not year, so there would never be a case where a single ID would pull multiple iterations of a footprint over time.

That simplifies this task and we can just look at the full footprint dataset irrespective of year.

The overlap check needs to identify the following:

danrademacher commented 3 weeks ago

We should exclude from our results all the buildigng IDs already slated for correction, as noted in #99.

So we could pull from CARTO and run this query:

SELECT * FROM public.seattle_building_outlines_2022 where buildingid not in (41,43,73,84,87,114,125,144,168,172,175,185,190,194,198,205,212,215,217,225,226,227,234,245,247,248,249,266,268,281,293,302,315,325,326,345,350,351,355,358,364,381,422,435,467,477,496,500,516,576,608,658,683,689,690,694,708,728,730,819,1281,19489,19510,19584,19694,19696,19704,19754,19777,19780,19788,19831,19847,19848,19849,19873,19925,19947,19990,19992,20027,20028,20045,20047,20052,20056,20082,20095,20100,20140,20187,20216,20217,20234,20241,20242,20321,20340,20367,20369,20456,20490,20522,20523,20526,20543,20585,20659,20686,20720,20762,20805,20810,20888,20930,21117,21122,21150,21196,21202,21215,21216,21218,21231,21310,21314,21335,21341,21382,21399,21401,21425,21432,21476,21533,21556,21557,21570,21604,21610,21616,21627,21692,21697,21745,21811,21941,22259,22261,22264,22299,22334,22504,22511,22607,22654,22739,22742,22801,22810,22863,22939,22941,22954,22964,23020,23028,23050,23074,23076,23173,23242,23265,23322,23323,23327,23400,23409,23484,23495,23496,23503,23554,23605,23615,23622,23632,23637,23639,23714,23730,23752,23846,23937,23956,24030,24036,24086,24150,24160,24161,24162,24179,24235,24248,24253,24306,24489,24494,24495,24496,24560,24562,24587,24599,24617,24645,24659,24750,24775,24854,24857,24934,24987,25053,25054,25129,25131,25241,25245,25251,25299,25360,25371,25410,25435,25527,25543,25652,25710,25761,25973,26007,26026,26040,26098,26150,26198,26283,26313,26450,26460,26558,26574,26729,26746,26748,26808,26821,26862,26880,27008,27071,27231,27241,27265,27300,27342,27343,27348,27389,27597,27600,27774,27799,27833,27888,27901,27903,27955,27970,27981,28006,28020,28025,28874,29069,29170,29310,29312,29550,29612,29792,30207,31187,32454,33527,37008,37248,39332,40388,40447,41068,41070,45287,48287,49699,49703,49704,49705,49709,49721,49722,49731,49735,49736,49737,49738,49771,49792,49797,49821,49825,49862,49889,49894,49911,49943,49945,49946,49967,49990,49996,49998,50002,50003,50005,50009,50023,50038,50049,50055,50067,50070,50101,50160,50175,50177,50180,50194,50245,50247,50259,50294,50309,50347,50376,50383,50385,50415,50416,50462,50481,50524,50642,50656,50693,50694,50704,50705,50710,50718,20066,103,400)

Run that on https://cityenergy-seattle.carto.com/dataset/seattle_building_outlines_2022

Logging in with this 1P record: https://start.1password.com/open/i?a=AKLRKLJLLJEJ3HURJAVPAOETS4&v=a36tymynaiooq56tfqfvnkll4y&i=l5usgmphlbgqpfk3i4htnvr3bm&h=greeninfonetwork.1password.com

joseph-stout commented 2 weeks ago

Finding some instances with many stacked polygons of the same building outline. Find overlaps tool returned a number of polygons with up to 32 overlaps, which I suspect in many cases are duplicate building footprints that have been stacked as the case is in this instance. Building ID 29 has 8 identical polygons stacked. image

danrademacher commented 2 weeks ago

Huh, that's annoying! I wonder if we could confidently find all instances like this -- where Building ID is the same and footprint is 100% the same and just delete all but on3. That's the end goal in a case like this -- one footprint for one ID.

Do you think that's possible?

joseph-stout commented 2 weeks ago

I think that may be possible! Once I calculate the percentage of overlaps I think I could select those out and delete the duplicates fairly quickly. In the future maybe we could write a short query or script to identify these and delete them, but I am more curious how the duplicates are being generated.

Although-- I may have been mistaken to begin with and this may not be as big an issue as I thought. I just noticed I can generate a shapefile from seattle_buildings_2022_update_mr_20240425_v21 to download from carto and the footprints appear much cleaner compared to seattle_building_outlines_2022 (3,695 records vs. 27,564) and overlaps are minimal (126 records with a max count of 4, vs. 3828 records with max overlap counts of 32).

joseph-stout commented 2 weeks ago

Okay, I think I have something, assuming seattle_buildings_2022_update_mr_20240425_v21 is the most accurate set of building footprints to go off off currently. Area calculations are in square feet.

The process:

  1. Project seattle_buildings_2022_update_mr_20240425_v21 to Washington State Plane North; FIPS Zone 4601 from WGS 84
  2. Run 'find overlaps' tool
  3. calculate area for overlapping polygons in overlaps output layer
  4. spatial join original building footprints layer to overlaps layer, add field and calculate building_id's
  5. calculate net area overlap (subtract original building footprint area from overlapping polygon area)-- this gives some indication of whether an overlapping polygon is close to a duplicate, or is overlaid on a larger polygon
  6. calculate percent overlap (divide area of overlaps by area of original polygon, multiply by 100)

Percentages ~ 100 indicate duplicates; percentages close to zero appear to be slivers. Percentages between indicate the range of partial overlaps. SampleFindings_SeattleBuildingsOverlaps_20240613.xlsx

joseph-stout commented 2 weeks ago

Erased overlaps from primary footprint layer to get rid of duplicates, merged find overlaps output layer to primary footprint layer, to fill back in, and dissolved by building ID to get a single footprint for each building. Only two sliver-overlaps remaining, but the rest seems to have been cleaned up; The output file has slivers at the edges of some of the dissolved polygons that will need to be resolved.

joseph-stout commented 2 weeks ago

Merged some obvious slivers (~30 records), but there remain some things like this should be cleaned up. The overlap has been removed but the building footprint (and ID's) are broken up into multiple parts:

image

An artifact of my earlier workflow is that it left fragments like this, with unique building ID's in a footprint. In other cases where building footprints touch they are part of a cluster of defined individual footprints, so it is not as straightforward to resolve. There are 3,337 records in the attribute table and manually merging where appropriate would take a while. I will keep mulling this over.

joseph-stout commented 2 weeks ago

Aside from the question about how to handle the outcomes above, I realized the only places that should have been affected by the previous workflow are where the overlaps occurred (126 footprint locations). So I am going through these individually and resolving as needed, it is a much more manageable sample size.

danrademacher commented 2 weeks ago

Great progress.

Hmm, as far as using seattle_buildings_2022_update_mr_20240425_v21 as the data source, I need to look into that.

That's an active query that pulls form seattle_building_outlines_2022 but excludes 374 building IDs:

SELECT * FROM public.seattle_building_outlines_2022 where buildingid not in (41,43,73,84,87,114,125,144,168,172,175,185,190,194,198,205,212,215,217,225,226,227,234,245,247,248,249,266,268,281,293,302,315,325,326,345,350,351,355,358,364,381,422,435,467,477,496,500,516,576,608,658,683,689,690,694,708,728,730,819,1281,19489,19510,19584,19694,19696,19704,19754,19777,19780,19788,19831,19847,19848,19849,19873,19925,19947,19990,19992,20027,20028,20045,20047,20052,20056,20082,20095,20100,20140,20187,20216,20217,20234,20241,20242,20321,20340,20367,20369,20456,20490,20522,20523,20526,20543,20585,20659,20686,20720,20762,20805,20810,20888,20930,21117,21122,21150,21196,21202,21215,21216,21218,21231,21310,21314,21335,21341,21382,21399,21401,21425,21432,21476,21533,21556,21557,21570,21604,21610,21616,21627,21692,21697,21745,21811,21941,22259,22261,22264,22299,22334,22504,22511,22607,22654,22739,22742,22801,22810,22863,22939,22941,22954,22964,23020,23028,23050,23074,23076,23173,23242,23265,23322,23323,23327,23400,23409,23484,23495,23496,23503,23554,23605,23615,23622,23632,23637,23639,23714,23730,23752,23846,23937,23956,24030,24036,24086,24150,24160,24161,24162,24179,24235,24248,24253,24306,24489,24494,24495,24496,24560,24562,24587,24599,24617,24645,24659,24750,24775,24854,24857,24934,24987,25053,25054,25129,25131,25241,25245,25251,25299,25360,25371,25410,25435,25527,25543,25652,25710,25761,25973,26007,26026,26040,26098,26150,26198,26283,26313,26450,26460,26558,26574,26729,26746,26748,26808,26821,26862,26880,27008,27071,27231,27241,27265,27300,27342,27343,27348,27389,27597,27600,27774,27799,27833,27888,27901,27903,27955,27970,27981,28006,28020,28025,28874,29069,29170,29310,29312,29550,29612,29792,30207,31187,32454,33527,37008,37248,39332,40388,40447,41068,41070,45287,48287,49699,49703,49704,49705,49709,49721,49722,49731,49735,49736,49737,49738,49771,49792,49797,49821,49825,49862,49889,49894,49911,49943,49945,49946,49967,49990,49996,49998,50002,50003,50005,50009,50023,50038,50049,50055,50067,50070,50101,50160,50175,50177,50180,50194,50245,50247,50259,50294,50309,50347,50376,50383,50385,50415,50416,50462,50481,50524,50642,50656,50693,50694,50704,50705,50710,50718,20066,103,400)

I believe those are the buildings that Mike already flagged for custom/bespoke correction, so that's why we excluded them -- no need to review things he's already flagged for fixing.

But I can't quite figure out why there are 27K outlines in seattle_building_outlines_2022. I looked at the tabular data and in there we have only 3692 buildings with energy data. That's this query in CARTO:

 SELECT count(distinct(id)) FROM public.data_2022_update

So the totals in seattle_buildings_2022_update_mr_20240425_v21 make more sense. But what are the other 25K footprints?

Can you post a screenshot of all the buildings you analyzed? Just want to get a gestalt sense of how these line up with what's visible in the application.

joseph-stout commented 2 weeks ago

Yes! I just noticed why the seattle_building_outlines_2022 looked off, not sure why it didn't occur to me before that the year field is provided in the footprint download and a duplicate footprint was generated for each building for each year on record, which is why there were so many overlaps on that file relative to seattle_buildings_2022_update_mr_20240425_v21. this is the seattle_buildings_2022 table as it came via carto image This is the seattle_buildings_2022_update_mr_20240425_v21 table view image

I did use the query to filter out the ones Mike flagged for the initial footprint query when I downloaded the seattle_building_outlines_2022 file as well, so rather than 27,000+ I am guessing it would be north of 30,000 with those included.

danrademacher commented 2 weeks ago

Ah, got it. This makes sense now. The weird thing about CARTO is that a "table" can also be "a view of a particualr query in the SQL editor"

I suspect that the query in this comment was in the custom SQL editor in CARTO and that indeed makes a row for every building for every year. I briefly thought we might need to do that, but we didn't need it -- I probably didn't clear the query out of the editor and it's easy to miss that a query is affecting the output.

But that clears it up and the data you used seattle_buildings_2022_update_mr_20240425_v21 is indeed the right stuff.

joseph-stout commented 2 weeks ago

Ah yes that makes sense, glad the dataset will work. I have finished going through the overlaps table and cleaned up slivers/fragmented polygons.

danrademacher commented 2 weeks ago

For sharing with Mike:

  1. CARTO map showing overlaps in shades of red based on % overlap: https://cityenergy-seattle.carto.com/builder/99ecd4cd-340b-4a0d-a485-6601747dea1c/embed
  2. Excel file saved to Drive with the overlaps: https://docs.google.com/spreadsheets/d/12lx-0MOmYFDnQDeLz87yhyexZhHUu1Ru?rtpof=true&usp=drive_fs
danrademacher commented 2 weeks ago

Ok, shipped this off to the client. We'll open a new implementation/fix issue once he reports back.

danrademacher commented 6 days ago

Questions from client:

danrademacher commented 6 days ago

Just replace the table in this folder: https://drive.google.com/drive/folders/1WTkAe5bo9G2efz6XXZcFzUVGvXUyP4Om