Closed danrademacher closed 4 months ago
For the overlap analysis, we need to do the following:
~One critical wrinkle here is that footprints that overlap in the footprint table only cause problems in the applicaiton if they also overlap in time~
~I wonder if we should write a Postgres query in CARTO that joins the years of data available to the footprints, with a separate outline for every year a building is used. then we do multiple overlap assessments within each year. That's a PITA but it seems like a better diagnostic.~
Footprints: https://cityenergy-seattle.carto.com/dataset/seattle_building_outlines_2022 Table: https://cityenergy-seattle.carto.com/dataset/seattle_buildings_2022_update_mr_20240425_v21
This query seems right:
SELECT public.seattle_building_outlines_2022.*, seattle_buildings_2022_update_mr_20240425_v21.year FROM public.seattle_building_outlines_2022 join seattle_buildings_2022_update_mr_20240425_v21 ON buildingid = id order by buildingid, year
The tedious part will be running the overlap for each year alone, then reporting out the building IDs that error out in each year, then dedupe that into a single list of problem buildings.
I think we should do this before we do #99
Latest footprints from CARTO are here:
P:\proj_p_s\Seattle Building Dashboard\2024_update
The ratio of gross square footage to footprint area task:
reported_gross_floor_area
from the data at https://cityenergy-seattle.carto.com/dataset/seattle_buildings_2022_update.reported_gross_floor_area/footprint_area
For the overlap, in talking it through with Joseph, we realized that the year/overlap piece is a red herring since building prints are pulled only by ID and not year, so there would never be a case where a single ID would pull multiple iterations of a footprint over time.
That simplifies this task and we can just look at the full footprint dataset irrespective of year.
The overlap check needs to identify the following:
We should exclude from our results all the buildigng IDs already slated for correction, as noted in #99.
So we could pull from CARTO and run this query:
SELECT * FROM public.seattle_building_outlines_2022 where buildingid not in (41,43,73,84,87,114,125,144,168,172,175,185,190,194,198,205,212,215,217,225,226,227,234,245,247,248,249,266,268,281,293,302,315,325,326,345,350,351,355,358,364,381,422,435,467,477,496,500,516,576,608,658,683,689,690,694,708,728,730,819,1281,19489,19510,19584,19694,19696,19704,19754,19777,19780,19788,19831,19847,19848,19849,19873,19925,19947,19990,19992,20027,20028,20045,20047,20052,20056,20082,20095,20100,20140,20187,20216,20217,20234,20241,20242,20321,20340,20367,20369,20456,20490,20522,20523,20526,20543,20585,20659,20686,20720,20762,20805,20810,20888,20930,21117,21122,21150,21196,21202,21215,21216,21218,21231,21310,21314,21335,21341,21382,21399,21401,21425,21432,21476,21533,21556,21557,21570,21604,21610,21616,21627,21692,21697,21745,21811,21941,22259,22261,22264,22299,22334,22504,22511,22607,22654,22739,22742,22801,22810,22863,22939,22941,22954,22964,23020,23028,23050,23074,23076,23173,23242,23265,23322,23323,23327,23400,23409,23484,23495,23496,23503,23554,23605,23615,23622,23632,23637,23639,23714,23730,23752,23846,23937,23956,24030,24036,24086,24150,24160,24161,24162,24179,24235,24248,24253,24306,24489,24494,24495,24496,24560,24562,24587,24599,24617,24645,24659,24750,24775,24854,24857,24934,24987,25053,25054,25129,25131,25241,25245,25251,25299,25360,25371,25410,25435,25527,25543,25652,25710,25761,25973,26007,26026,26040,26098,26150,26198,26283,26313,26450,26460,26558,26574,26729,26746,26748,26808,26821,26862,26880,27008,27071,27231,27241,27265,27300,27342,27343,27348,27389,27597,27600,27774,27799,27833,27888,27901,27903,27955,27970,27981,28006,28020,28025,28874,29069,29170,29310,29312,29550,29612,29792,30207,31187,32454,33527,37008,37248,39332,40388,40447,41068,41070,45287,48287,49699,49703,49704,49705,49709,49721,49722,49731,49735,49736,49737,49738,49771,49792,49797,49821,49825,49862,49889,49894,49911,49943,49945,49946,49967,49990,49996,49998,50002,50003,50005,50009,50023,50038,50049,50055,50067,50070,50101,50160,50175,50177,50180,50194,50245,50247,50259,50294,50309,50347,50376,50383,50385,50415,50416,50462,50481,50524,50642,50656,50693,50694,50704,50705,50710,50718,20066,103,400)
Run that on https://cityenergy-seattle.carto.com/dataset/seattle_building_outlines_2022
Logging in with this 1P record: https://start.1password.com/open/i?a=AKLRKLJLLJEJ3HURJAVPAOETS4&v=a36tymynaiooq56tfqfvnkll4y&i=l5usgmphlbgqpfk3i4htnvr3bm&h=greeninfonetwork.1password.com
Finding some instances with many stacked polygons of the same building outline. Find overlaps tool returned a number of polygons with up to 32 overlaps, which I suspect in many cases are duplicate building footprints that have been stacked as the case is in this instance. Building ID 29 has 8 identical polygons stacked.
Huh, that's annoying! I wonder if we could confidently find all instances like this -- where Building ID is the same and footprint is 100% the same and just delete all but on3. That's the end goal in a case like this -- one footprint for one ID.
Do you think that's possible?
I think that may be possible! Once I calculate the percentage of overlaps I think I could select those out and delete the duplicates fairly quickly. In the future maybe we could write a short query or script to identify these and delete them, but I am more curious how the duplicates are being generated.
Although-- I may have been mistaken to begin with and this may not be as big an issue as I thought. I just noticed I can generate a shapefile from seattle_buildings_2022_update_mr_20240425_v21 to download from carto and the footprints appear much cleaner compared to seattle_building_outlines_2022 (3,695 records vs. 27,564) and overlaps are minimal (126 records with a max count of 4, vs. 3828 records with max overlap counts of 32).
Okay, I think I have something, assuming seattle_buildings_2022_update_mr_20240425_v21 is the most accurate set of building footprints to go off off currently. Area calculations are in square feet.
The process:
Percentages ~ 100 indicate duplicates; percentages close to zero appear to be slivers. Percentages between indicate the range of partial overlaps. SampleFindings_SeattleBuildingsOverlaps_20240613.xlsx
Erased overlaps from primary footprint layer to get rid of duplicates, merged find overlaps output layer to primary footprint layer, to fill back in, and dissolved by building ID to get a single footprint for each building. Only two sliver-overlaps remaining, but the rest seems to have been cleaned up; The output file has slivers at the edges of some of the dissolved polygons that will need to be resolved.
Merged some obvious slivers (~30 records), but there remain some things like this should be cleaned up. The overlap has been removed but the building footprint (and ID's) are broken up into multiple parts:
An artifact of my earlier workflow is that it left fragments like this, with unique building ID's in a footprint. In other cases where building footprints touch they are part of a cluster of defined individual footprints, so it is not as straightforward to resolve. There are 3,337 records in the attribute table and manually merging where appropriate would take a while. I will keep mulling this over.
Aside from the question about how to handle the outcomes above, I realized the only places that should have been affected by the previous workflow are where the overlaps occurred (126 footprint locations). So I am going through these individually and resolving as needed, it is a much more manageable sample size.
Great progress.
Hmm, as far as using seattle_buildings_2022_update_mr_20240425_v21 as the data source, I need to look into that.
That's an active query that pulls form seattle_building_outlines_2022
but excludes 374 building IDs:
SELECT * FROM public.seattle_building_outlines_2022 where buildingid not in (41,43,73,84,87,114,125,144,168,172,175,185,190,194,198,205,212,215,217,225,226,227,234,245,247,248,249,266,268,281,293,302,315,325,326,345,350,351,355,358,364,381,422,435,467,477,496,500,516,576,608,658,683,689,690,694,708,728,730,819,1281,19489,19510,19584,19694,19696,19704,19754,19777,19780,19788,19831,19847,19848,19849,19873,19925,19947,19990,19992,20027,20028,20045,20047,20052,20056,20082,20095,20100,20140,20187,20216,20217,20234,20241,20242,20321,20340,20367,20369,20456,20490,20522,20523,20526,20543,20585,20659,20686,20720,20762,20805,20810,20888,20930,21117,21122,21150,21196,21202,21215,21216,21218,21231,21310,21314,21335,21341,21382,21399,21401,21425,21432,21476,21533,21556,21557,21570,21604,21610,21616,21627,21692,21697,21745,21811,21941,22259,22261,22264,22299,22334,22504,22511,22607,22654,22739,22742,22801,22810,22863,22939,22941,22954,22964,23020,23028,23050,23074,23076,23173,23242,23265,23322,23323,23327,23400,23409,23484,23495,23496,23503,23554,23605,23615,23622,23632,23637,23639,23714,23730,23752,23846,23937,23956,24030,24036,24086,24150,24160,24161,24162,24179,24235,24248,24253,24306,24489,24494,24495,24496,24560,24562,24587,24599,24617,24645,24659,24750,24775,24854,24857,24934,24987,25053,25054,25129,25131,25241,25245,25251,25299,25360,25371,25410,25435,25527,25543,25652,25710,25761,25973,26007,26026,26040,26098,26150,26198,26283,26313,26450,26460,26558,26574,26729,26746,26748,26808,26821,26862,26880,27008,27071,27231,27241,27265,27300,27342,27343,27348,27389,27597,27600,27774,27799,27833,27888,27901,27903,27955,27970,27981,28006,28020,28025,28874,29069,29170,29310,29312,29550,29612,29792,30207,31187,32454,33527,37008,37248,39332,40388,40447,41068,41070,45287,48287,49699,49703,49704,49705,49709,49721,49722,49731,49735,49736,49737,49738,49771,49792,49797,49821,49825,49862,49889,49894,49911,49943,49945,49946,49967,49990,49996,49998,50002,50003,50005,50009,50023,50038,50049,50055,50067,50070,50101,50160,50175,50177,50180,50194,50245,50247,50259,50294,50309,50347,50376,50383,50385,50415,50416,50462,50481,50524,50642,50656,50693,50694,50704,50705,50710,50718,20066,103,400)
I believe those are the buildings that Mike already flagged for custom/bespoke correction, so that's why we excluded them -- no need to review things he's already flagged for fixing.
But I can't quite figure out why there are 27K outlines in seattle_building_outlines_2022
. I looked at the tabular data and in there we have only 3692 buildings with energy data. That's this query in CARTO:
SELECT count(distinct(id)) FROM public.data_2022_update
So the totals in seattle_buildings_2022_update_mr_20240425_v21
make more sense. But what are the other 25K footprints?
Can you post a screenshot of all the buildings you analyzed? Just want to get a gestalt sense of how these line up with what's visible in the application.
Yes! I just noticed why the seattle_building_outlines_2022 looked off, not sure why it didn't occur to me before that the year field is provided in the footprint download and a duplicate footprint was generated for each building for each year on record, which is why there were so many overlaps on that file relative to seattle_buildings_2022_update_mr_20240425_v21. this is the seattle_buildings_2022 table as it came via carto This is the seattle_buildings_2022_update_mr_20240425_v21 table view
I did use the query to filter out the ones Mike flagged for the initial footprint query when I downloaded the seattle_building_outlines_2022 file as well, so rather than 27,000+ I am guessing it would be north of 30,000 with those included.
Ah, got it. This makes sense now. The weird thing about CARTO is that a "table" can also be "a view of a particualr query in the SQL editor"
I suspect that the query in this comment was in the custom SQL editor in CARTO and that indeed makes a row for every building for every year. I briefly thought we might need to do that, but we didn't need it -- I probably didn't clear the query out of the editor and it's easy to miss that a query is affecting the output.
But that clears it up and the data you used seattle_buildings_2022_update_mr_20240425_v21
is indeed the right stuff.
Ah yes that makes sense, glad the dataset will work. I have finished going through the overlaps table and cleaned up slivers/fragmented polygons.
For sharing with Mike:
Ok, shipped this off to the client. We'll open a new implementation/fix issue once he reports back.
Questions from client:
Just replace the table in this folder: https://drive.google.com/drive/folders/1WTkAe5bo9G2efz6XXZcFzUVGvXUyP4Om
The table has the list of IDs attached, but there are two rows with nulls.
They seem to be this stack of three:
Joseph to manually edit the sheet to fix those IDs
Updated overlaps table and replaced nulls with the ID's for the polygons stacked in the image above. Table should now be complete
OK, great. I confirmed this looks good and sent it off to Mike.
Also revised the CARTO map
So this is desktop GIS work to assess where buildings are overlapping and therefore presumed wrong, and also looking for outliers on the ratio of gross square footage from OSE data vs footprint area. For the second item, we will need to join the building energy data to footprints, since (a) footprint area can be calculated only by footprint but (b) gross square footage comes in the building energy data as
reported_gross_floor_area
┆Issue is synchronized with this Asana task ┆Due Date: 2024-06-14