GreenInfo-Network / seattle-building-dashboard

Energy benchmarking for Seattle
https://greeninfo-network.github.io/seattle-building-dashboard/
ISC License
1 stars 0 forks source link

Add missing building outlines #1

Closed danrademacher closed 2 years ago

danrademacher commented 2 years ago

We have handy instructions from Stamen on how we can pull outline-less buildings from CARTO: https://github.com/GreenInfo-Network/seattle-building-dashboard/blob/master/docs/Building_outlines.md

Though Terry is also planning to send us a list.

Then we can use this 2019-2020 vintage data from Microsoft to find outlines for the missing buildings.

If we still have missing outlines, we'll want to see how many and whether we manually review or take some other approach. Some would likely be due to wrong locations of latlng points, while others could siply be missing outlines in the MS data.

┆Issue is synchronized with this Asana task

danrademacher commented 2 years ago

Tim did some initial investigation of this and pulled OSM and Microsoft buildings.

then the client just sent a list of the buildings missing outlines. There are 523 of them

All that is in this folder: https://drive.google.com/drive/folders/175VvCu3i6Vrp7ySnjGttTIEOauCq-bDp

danrademacher commented 2 years ago

cc @tsinn

This is an interesting comment from the client:

Most of these buildings seem to have a shape in the OSM data somewhere they just weren’t matched using the lat/long point to polygon match originally. Here’s an example in Carto image image

So this is a case where the actual data they already have in CARTO contains building outlines that never got tied together.

Maybe we should start with the points and polygons they already have and just see how many Stamen just missed.

The queries and comments in this file are important to review: https://github.com/GreenInfo-Network/seattle-building-dashboard/blob/master/docs/Building_outlines.md

Especially at the end:

Buildings that still don't have outlines fall into three types:

  1. Building point is in the wrong place and is too far from an outline to be found. For example, id=388 (Rainier Tower) seems to clearly be in the wrong place.
  2. Building point is in or near an outline already in use. For example, id=352 (Abraham Lincoln Building) overtaken by id=295 but they seem to be in the same building.
  3. Building point is accurate but building outline doesn't exist.
danrademacher commented 2 years ago

It looks like there are contradictions between the outlines data in CARTO, current OSM, and Microsoft.

As shown here where purple is CARTO current and brown is OSM: image

OSM appears more accurate. We assume purple came form 2009 Seattle data. This points to more complex data QA than we expected. In that example, one of the two items has an OSM ID, and it was deleted 4 years ago: https://www.openstreetmap.org/way/137152318

Breaking this down:

  1. Intersect the 523 with current OSM, and see what we get for both successful joins and points without buildings
  2. Intersect those without buildings from 1 with the Microsoft layer and see what's left
  3. For any remaining unassigned points, we would re-geocode the address and then rerun the joins in 1 and 2.
  4. Then see what's left and whether we can or should hand review.

Other considerations:

One question that came up for Terry is why we have 281K builing outlines, only 3600 of which have a buildingid but then we test for that ID being null as a sign of no ID.

danrademacher commented 2 years ago

Looking closer at the query Terry suggested,

SELECT osebuildingid
  ,buildingname
  ,firstyearrequired
  ,taxparcelidentificationnumber
  ,fulladdress
  ,latitude
  ,longitude
  ,seattle_building_outlines_20181126.osm_id
  ,seattle_building_outlines_20181126.buildingid
FROM PUBLIC.table_2020_required_buildings
LEFT JOIN PUBLIC.seattle_building_outlines_20181126 ON table_2020_required_buildings.osebuildingid = PUBLIC.seattle_building_outlines_20181126.buildingid
WHERE osm_id IS NULL
ORDER BY osebuildingid

This query depends on finding matches between buildingid and also that osm_id is null as a sign that we have no polygon connection. In reviewing the data this morning, we noted that buildingid is populated for only 1.3% of the building outline data. This was unexpected. Also, osm_id is missing for about 75K buildings. since some buildings came from 2009 Seattle data, this might be expected, but it still makes us a bit confused about using these IDs to figure out missing polygons.

In any case, we're going to proceed with out Steps 1, 2, and 3 above. If we are able to get matches from the latest OSM and Microsoft data, then eventually we might just flush unconnected polygons out of CARTO so they aren't sitting around causing confusion in the future.

SabaTG commented 2 years ago

the total number of unmatched/missing buildings are 523, this first analysis was done against the OSM data and the city parcels downloaded from here

<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">

  | missing buildings | type of match | review -- | -- | -- | -- 1 | 457 | matched with buildings | since there is a parcel id attached to the missing buildings data, to be sure the geocoded points are in the right place I'll crosscheck if the parcels they're in match those ids 2 | 52 | landed within a parcel but didn't match a specific building | most of the reviewed parcels do have buildings but in some there are more than one and in others the points landed on the open space missing the building. It could also be a case of missing building outline in OSM, for these there will be some manual moving of points involved. I'll use google street view to verify the address 3 | 14 | didn't match a building or a parcel | since all the addresses are not correct after geocoding, 1 landed in Nebraska(1959 NE PACIFIC ST, ,) and 2 out of city borders. There are 3 duplicates (have the same address, could be a multi floor building), 1 has just the street name which makes it hard to pair with a building but will use the parcel id to find were that building could be. The remaining 7 land on the streets, they'll need a hand review by address to pair with a building

danrademacher commented 2 years ago

@SabaTG one question about the 457 that matched with buildings -- is that all based on just OSM or some mix of OSM and Microsoft data? If the latter, would be interesting to know how many from each source, since it seems likely that Microsoft data might be lower quality in many cases.

SabaTG commented 2 years ago

I only used OSM data for these matches. I didn't use Microsoft data since it seemed to cover larger area than the actual buildings and that could give us false matches with the buildings. I'm thinking of using Microsoft data at the end for those buildings that might be missing in OSM.

danrademacher commented 2 years ago

Excellent! That is a fantastic result just from OSM!

SabaTG commented 2 years ago
SabaTG commented 2 years ago

<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">

original number of missing buildings geocoded | moving on matching parcel id | those that didn't mive at all | compare between parcel id in both layers | last notes -- | -- | -- | -- | -- 457 matched a building outline | 412 moved | 45 didn't move (possibly because they din't match any parcel id or they were already geocoded to the center of the parcel polygon) | 56 records parcel_id don't match the parcel they are geocoded in, the 11 were moved out of the parcel boundaries because of multipart parcels. Which we'll have to move back manually | This means we can be sure the 401 geocoded to a building are within the right parcel. The 45 missing need more review to determine if its a missing building that needs to be added or a parcel outside the city boundaries or wron entry of the parcel id 52 matched a parcel | 48 moved | 4 didn't move | 4 parcelid don't match the parcel they are geocoded in | 44 are definitly within the right parcel and the other 4 need more review 14 didn't match any parcel | 12 moved | 2 are outside the city boundary, so they couldn't get matched |   | 10 are moved into the right parcel and the other 2 are outside the city boundary

SabaTG commented 2 years ago

<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:x="urn:schemas-microsoft-com:office:excel" xmlns="http://www.w3.org/TR/REC-html40">

pnts_not_within Could not find point with point id 1180001715 Could not find point with point id 2895800030 pnts_within_blds Could not find point with point id 7137830040 Could not find point with point id 5342900030 Could not find point with point id 2937600000 Could not find point with point id 5017300000 Could not find point with point id 1562700010 Could not find point with point id 6065000020 Could not find point with point id 8809700040 Could not find point with point id 4458720010 Could not find point with point id 1197000000 Could not find point with point id 0000000000 Could not find point with point id 3374400010 Could not find point with point id 3374400020 Could not find point with point id 7683890010 Could not find point with point id 952004280 Could not find point with point id 2301650000 Could not find point with point id 0925049436 Could not find point with point id 8647700000 Could not find point with point id 5461700000 Could not find point with point id 3589000000 Could not find point with point id 7683890020 Could not find point with point id 5160900010 Could not find point with point id 695000225 Could not find point with point id 660001195 Could not find point with point id 8647700000 Could not find point with point id 1926049445 Could not find point with point id 7666201146 Could not find point with point id 2302200000 Could not find point with point id 6390080000 Could not find point with point id 5699300000 Could not find point with point id 8601700000 Could not find point with point id 7697990000 Could not find point with point id 2268700000 Could not find point with point id 4468600000 Could not find point with point id 3240700000 Could not find point with point id 8847550000 Could not find point with point id 2325039109 Could not find point with point id 4129320010 Could not find point with point id 4129300020 Could not find point with point id 4129320010 Could not find point with point id 0656000141 Could not find point with point id 9134200000 Could not find point with point id 1988201305 Could not find point with point id 255150000 Could not find point with point id 4303500000 Could not find point with point id 0000000000 pnts_within_prcls Could not find point with point id 7942040000 Could not find point with point id 6817880000 Could not find point with point id 2771102201 Could not find point with point id 8590900291

SabaTG commented 2 years ago
danrademacher commented 2 years ago

I just opened P:\proj_p_s\Seattle Building Dashboard\final\SBD.gdb in QGIS and it is showing the final data as a point layer: image

Is that expected? I was expecting polygons of buildings there.

SabaTG commented 2 years ago

I thought the final output was a point file. If not, I can intersect with the osm polygon and create a polygon layer with building outlines.

SabaTG commented 2 years ago

Just extracted them here P:\proj_p_s\Seattle Building Dashboard\final\SBD.gdb\osm_building_outlines

danrademacher commented 2 years ago

OK, thanks for those outlines!

the final output is the outlines themselves, to support the transition in the tool from points to polygons like in this animation: https://recordit.co/39bAvz46BS

SabaTG commented 2 years ago
SabaTG commented 2 years ago

Another thing to note is that there are duplicate building outlines for points that landed on the same polygon. since the osm data has one polygon for some neighboring buildings as shown below. These are some of the osm building outlines we've flagged as needing review. It is possible we might have missed some. image In this example its hard to determine where one starts and the other ends. three points landed on the building image image

danrademacher commented 2 years ago

I see there are 8 that you've noted as "needs outline review" image

I went ahead and added this data to CARTO, since it is very good and might be good enough for the client based on what they told me about level of effort they want to do here. I'll let them know about these 8 though and see what they want to do

For the record, this is what I did:

  1. I selected and downloaded all the unassigned buildings from here:

    select * FROM public.seattle_building_outlines_20181126 where buildingid is null

    and saved to Drive here

  2. I deleted those from CARTO so we don't have ongoing mess clogging that table with 278,000 unassigned buildings:

    delete FROM public.seattle_building_outlines_20181126 where buildingid is null
  3. I exported the final_building_outline_SJ as GeoJSON and uploaded that to CARTO at https://cityenergy-seattle.carto.com/dataset/table_2022_gin_buildingoutlines.

I inserted those into the existing building table:

insert into public.seattle_building_outlines_20181126 (the_geom, buildingid, osm_id)
select table_2022_gin_buildingoutlines.the_geom, user_osebuildingid, table_2022_gin_buildingoutlines.osm_id
from table_2022_gin_buildingoutlines
where user_osebuildingid is not null

In theory, that should result in just 1 building without an outline. But since some of our buildings are not from OSM, I modified Terry's original query to test on presence of a cartodb_id in teh joined outlines table, since those are never null if a join takes place. Here's that query:

SELECT osebuildingid
  ,buildingname
  ,firstyearrequired
  ,taxparcelidentificationnumber
  ,fulladdress
  ,latitude
  ,longitude
  ,seattle_building_outlines_20181126.osm_id
  ,seattle_building_outlines_20181126.buildingid
FROM PUBLIC.table_2020_required_buildings
LEFT JOIN PUBLIC.seattle_building_outlines_20181126 ON table_2020_required_buildings.osebuildingid = PUBLIC.seattle_building_outlines_20181126.buildingid
where PUBLIC.seattle_building_outlines_20181126.cartodb_id is null
ORDER BY osebuildingid

And the result is the one building the port where we can't tell what is going on: image

danrademacher commented 2 years ago

I made links for the suspect buildings that I'll send to Terry:

https://www.seattle.gov/energybenchmarkingmap/#seattle/2019?zoom=18&building=690 https://www.seattle.gov/energybenchmarkingmap/#seattle/2019?zoom=18&building=50484 https://www.seattle.gov/energybenchmarkingmap/#seattle/2019?zoom=18&building=50356 https://www.seattle.gov/energybenchmarkingmap/#seattle/2019?zoom=18&building=50654 https://www.seattle.gov/energybenchmarkingmap/#seattle/2019?zoom=18&building=50246 https://www.seattle.gov/energybenchmarkingmap/#seattle/2019?zoom=18&building=50642 https://www.seattle.gov/energybenchmarkingmap/#seattle/2019?zoom=18&building=50467 https://www.seattle.gov/energybenchmarkingmap/#seattle/2019?zoom=18&building=50333

seattle-benchmarking commented 2 years ago

The Port building is fine to leave as a point since there appears to be a bunch of small buildings combined in one account.

For the suspect building list - I think they are close enough. Not all the links worked for me - some of the newer ones with IDs starting with 504XX or 506XX. But if they behave like the ID 690 or ID 50246 then I fine to proceed.

danrademacher commented 2 years ago

Ah, interesting. Those three buildings are new in 2020, so they are only present in your 2020 buildings table:

image