GreenInfo-Network / seattle-building-dashboard

Energy benchmarking for Seattle
https://greeninfo-network.github.io/seattle-building-dashboard/
ISC License
1 stars 0 forks source link

Update City Council #96

Closed danrademacher closed 5 months ago

danrademacher commented 5 months ago

They have the wrong City Council Districts in the data, and we're trying to quickly patch the data.

danrademacher commented 5 months ago

Info here: https://stackoverflow.com/questions/1746125/update-columns-values-with-column-of-another-table-based-on-condition

CSV of this table: https://docs.google.com/spreadsheets/d/1FCJQ3Ms8WNRVz6Wu77pwIrMujm3leuJ0/edit?pli=1#gid=1021466727

Uploaded here (but will delete after task complete): https://cityenergy-seattle.carto.com/dataset/propertyccupdates_2014_xlsx_carto_import

I duplicated a copy of the live data as is in case things go sideways https://cityenergy-seattle.carto.com/dataset/seattle_buildings_2022_update_copy

This seems correct but it exceeds CARTO's API limit:

UPDATE public.seattle_buildings_2022_update
   SET councildistrict=(SELECT _2014_c_district FROM propertyccupdates_2014_xlsx_carto_import WHERE public.seattle_buildings_2022_update.id =propertyccupdates_2014_xlsx_carto_import.buildingid)

It's the subquery that's the problem. This worked fine:

UPDATE public.seattle_buildings_2022_update
   SET councildistrict=_2014_c_district
FROM propertyccupdates_2014_xlsx_carto_import where public.seattle_buildings_2022_update.id=propertyccupdates_2014_xlsx_carto_import.buildingid

image

But now why are some null?

There are 10 with councildistrict as null

danrademacher commented 5 months ago

Looks like it is these two buildings:

29069
49911 

I'll ask Mike about those

danrademacher commented 5 months ago

Confirmed from Mike we can delete those, so

DELETE FROM public.seattle_buildings_2022_update where id in (29069,49911)

For the record:

29069 – The Lakeshore is a property we just discovered that was benchmarking in unincorporated King County down near Renton. This is a weird case where they came up in our records as having a Seattle address, but they definitely are outside of Seattle city limits. We can remove it from the dataset.

49911 – Same thing, this building is just outside of city limits in unincorporated King County. They were on our list for benchmarking, but were not actually required to do so. You can remove it from the dataset.

danrademacher commented 5 months ago

Null is no longer an option:

image
danrademacher commented 1 month ago

Looking back, I believe this was an issue in the source data from Seattle. But today we had a similar issue come up but it seems like the cause is different.

Including notes here for future if it comes up again:

For whatever reason, I’m seeing the earlier issue with the map where it is showing a bunch of properties as erroneously being in council district 1. Can you figure out how this might have happened?

The only time we recently edited the data was when we fixed the percentage calculations a few weeks ago, but that dataset on the Sharepoint appears to not have this issue. Is this possibly something that is happening within Carto? Can we recover the old CC districts from a prior upload?

@tomay findings:

I just uploaded the Excel file from sharepoint (presumably the same one we worked with three weeks ago, and uploaded to CARTO), and it has 0 nulls. Weird.

that is this file https://seattlegov.sharepoint.com/:x:/r/sites/OSE-External/Shared%20Documents/Benchmarking/GreenInfo%20OSE%202023/CARTO%20files/2022%20Carto/seattle_buildings_2022_update%20MR%2020240502.csv?d=w9e6ab26685c04477852aeb2acfbfc329&csf=1&web=1&e=NrV6aZ

Sharepoint links are notoriously unstable. This is the path:

image

So this doesn't seem like a source data issue, but also supposedly it was working when we updated the data a couple of weeks ago, and then it stopped working?

A mystery.

Tom used the re-uploaded data to join back in the correct council districts.

If this happens again we'll have to really dig in and find out what the heck is going on!

danrademacher commented 1 month ago

This is the query that time:

UPDATE public.seattle_buildings_2022_update
   SET seattle_buildings_2022_update.councildistrict=councildistrict
FROM seattle_buildings_2022_update_mr_20240425_v21 where public.seattle_buildings_2022_update.id=seattle_buildings_2022_update_mr_20240425_v21.id