Closed by derekeder 2 years ago
Possible data sources
I took a look at OSM data and chatted with a friend about data sources here. The OSM data wasn't too bad to parse, but the numbers seemed quite low overall. Fortunately, Microsoft released a building footprints dataset that seems more reasonable and is still open source.
For sanity, here's a screenie of Delaware, one of the places where the two diverge the most:
If that data seems reasonable to folks (I think OK on my end), I've got it output in an outer tagged JSON format like the other data so far, and also an OSM script for posterity.
@nofurtherinformation nice work! The Microsoft data seems like a good find! From the Readme, it looks like they'd like to import it to OSM, but it's a bit of a process.
A few follow-up questions:
Thanks @derekeder! Hopefully in the future it gets integrated into OSM, but I agree that for now the Microsoft data seems to be the way.
For sanity checks, the correlation with OSM is already pretty good (r of ~0.75, r2 of .56), and comparing against population is also straightforward. There is assessor data for a bunch of metro counties we can compare against (Cook, LA, NY, etc.), and Vermont publishes their State E911 footprint address data, so we can get some regional/rural confirmation too. I suspect Microsoft might overcount slightly for things like sheds or accessory units. I'll do a quick sanity check on the counts and circle back here.
For building type, this is slightly trickier. The Microsoft data is just footprint geometries, but we could combine those with zoning data where available to make a decent guess! Places like Texas will be a challenge, since zoning doesn't strictly exist. The same issues exist for OSM coverage on land use/zoning, but we'd get enough buildings that I think it'd be reasonably representative :)
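As a rough illustration of how the two figures above relate, here is a minimal sketch of a Pearson correlation between two sets of counts, with r2 taken as the square of r (0.75 squared is about 0.56). The `pearson_r` helper and the count lists are hypothetical placeholders, not the actual OSM or Microsoft numbers.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder per-area building counts, NOT real data.
osm_counts = [120, 340, 560, 910]
ms_counts = [150, 420, 600, 1300]

r = pearson_r(osm_counts, ms_counts)
r_squared = r ** 2  # share of variance explained by a linear fit
print(f"r = {r:.2f}, r2 = {r_squared:.2f}")
```

With the real per-state or per-county counts swapped in, the same two numbers can be recomputed whenever either dataset is refreshed.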
@nofurtherinformation nice work! Let's roll with the Microsoft data then.
Combining with zoning is a good idea, but as you say, it would vary widely from state to state, and as far as I know the data doesn't exist in one combined place like the US Building Footprints. It strikes me that knowing the number of houses, apartments and other buildings in America is something that a lot of people would want/need to know. I wonder what existing research groups have already done estimates of this.
Thanks @derekeder! I've got some time tonight to sanity check the Microsoft data, so it should be good to go for tomorrow evening's meetup.
On building types, I did a bit more digging and found this report from the National Renewable Energy Lab. This goes a bit further down the rabbit hole on modeling (e.g., estimating wall types), but critically they provide per-building metadata files with state tags and building typologies. For commercial, these look like:
'SmallOffice', 'RetailStripmall', 'RetailStandalone', 'Warehouse', 'QuickServiceRestaurant', 'Outpatient', 'MediumOffice', 'FullServiceRestaurant', 'SecondarySchool', 'LargeHotel', 'PrimarySchool', 'Hospital', 'SmallHotel', 'LargeOffice'
These are based on 2018 building stocks, so reasonably recent. For future reference, here's a link to the ResStock (residential stock) metadata tsv and to the ComStock (commercial stock) metadata tsv. Even if the total building count is different, this is probably a reasonable sample to go on and use as a percentage split on the Microsoft footprints!
Even more interesting, the NREL data gives us per-building data on things like HVAC system types, energy consumption across a variety of metrics (heating and cooling, lighting) and seasonal estimates. Food for thought: here's a sample row of ComStock data:
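The percentage-split idea could be sketched roughly as follows. This assumes the `weight` column in the metadata is a sampling weight (how many real buildings each modeled row stands for) and that `in.building_type` holds the typology; the sample rows and the footprint count of 1000 are made-up placeholders, and the two helper functions are hypothetical.

```python
from collections import defaultdict


def weighted_type_shares(rows):
    """Share of each building type, weighting each row by its `weight` column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["in.building_type"]] += float(row["weight"])
    grand_total = sum(totals.values())
    return {btype: w / grand_total for btype, w in totals.items()}


def split_footprints(footprint_count, shares):
    """Apply the type shares to a footprint count to estimate buildings per type."""
    return {btype: footprint_count * share for btype, share in shares.items()}


# Placeholder rows, NOT real ComStock data.
sample_rows = [
    {"in.building_type": "SmallOffice", "weight": "7.0"},
    {"in.building_type": "Warehouse", "weight": "2.0"},
    {"in.building_type": "SmallOffice", "weight": "1.0"},
]

shares = weighted_type_shares(sample_rows)
estimate = split_footprints(1000, shares)
print(estimate)
```

Note that a ComStock-only split would apply to the commercial portion of the footprints, so the residential/commercial division would need to be settled first.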
bldg_id | 105 |
---|---|
applicability | True |
in.upgrade_name | Baseline |
in.tstat_clg_delta_f | 5 |
in.tstat_clg_sp_f | 77 |
in.tstat_htg_delta_f | 8 |
in.tstat_htg_sp_f | 63 |
in.aspect_ratio | 2 |
in.building_subtype | |
in.county | G0100890 |
in.building_type | SmallOffice |
in.rotation | 270 |
in.number_of_stories | 1 |
in.sqft | 17500 |
in.hvac_system_type | PSZ-AC with electric coil |
in.weekday_operating_hours | 8.5 |
in.weekday_opening_time | 8 |
in.weekend_operating_hours | 12.5 |
in.weekend_opening_time | 11 |
in.energy_code_followed_during_last_exterior_lighting_replaceme | ComStock DOE Ref 1980-2004 |
in.energy_code_followed_during_last_hvac_replacement | ComStock DOE Ref 1980-2004 |
in.energy_code_followed_during_last_interior_equipment_replacem | ComStock DOE Ref 1980-2004 |
in.energy_code_followed_during_last_interior_lighting_replaceme | ComStock 90.1-2019 |
in.energy_code_followed_during_last_roof_replacement | ComStock DOE Ref 1980-2004 |
in.energy_code_followed_during_last_service_water_heating_repla | ComStock DOE Ref 1980-2004 |
in.energy_code_followed_during_last_walls_replacement | ComStock DOE Ref 1980-2004 |
in.energy_code_followed_during_last_windows_replacement | ComStock 90.1-2007 |
in.energy_code_followed_during_original_building_construction | ComStock DOE Ref 1980-2004 |
in.heating_fuel | Electricity |
in.number_stories | 1 |
in.service_water_heating_fuel | Electricity |
stat.air_system_fan_total_efficiency | 0 |
stat.average_boiler_efficiency | 0 |
stat.average_dx_cooling_cop | 3.46235472864205 |
stat.average_dx_heating_cop | 0 |
stat.average_gas_coil_efficiency | 0 |
stat.design_dx_cooling_cop | 3.07743026390591 |
stat.design_dx_heating_cop | 0 |
stat.occupant_density_ppl_per_m_2 | 0.053819552083549 |
qoi_report.maximum_daily_timing_shoulder_hour | 9.0873786407767 |
qoi_report.maximum_daily_timing_summer_hour | 9.84496124031008 |
qoi_report.maximum_daily_timing_winter_hour | 9.45112781954887 |
qoi_report.maximum_daily_use_shoulder_kw | 23.16081538288 |
qoi_report.maximum_daily_use_summer_kw | 29.7602716030508 |
qoi_report.maximum_daily_use_winter_kw | 29.5302216031377 |
qoi_report.minimum_daily_use_shoulder_kw | 11.7128145227106 |
qoi_report.minimum_daily_use_summer_kw | 11.6088448793368 |
qoi_report.minimum_daily_use_winter_kw | 12.7324554597747 |
in.nhgis_tract_gisjoin | G0100890003000 |
in.nhgis_county_gisjoin | G0100890 |
in.state_name | Alabama |
in.state_abbreviation | AL |
in.census_division_name | East South Central |
in.census_region_name | South |
in.weather_file_2018 | USA_AL_Huntsville.Madison.723230_2018.epw |
in.weather_file_TMY3 | Huntsville_Intl_Jones_Field |
in.climate_zone_building_america | Mixed-Humid |
in.climate_zone_ashrae_2004 | 3A |
in.iso_region | None |
in.reeds_balancing_area | 89 |
in.resstock_county_id | AL, Madison County |
in.nhgis_puma_gisjoin | G01000302 |
out.district_cooling.cooling.energy_consumption | 0 |
out.district_cooling.cooling.energy_consumption_intensity | 0 |
out.district_heating.heating.energy_consumption | 0 |
out.district_heating.heating.energy_consumption_intensity | 0 |
out.district_heating.water_systems.energy_consumption | 0 |
out.district_heating.water_systems.energy_consumption_intensity | 0 |
out.electricity.cooling.energy_consumption | 7383.33333333333 |
out.electricity.cooling.energy_consumption_intensity | 0.421904761904762 |
out.electricity.exterior_lighting.energy_consumption | 18058.3333333333 |
out.electricity.exterior_lighting.energy_consumption_intensity | 1.03190476190476 |
out.electricity.fans.energy_consumption | 10025 |
out.electricity.fans.energy_consumption_intensity | 0.572857142857143 |
out.electricity.heat_recovery.energy_consumption | 0 |
out.electricity.heat_recovery.energy_consumption_intensity | 0 |
out.electricity.heat_rejection.energy_consumption | 0 |
out.electricity.heat_rejection.energy_consumption_intensity | 0 |
out.electricity.heating.energy_consumption | 7308.33333333333 |
out.electricity.heating.energy_consumption_intensity | 0.417619047619048 |
out.electricity.interior_equipment.energy_consumption | 94127.7777777778 |
out.electricity.interior_equipment.energy_consumption_intensity | 5.37873015873016 |
out.electricity.interior_lighting.energy_consumption | 14847.2222222222 |
out.electricity.interior_lighting.energy_consumption_intensity | 0.848412698412698 |
out.electricity.pumps.energy_consumption | 2.77777777777778 |
out.electricity.pumps.energy_consumption_intensity | 0.00015873015873 |
out.electricity.refrigeration.energy_consumption | 0 |
out.electricity.refrigeration.energy_consumption_intensity | 0 |
out.electricity.water_systems.energy_consumption | 5552.77777777778 |
out.electricity.water_systems.energy_consumption_intensity | 0.317301587301587 |
out.natural_gas.heating.energy_consumption | 0 |
out.natural_gas.heating.energy_consumption_intensity | 0 |
out.natural_gas.interior_equipment.energy_consumption | 0 |
out.natural_gas.interior_equipment.energy_consumption_intensity | 0 |
out.natural_gas.water_systems.energy_consumption | 0 |
out.natural_gas.water_systems.energy_consumption_intensity | 0 |
out.other_fuel.heating.energy_consumption | 0 |
out.other_fuel.heating.energy_consumption_intensity | 0 |
out.other_fuel.water_systems.energy_consumption | 0 |
out.other_fuel.water_systems.energy_consumption_intensity | 0 |
out.district_cooling.total.energy_consumption | 0 |
out.district_cooling.total.energy_consumption_intensity | 0 |
out.district_heating.total.energy_consumption | 0 |
out.district_heating.total.energy_consumption_intensity | 0 |
out.electricity.total.energy_consumption | 157305.555555556 |
out.electricity.total.energy_consumption_intensity | 8.98888888888889 |
out.site_energy.total.energy_consumption | 157305.555463116 |
out.site_energy.total.energy_consumption_intensity | 8.9888888836066 |
out.natural_gas.total.energy_consumption | 0 |
out.natural_gas.total.energy_consumption_intensity | 0 |
out.other_fuel.total.energy_consumption | 0 |
out.other_fuel.total.energy_consumption_intensity | 0 |
upgrade | 0 |
weight | 7.04174071967379 |
metadata_index | 0 |
@nofurtherinformation whoa that is some crazy detail! I'd be curious to see what the coverage is for the in.heating_fuel and in.service_water_heating_fuel attributes and what the distribution of values is. If we have good coverage, we could get very precise on how many buildings still need to be electrified!
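One way to check coverage and distribution for a fuel column is sketched below. It assumes the metadata is tab-separated with a `weight` column, and that blank cells or the literal string `None` mean "no value"; both are assumptions about the file, and the inline sample (including the `NaturalGas` label) is a placeholder rather than real ComStock data.

```python
import csv
import io
from collections import defaultdict


def fuel_distribution(rows, column):
    """Return (coverage, shares) for `column`, weighting rows by `weight`.

    coverage: weighted fraction of rows with a non-empty value.
    shares:   weighted share of each value among the covered rows.
    """
    total_weight = 0.0
    by_value = defaultdict(float)
    for row in rows:
        weight = float(row["weight"])
        total_weight += weight
        value = (row.get(column) or "").strip()
        if value and value != "None":  # assumed markers for missing data
            by_value[value] += weight
    covered = sum(by_value.values())
    coverage = covered / total_weight if total_weight else 0.0
    shares = {v: w / covered for v, w in by_value.items()} if covered else {}
    return coverage, shares


# Placeholder TSV, NOT real data.
sample_lines = [
    "\t".join(["bldg_id", "in.heating_fuel", "weight"]),
    "\t".join(["105", "Electricity", "6"]),
    "\t".join(["106", "NaturalGas", "3"]),
    "\t".join(["107", "", "1"]),
]
rows = list(csv.DictReader(io.StringIO("\n".join(sample_lines)), delimiter="\t"))

coverage, shares = fuel_distribution(rows, "in.heating_fuel")
print(f"coverage: {coverage:.0%}", shares)
```

The same function can be called with `in.service_water_heating_fuel` to compare the two attributes.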
Heya @derekeder, agreed, really detailed! I pulled some summary data by county (we can shift up to state, but may be useful to explore and get a sense of the data). I plopped together a quick explorer with four pages:
Check it out here -- note the color bins on the maps are not fixed between residential and commercial, but this will help to get a sense of the distribution.
For sanity checking data, here's what we've got for counts of buildings via government footprint data vs Microsoft data:
Place | Gov | Microsoft |
---|---|---|
VERMONT | 419,331 | 351,266 |
CHICAGO | 820,606 | 907,967 |
LA | 1,122,422 | 1,339,971 |
Taking these 3 cases, it's not perfect: Microsoft comes in about 16% under in Vermont, about 11% over in Chicago and about 19% over in LA, so within roughly 10 to 20%. I'd be inclined to suggest this is good enough if you feel comfortable, given the uncertainty in both the Microsoft data and the open government data, but we can also pull some other county locations to confirm assumptions here.
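For reference, the gap between the two sources in the table above can be computed directly. This is a small sketch using the three rows as posted, with the government count as the baseline; `percent_difference` is just an illustrative helper.

```python
# (government count, Microsoft count) per place, taken from the table above.
# The LA Microsoft figure is read as 1,339,971.
counts = {
    "VERMONT": (419_331, 351_266),
    "CHICAGO": (820_606, 907_967),
    "LA": (1_122_422, 1_339_971),
}


def percent_difference(gov, microsoft):
    """Signed difference of the Microsoft count relative to the government count."""
    return (microsoft - gov) / gov * 100


diffs = {place: percent_difference(gov, ms) for place, (gov, ms) in counts.items()}
for place, diff in diffs.items():
    print(f"{place}: {diff:+.1f}%")
# VERMONT: -16.2%
# CHICAGO: +10.6%
# LA: +19.4%
```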
@nofurtherinformation thanks for this. This is really great! I agree it's close enough that we should proceed with it. Do you want to take on getting a CSV of the data rolled up by state with the relevant columns of data?
Definitely! Just filed a PR that compiles all this. For convenience, here's a link to the Google Sheet and a direct CSV link!
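A state rollup along these lines might look like the sketch below. It is not the code from the PR; the column names (`state`, `residential`, `commercial`) and the county rows are hypothetical placeholders chosen for illustration.

```python
import csv
import io
from collections import defaultdict


def rollup_by_state(county_rows):
    """Sum county-level building counts into one record per state."""
    totals = defaultdict(lambda: {"residential": 0, "commercial": 0})
    for row in county_rows:
        totals[row["state"]]["residential"] += int(row["residential"])
        totals[row["state"]]["commercial"] += int(row["commercial"])
    return dict(sorted(totals.items(), key=lambda item: item[0]))


def to_csv(state_totals):
    """Serialize the rollup as CSV text with one row per state."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["state", "residential", "commercial", "total"])
    for state, c in state_totals.items():
        writer.writerow(
            [state, c["residential"], c["commercial"], c["residential"] + c["commercial"]]
        )
    return buffer.getvalue()


# Placeholder county rows, NOT real data.
county_rows = [
    {"state": "VT", "residential": "100", "commercial": "10"},
    {"state": "AL", "residential": "300", "commercial": "40"},
    {"state": "VT", "residential": "50", "commercial": "5"},
]

state_totals = rollup_by_state(county_rows)
print(to_csv(state_totals))
```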
For the Building electrification part of the state detail page, we will need to get counts of the number of buildings in a given state.
Research where to find this data - bonus points for breaking the building counts up by residential and non-residential.
Data should be collected in a Google Sheet with one row for each State