Breakthrough-Energy / PreREISE

Generate input data for scenario framework
https://breakthrough-energy.github.io/docs/
MIT License

Build distribution of demand to HIFLD buses #229

Closed: danielolsen closed this issue 2 years ago

danielolsen commented 3 years ago


Describe the workflow you want to enable

I wish there were a function that populates the Pd (real power demand) field of the bus table for the HIFLD grid.

Describe your proposed implementation

The new function should live either in prereise.gather.griddata.hifld.data_process.transmission or in a new module nearby (prereise.gather.griddata.hifld.data_process.demand?). One approach would be to make demand proportional to population, using population data from the U.S. Census or similar sources.
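A minimal sketch of the population-proportional approach, assuming each bus has already been assigned a population (how to get that assignment is the open question discussed below). The function name `distribute_demand` and its inputs are hypothetical and not part of PreREISE:

```python
def distribute_demand(total_demand_mw, bus_population):
    """Split a total demand (MW) across buses in proportion to population.

    bus_population: hypothetical mapping of bus ID -> population assigned
    to that bus. Returns a mapping of bus ID -> Pd (MW).
    """
    total_population = sum(bus_population.values())
    if total_population <= 0:
        raise ValueError("total population must be positive")
    # Each bus gets the same fraction of demand as its fraction of population.
    return {
        bus_id: total_demand_mw * pop / total_population
        for bus_id, pop in bus_population.items()
    }


# Example: 1,000 MW of demand spread across three buses.
pd_by_bus = distribute_demand(1000.0, {101: 50_000, 102: 30_000, 103: 20_000})
print(pd_by_bus)  # → {101: 500.0, 102: 300.0, 103: 200.0}
```

In the HIFLD workflow the resulting values would presumably be written into the bus table's Pd column (e.g. with pandas `Series.map` on the bus index); that wiring is not shown here.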

danielolsen commented 3 years ago

The Census Bureau has released some 2020 census data (the Redistricting Data, PL 94-171), but does not appear to have released the 'full' data yet (i.e., the equivalent of what was called Summary File 1 in the 2010 census). According to their blog post from last week, more detailed data should come out sometime in 2022: https://www.census.gov/newsroom/blogs/random-samplings/2021/09/upcoming-2020-census-data-products.html

While trying to query population data directly from the U.S. Census Bureau, I've found several leads on getting ZIP-code-level population data, but no slam dunks yet:

As a reminder, simplemaps.com also provides population estimates per ZIP code and per county, but the methodology behind these values is somewhat opaque.

EDIT: The 2020 Census redistricting data can give us population by county directly. Ideally, though, we would like something more granular: counties are often large areas with varying population density, so distributing demand naively (e.g. evenly) across all substations in a county will probably produce poor results.
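To make the concern concrete, here is a hypothetical sketch of the naive county-level split, assuming we already have a demand figure per county and know which county each substation sits in. All names and IDs are illustrative only:

```python
from collections import defaultdict


def naive_county_split(county_demand_mw, substation_county):
    """Divide each county's demand equally among the substations in it.

    county_demand_mw: county ID -> demand (MW), e.g. scaled from county population.
    substation_county: substation ID -> ID of the county it lies in.
    Counties with no substations never appear in the loop, so their demand is dropped.
    """
    substations_by_county = defaultdict(list)
    for sub_id, county in substation_county.items():
        substations_by_county[county].append(sub_id)

    demand = {}
    for county, subs in substations_by_county.items():
        # Every substation in the county gets an identical share.
        share = county_demand_mw.get(county, 0.0) / len(subs)
        for sub_id in subs:
            demand[sub_id] = share
    return demand


# One county with an urban and a rural substation.
print(naive_county_split({"C1": 400.0}, {"urban": "C1", "rural": "C1"}))
# → {'urban': 200.0, 'rural': 200.0}
```

Both substations receive the same share regardless of where the county's population actually lives, which is the bias a finer-grained population source would avoid.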

danielolsen commented 3 years ago

The 2019 American Community Survey can give us population by PUMA (Public Use Microdata Area), but PUMAs are still ~15-20x larger than ZIP codes on average (about 2,400 PUMAs vs. about 42,000 ZIP codes). We would also need to use the lat/lon of each substation to deduce its PUMA, rather than taking the ZIP code directly.
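Deducing a substation's PUMA from its lat/lon is a point-in-polygon lookup. The sketch below uses a plain ray-casting test against simple polygons given as lists of (lon, lat) vertices, treating lon/lat as planar x/y. Real PUMA boundaries (e.g. from TIGER/Line shapefiles) may include multipart polygons, for which a spatial join in a library such as geopandas (`geopandas.sjoin`) would be the practical choice. The helper names and the toy polygons are hypothetical:

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside a simple polygon?

    polygon: list of (lon, lat) vertices, treated as planar x/y coordinates.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray cast from the point toward +lon.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def find_puma(lon, lat, puma_polygons):
    """Return the ID of the first PUMA whose polygon contains the point, else None."""
    for puma_id, polygon in puma_polygons.items():
        if point_in_polygon(lon, lat, polygon):
            return puma_id
    return None


# Two toy, side-by-side square "PUMAs".
toy_pumas = {
    "A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
print(find_puma(0.5, 0.5, toy_pumas))  # → A
print(find_puma(1.5, 0.25, toy_pumas))  # → B
```

`find_puma` checks polygons one by one, which is fine for a sketch but slow for thousands of PUMAs and substations; a spatial index (as used by spatial-join tools) avoids that. Results for points lying exactly on a boundary are not well defined.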

It seems that, at least internally, the Census Bureau has a mapping of 2020 Census blocks to 2020 ZCTAs (ZIP Code Tabulation Areas; see https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.2020.html). We may be able to use geopandas or a similar tool to deduce this mapping ourselves and then sum the available 2020 block-level data to ZCTAs, but it would be really great if we didn't have to do that manually.
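If a block-to-ZCTA mapping can be obtained (for instance via a spatial join of block and ZCTA shapefiles), the aggregation step itself is simple. A sketch assuming each block maps to at most one ZCTA; the function name and all IDs are placeholders:

```python
from collections import Counter


def sum_blocks_to_zcta(block_population, block_to_zcta):
    """Aggregate census-block populations up to ZCTAs.

    block_population: block ID -> population.
    block_to_zcta: block ID -> ZCTA code (hypothetical mapping, e.g. from a spatial join).
    Blocks without a ZCTA assignment are skipped.
    """
    zcta_population = Counter()
    for block_id, pop in block_population.items():
        zcta = block_to_zcta.get(block_id)
        if zcta is not None:
            zcta_population[zcta] += pop
    return dict(zcta_population)


blocks = {"b1": 10, "b2": 5, "b3": 7, "b4": 3}
mapping = {"b1": "Z1", "b2": "Z1", "b3": "Z2"}  # b4 has no ZCTA
print(sum_blocks_to_zcta(blocks, mapping))  # → {'Z1': 15, 'Z2': 7}
```

The resulting per-ZCTA populations could then feed the population-proportional allocation described in the original request.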

danielolsen commented 2 years ago

Closed by #235.