DataKind-DC / rcp2

Red Cross!
MIT License

Gather Housing age/composition information for GEOIDs #36

Open thwhitfield opened 4 years ago

thwhitfield commented 4 years ago
Jaboola9 commented 4 years ago

I'll have a go on this ~ Ian

Jaboola9 commented 4 years ago

TL;DR: The ACS-5 was the only source found at the block group level, and it was the most complete and granular free source of information on housing stock. The AHS and RECS data provide useful division-level features. The USPS Vacancy data is the most frequent (quarterly), so more relevant features may be found in its trends (but only at the tract level). Two free sources of housing data that are not representative, but might still be useful, were also found.

American Community Survey 5-year Estimates (ACS-5)
Frequency: 5 years
Granularity: Block Group (600 - 3000 people)
Most relevant features*: The latter 78 of 116 features selected (see “acs_housing_features.txt”). All are percentages. Includes occupancy ratios, number of rooms’ ratios, housing build years, heating sources, plumbing and kitchen presence, mortgage status, and house value.
Source: https://api.census.gov/data/2018/acs/acs5/profile.html
Feature descriptions: https://api.census.gov/data/2018/acs/acs5/profile/variables.html
Notes: This is the most granular source found with full national coverage.
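For anyone picking this up, here is a minimal sketch of how rows from the ACS-5 profile endpoint above could be keyed by GEOID. It assumes the Census API's usual JSON shape (a header row followed by data rows, all strings). The helper names, the variable code `DP04_0001E`, and the sample row are illustrative assumptions, and the sketch queries tracts because I have not confirmed which geographies this endpoint supports; check the variables page and the geography list before relying on it.

```python
from urllib.parse import urlencode

# Base URL taken from the Source link above (without the .html suffix).
BASE_URL = "https://api.census.gov/data/2018/acs/acs5/profile"


def build_query(variables, state_fips, geography="tract"):
    """Build a query URL for every `geography` unit in one state (hypothetical helper)."""
    params = {
        "get": ",".join(["NAME", *variables]),
        "for": f"{geography}:*",
        "in": f"state:{state_fips}",
    }
    # Keep ':', ',' and '*' unescaped so the URL stays readable.
    return f"{BASE_URL}?{urlencode(params, safe=':,*')}"


def rows_to_records(response_rows):
    """Turn the header-plus-rows JSON payload into dicts with a GEOID key."""
    header, *rows = response_rows
    records = []
    for row in rows:
        record = dict(zip(header, row))
        # GEOID = state (2) + county (3) + tract (6) [+ block group (1) if present].
        record["GEOID"] = (
            record["state"]
            + record["county"]
            + record["tract"]
            + record.get("block group", "")
        )
        records.append(record)
    return records


# Made-up sample payload in the assumed response shape (not real data).
sample_rows = [
    ["NAME", "DP04_0001E", "state", "county", "tract"],
    ["Example tract", "2500", "11", "001", "000100"],
]

url = build_query(["DP04_0001E"], "11")
records = rows_to_records(sample_rows)
print(url)
print(records[0]["GEOID"])
```

The actual request (for example with `urllib.request` or `requests`) is left out on purpose so the sketch runs offline.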

American Housing Survey National Level (AHS)
Frequency: 2 years
Granularity: Divisions, 20 metropolitan areas
Most relevant features*: All of the ACS-5, but more detailed. These include: status of unit, structure characteristics (e.g. units in structure, stories, etc.), unit characteristics including heating equipment, selected amenities and deficiencies including electrical problems, neighborhood characteristics including housing types, and rent/ownership status. For a comparison see https://www.census.gov/content/dam/Census/programs-surveys/ahs/publications/CombiningAHS-ACS.pdf
Source: https://www.census.gov/programs-surveys/ahs/data/2017/ahs-2017-public-use-file--puf-/ahs-2017-national-public-use-file--puf-.html (flat file)
Feature descriptions: https://www.census.gov/cgi-bin/nbroker?_service=sas_serv1&_debug=0&_program=cedr.sasapp_main.sas&s_appName=ahsdict&s_searchvalue=&s_year=&s_topic=&s_variable=&s_available=&s_minicode=E_2017&variable_detail_dialog=&variable_detail=&variable_question_text=&s_output=mpdf&menu=variable_table&s_orderBy=topic_number%20asc,%20subtopic_number%20asc,variable_number%20asc
Notes: The public data only covers the divisions, but given the detail of the data that is withheld (see the Feature descriptions link for the full list of withheld features), it might be worth requesting access.

USPS Vacant Address Data
Frequency: Quarterly
Granularity: Tract (aggregated from USPS zip codes)
Most relevant features*: All seem to be relevant, as the only features are counts of vacant or “no-stat” addresses (i.e. abandoned/under construction/not yet occupied) and how long they have had that status.
Source: https://www.huduser.gov/portal/usps/index.html (requires free login; however, we currently have quarterly data through 2018 Q2 in “02_Data/Not for Consumption - Archive/Matt’s Data/”)
Feature descriptions: https://www.huduser.gov/portal/datasets/usps/USPS_Data_Dictionary_07212008.pdf
Notes: As pointed out by Matt, useful features could be derived from trends, i.e. rising or falling numbers of vacancies/no-stats in a tract. For more info see https://www.huduser.gov/portal/sites/default/files/pdf/2018-USPS-FAQ.pdf
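One possible way to turn the quarterly counts into a trend feature is a least-squares slope per tract. This is only a sketch of that idea, not something we have tested against the fire data: the function name, the dictionary layout, and the counts below are made up for illustration, and the real USPS files would need to be reshaped into one chronological series per tract first.

```python
def linear_trend(counts):
    """Least-squares slope of counts against quarter index (0, 1, 2, ...).

    A positive value means vacancies are rising, a negative value falling.
    Returns 0.0 when there are fewer than two quarters.
    """
    n = len(counts)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(counts) / n
    numerator = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(counts))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator


# Hypothetical input: tract GEOID -> vacancy counts in chronological order.
vacancies_by_tract = {
    "11001000100": [10, 12, 14, 16],  # rising by 2 per quarter
    "11001000201": [8, 6, 4, 2],      # falling by 2 per quarter
}

vacancy_trends = {
    geoid: linear_trend(counts) for geoid, counts in vacancies_by_tract.items()
}
print(vacancy_trends)
```

A simple first-to-last difference would also work; the slope is just less sensitive to a single odd quarter.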

Residential Energy Consumption Survey (RECS)
Frequency: 6 years (latest is 2016)
Granularity: Division and ACS housing attributes (see Notes)
Most relevant features*: Energy consumption and expenditures, including average wood and space heating use; housing characteristics including “Most-used stove fuel” and “Main heating equipment age”, amongst others, averaged by housing type, age, renter status, number of occupants, income, climate, home size, and climate region.
Source: https://www.eia.gov/consumption/residential/data/2015/
Feature descriptions: https://www.eia.gov/consumption/residential/data/2015/
Notes: This data is neither timely nor anywhere near tract level, but it might still be useful when joined to ACS data. For example, a one-to-many join could link average RECS features to all houses within a census division or climate area. This data set was included because of the relevance of its features rather than its granularity or frequency.
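The one-to-many join mentioned in the Notes could look roughly like the sketch below: each GEOID is mapped to its census division through the two-digit state FIPS prefix, and the division-level averages are copied onto every GEOID in that division. The function name, the feature name, and the feature values are placeholders I made up; only the state-to-division assignments shown (DC, MD, VA in South Atlantic; NY in Middle Atlantic) are real, and the mapping is deliberately partial.

```python
# Partial state FIPS -> census division lookup (extend to all states as needed).
STATE_TO_DIVISION = {
    "11": "South Atlantic",   # District of Columbia
    "24": "South Atlantic",   # Maryland
    "51": "South Atlantic",   # Virginia
    "36": "Middle Atlantic",  # New York
}

# Placeholder division-level averages; NOT real RECS numbers.
recs_by_division = {
    "South Atlantic": {"avg_heating_equipment_age": 11.0},
    "Middle Atlantic": {"avg_heating_equipment_age": 14.0},
}


def attach_division_features(geoids, division_features):
    """Copy division-level features onto each GEOID (one division, many GEOIDs)."""
    joined = {}
    for geoid in geoids:
        division = STATE_TO_DIVISION.get(geoid[:2])  # state FIPS is the first 2 digits
        features = division_features.get(division, {})
        joined[geoid] = {"division": division, **features}
    return joined


geoids = ["110010001001", "240338001021", "360610001001"]
joined = attach_division_features(geoids, recs_by_division)
for geoid, row in joined.items():
    print(geoid, row)
```

GEOIDs whose state is missing from the lookup come back with `division` set to `None` and no features, which makes gaps in the mapping easy to spot.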

Other free non-aggregated sources:

Further reading:

Weinberg, Daniel H. "Data sources for US housing research, part 1: Public sector data sources." Cityscape 16, no. 3 (2014): 131-148. Link: https://www.huduser.gov/portal/periodicals/cityscpe/vol16num3/ch6.pdf

* “Relevant” is based on feature descriptions at this point, not tested relationships with dependent variables.
** Below this level, data is withheld.

ChiefFireDataNerd commented 4 years ago

While the underlying data is a bit dated (2000), this dataset covers the mean height of buildings within every census block in the conterminous US:
https://www.sciencebase.gov/catalog/item/5775469ce4b07dd077c7088a