ccao-data / data-architecture

Codebase for CCAO data infrastructure construction and management
https://ccao-data.github.io/data-architecture/

How-to: Count total number of housing units in Cook County with public data #562

Open dfsnow opened 1 month ago

dfsnow commented 1 month ago

Researchers often want to know the total number of housing units in Cook County in a specific tax year, according to the Assessor's Office.

Right now, this question is at least partially answerable, but the data are spread across multiple sources. We need to think about the number of cards on a res PIN, livable condo spaces, and large multi-unit properties.

The goal of this project is to count the number of housing units in Cook County in tax year 2023 using open data, provide advice on making this data more accessible to the public, and perhaps create a contribution to our reporting database that institutionalizes these housing counts.

Data sources that are publicly available (happy to discuss more)

Note that these are on Open Data. The portal allows you to filter and count.

Other things to look at

Suggested outputs

ccao-jardine commented 1 month ago

Added details: Single- and Multi-Unit Characteristics: the num_apartments column is useful. If it's "None" or NULL, let's assume that the property has 1 living unit (it's probably a condo or single-family home without any apartments). For other properties with multiple apartments, let's multiply appropriately. Do filter to tax year = 2023.
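A minimal sketch of the residential rule above, in case it helps to see it as code. It assumes each row is a dict with a `year` key (the tax year) and a `num_apartments` key; the assumption that non-NULL values are either "None", a spelled-out count such as "Two", or a digit string is ours, not confirmed by the dataset documentation. "Multiply appropriately" is read here as "a property with N apartments contributes N units".

```python
# Sketch only: residential living-unit count for one tax year.
# ASSUMPTIONS: rows are dicts with "year" and "num_apartments" keys;
# num_apartments is NULL/None, "None", a spelled-out count, or digits.
WORD_TO_COUNT = {"two": 2, "three": 3, "four": 4, "five": 5, "six": 6}


def units_for_row(num_apartments):
    """Return living units on one row: 1 if no apartments, else the count."""
    if num_apartments is None:
        return 1  # NULL: assume a single-family home or condo
    value = str(num_apartments).strip().lower()
    if value in ("", "none"):
        return 1  # "None": same assumption as NULL
    if value.isdigit():
        return max(int(value), 1)  # treat "0" like "None"
    # Raises KeyError on unexpected labels so they get reviewed, not skipped.
    return WORD_TO_COUNT[value]


def count_residential_units(rows, tax_year=2023):
    """Sum living units over rows belonging to the requested tax year."""
    return sum(
        units_for_row(row["num_apartments"])
        for row in rows
        if int(row["year"]) == tax_year
    )


# Made-up sample rows for illustration only.
sample = [
    {"year": 2023, "num_apartments": None},
    {"year": 2023, "num_apartments": "None"},
    {"year": "2023", "num_apartments": "Three"},
    {"year": 2022, "num_apartments": "Two"},  # excluded: wrong tax year
]
print(count_residential_units(sample))  # 1 + 1 + 3 = 5
```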

Condos: each row is one living unit, unless it's parking or non-livable space. Do filter to tax year = 2023.
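The condo rule can be sketched the same way. The flag columns `is_parking_space` and `is_common_area` are hypothetical names for whatever fields mark parking and non-livable rows; check the actual condo dataset before relying on them.

```python
# Sketch only: condo living-unit count for one tax year.
# ASSUMPTIONS: "year" holds the tax year; "is_parking_space" and
# "is_common_area" are hypothetical flags marking non-livable rows.
def is_true(value):
    """Interpret booleans or text flags such as "true"/"false"."""
    return str(value).strip().lower() in ("true", "1", "yes")


def count_condo_units(rows, tax_year=2023):
    """Each condo row is one living unit unless it is parking or non-livable."""
    return sum(
        1
        for row in rows
        if int(row["year"]) == tax_year
        and not is_true(row.get("is_parking_space"))
        and not is_true(row.get("is_common_area"))
    )


# Made-up sample rows for illustration only.
sample = [
    {"year": 2023, "is_parking_space": False, "is_common_area": False},
    {"year": 2023, "is_parking_space": "true"},  # parking: excluded
    {"year": 2023, "is_common_area": True},      # non-livable: excluded
    {"year": 2022},                              # wrong tax year: excluded
]
print(count_condo_units(sample))  # 1
```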

For commercial data: @wrridgeway might have a helpful data dictionary, but for now, use the sum of apt (instead of tot_units). For this one, note that you do not want to filter to tax year = 2023.
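A sketch of the commercial guidance: sum `apt` rather than `tot_units`, with no tax-year filter, as the comment instructs. Treating blank or missing `apt` cells as 0 is our assumption, as is the comma-stripping for values like "1,200".

```python
# Sketch only: commercial apartment count, summing apt (not tot_units)
# and deliberately NOT filtering by tax year.
# ASSUMPTION: blank or missing apt cells count as 0 units.
def to_number(value):
    """Parse a numeric cell such as "1,200"; blank/missing becomes 0."""
    if value is None:
        return 0.0
    text = str(value).replace(",", "").strip()
    return float(text) if text else 0.0


def sum_commercial_apartments(rows):
    """Total apt across all rows, regardless of year."""
    return sum(to_number(row.get("apt")) for row in rows)


# Made-up sample rows for illustration only.
sample = [
    {"year": 2021, "apt": "12"},
    {"year": 2023, "apt": "1,200"},
    {"year": 2022, "apt": ""},
    {"year": 2022},  # no apt value at all
]
print(sum_commercial_apartments(sample))  # 1212.0
```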

For the exploratory data, feel free to use Excel.

yufeinancyliu commented 1 month ago

Here is the result I got, in the sheet 'final_output' in 'output.xlsx'; the plan and calculation process are in this Google Doc. You can click the links to open them. If anything is wrong, please tell me and I will revise it. Dan also suggested turning this into a short how-to document. Could you provide some examples of how-to documents? Thanks! @ccao-jardine

ccao-jardine commented 1 month ago

Thanks -- this is a good start! Some overall feedback:

Google Doc:

This is a good start -- let's make it a little more accessible to a general reader, as if it were a blog-post how-to guide. I've attached an example how-to: HowTo_Sales.pdf

output.xlsx

Great start! I think it just needs some QC, since this is the first time we've tried to use the data this way.

general:

I'm curious to see whether these numbers match up with other sources of housing counts, like the US Census. Can you find a few external housing unit counts, along with descriptions of their methods, so we can compare our totals?

I've requested edit access so I can make more specific feedback.

yufeinancyliu commented 1 month ago

Here is the formatted draft of the instructions file. Please tell me if you have any suggestions. :) @ccao-jardine @dfsnow instruction_housing_unit0807.pdf

yufeinancyliu commented 1 month ago

Here are the updated instructions (now considering apt, the sum of the unit columns, and tot_units), the recommendation, and the updated output table. Please tell me if you have any suggestions :) @dfsnow @ccao-jardine
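Since the updated instructions consider three different unit measures, a quick way to QC them is to total all three over the same rows and look at the gaps. In this sketch, `studiounits`, `1brunits`, and `2brunits` appear in this thread, while `3brunits` and `4brunits` are assumed column names; the blank-as-zero parsing is also an assumption.

```python
# Sketch only: compare apt, the sum of per-bedroom unit columns, and tot_units.
# ASSUMPTION: "3brunits" and "4brunits" are assumed column names.
UNIT_COLUMNS = ["studiounits", "1brunits", "2brunits", "3brunits", "4brunits"]


def to_number(value):
    """Parse a numeric cell; blank/missing becomes 0 (assumption)."""
    if value is None:
        return 0.0
    text = str(value).replace(",", "").strip()
    return float(text) if text else 0.0


def compare_unit_measures(rows):
    """Return totals for apt, the summed unit columns, and tot_units."""
    totals = {"apt": 0.0, "sum_of_unit_columns": 0.0, "tot_units": 0.0}
    for row in rows:
        totals["apt"] += to_number(row.get("apt"))
        totals["sum_of_unit_columns"] += sum(
            to_number(row.get(column)) for column in UNIT_COLUMNS
        )
        totals["tot_units"] += to_number(row.get("tot_units"))
    return totals


# Made-up sample rows for illustration only.
sample = [
    {"apt": "10", "studiounits": "2", "1brunits": "5", "2brunits": "3",
     "tot_units": "10"},
    {"apt": "", "2brunits": "4", "tot_units": "6"},
]
print(compare_unit_measures(sample))
# {'apt': 10.0, 'sum_of_unit_columns': 14.0, 'tot_units': 16.0}
```

Rows where the three measures disagree (like the second sample row) are the ones worth inspecting by hand.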

ccao-jardine commented 1 month ago

Thanks, the instructions are looking very clear -- nice work! I've requested edit access to the Google Doc to make small wording revisions.

One thing we should dig into is the instructions for section 3, "Large multi-unit properties." All steps say we should first filter modelgroup to "Multifamily." But one hypothesis is that other kinds of housing units may have different modelgroup names. If so, this filter might be excluding housing units in those model groups.

To test this hypothesis, I looked at sum(apt), sum(tot_units), and sum(2brunits) grouped by modelgroup, without filtering modelgroup. (I chose 2brunits at random from among studiounits, 1brunits, etc.) By doing this, I found a couple of types of multi-family housing with tot_units > 0 that don't have modelgroup = "Multifamily." Specifically:
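The modelgroup check described above can be sketched as a group-by over all rows, followed by a filter for groups other than "Multifamily" that still report units. The labels "Nursing Home" and "Office" in the sample are hypothetical placeholders, not the actual modelgroup values in the data.

```python
# Sketch only: sum columns per modelgroup WITHOUT filtering modelgroup first,
# then list non-"Multifamily" groups that still report tot_units > 0.
from collections import defaultdict


def to_number(value):
    """Parse a numeric cell; blank/missing becomes 0 (assumption)."""
    if value is None:
        return 0.0
    text = str(value).replace(",", "").strip()
    return float(text) if text else 0.0


def sums_by_modelgroup(rows, columns=("apt", "tot_units", "2brunits")):
    """Return {modelgroup: {column: total}} across all rows."""
    groups = defaultdict(lambda: dict.fromkeys(columns, 0.0))
    for row in rows:
        group = row.get("modelgroup") or "(missing)"
        for column in columns:
            groups[group][column] += to_number(row.get(column))
    return dict(groups)


def non_multifamily_with_units(sums):
    """Model groups other than "Multifamily" with tot_units > 0."""
    return sorted(
        group for group, totals in sums.items()
        if group != "Multifamily" and totals["tot_units"] > 0
    )


# Hypothetical sample rows; the modelgroup labels are illustrative only.
sample = [
    {"modelgroup": "Multifamily", "apt": "8", "tot_units": "8", "2brunits": "4"},
    {"modelgroup": "Nursing Home", "apt": "", "tot_units": "120", "2brunits": ""},
    {"modelgroup": "Office", "apt": "", "tot_units": "0", "2brunits": ""},
]
sums = sums_by_modelgroup(sample)
print(non_multifamily_with_units(sums))  # ['Nursing Home']
```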

For completeness, I think we should show how to count these, but let users decide whether to include nursing homes and affordable housing. Let's do the following steps:

  1. Step 1: Add two more sections to the instructions (Nursing Homes and Affordable Housing) to count these, where you apply the right modelgroup filter and then sum things appropriately. Please also add these as two more tabs to the Google Sheet.
  2. Step 2: One question is whether nursing homes and affordable housing qualify as "housing units" in the way these are typically measured! Please spend 1-2 hours researching whether other agencies that count housing units (Census Bureau, American Community Survey, FRED aka Federal Reserve Bank of St. Louis, and Statista) each count nursing homes and affordable housing as housing units. Please add a table like the one below to the instructions doc to summarize your findings. Note that the table below is just a demo:

     | Data source | Nursing homes counted as housing units? | Affordable housing counted as housing units? | Sources |
     |---|---|---|---|
     | Census Bureau | no, doesn't count | not sure | http://... |
     | FRED | no, doesn't count | yes, this counts | http:// |

  3. Step 3: After completing step 2, we'll have an idea of whether these other agencies count Nursing Homes and Affordable Housing in their housing unit totals. In your instructions doc, in the "Total Count of Housing Units in Cook County with Public Data" section, let's give two examples! First, we can keep your existing example, which excludes Nursing Homes and Affordable Housing. Then, please add another example that includes Nursing Homes and Affordable Housing.
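The two examples requested in step 3 differ only in whether the two optional categories are added to the total. A minimal sketch, where the component names are hypothetical and the counts are placeholders (not the totals computed in this issue):

```python
# Sketch only: total housing units with and without the optional categories.
# ASSUMPTION: component names are hypothetical; counts below are placeholders.
OPTIONAL_CATEGORIES = ("nursing_homes", "affordable_housing")


def total_housing_units(component_counts, include_optional=False):
    """Sum component counts, optionally skipping the optional categories."""
    return sum(
        count
        for name, count in component_counts.items()
        if include_optional or name not in OPTIONAL_CATEGORIES
    )


# Placeholder numbers only, NOT real Cook County counts.
counts = {
    "residential": 100,
    "condos": 50,
    "multifamily": 30,
    "nursing_homes": 5,
    "affordable_housing": 7,
}
print(total_housing_units(counts))                         # 180
print(total_housing_units(counts, include_optional=True))  # 192
```

Keeping each component as a separate count makes it easy for users to decide which categories to include.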

Thoughts?

yufeinancyliu commented 3 weeks ago

I found housing unit count data from the Census Bureau: its 2023 count is 2,280,981 units, estimated from surveys and sampling, which is higher than what we got.
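To make the comparison with the Census Bureau figure concrete, the gap can be expressed both in units and as a share of the Census count. The CCAO total used in the call below is a placeholder, since the thread does not quote our final number.

```python
# Sketch only: how far a CCAO-based total falls below the Census figure.
CENSUS_2023_UNITS = 2_280_981  # Census Bureau count quoted in this thread


def gap_vs_census(ccao_total, census_total=CENSUS_2023_UNITS):
    """Return (unit difference, difference as % of the Census count)."""
    difference = census_total - ccao_total
    return difference, 100 * difference / census_total


diff, pct = gap_vs_census(2_000_000)  # placeholder, not our actual total
print(diff, round(pct, 2))
```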

yufeinancyliu commented 3 weeks ago

I completed the tasks above, and the links are the same as the ones in my previous comment.

After reading through the survey instructions for the Census Bureau's American Housing Survey (AHS), I found that they take nursing homes into account and show no sign of excluding affordable housing. FRED uses data from the Census Bureau. The American Community Survey shows no sign of excluding either category. For Statista, I would need to pay to see the data source.

Please tell me if you have any suggestions! Thanks.

ccao-jardine commented 3 weeks ago

Great, thanks! I made some minor suggested text modifications in the Google Doc, such as some minor word/organization fixes, and adding links to the Google Sheet. When you have time, for each suggestion, please accept the suggestions, or revise them if I've introduced any errors. And definitely be sure to add authorship information to the top of the document too so that you receive appropriate credit.

Then that's a wrap on this issue!

yufeinancyliu commented 3 weeks ago

Thank you! I updated the files with your suggestions. Do I need to make the Google Sheet publicly accessible? I noticed that you added its link to the instructions document.

ccao-jardine commented 3 weeks ago

Do I need to make the Google Sheet publicly accessible?

Oh, good catch 😅 Yes please!