focusconsulting / housing-insights

Bringing open data to affordable housing decision makers in Washington DC. A D3/JavaScript-based website to visualize data related to affordable housing in Washington DC. Data processing with Python.
http://housinginsights.org
MIT License

Which rent data to use? #53

Open NealHumphrey opened 7 years ago

NealHumphrey commented 7 years ago

We need to figure out which portion of our rent data to use to create a meaningful estimate of 'market rent'.

We have two primary sources of rent data:

1. American Community Survey (ACS): Table B25058 (median), B25059 (upper quartile) and B25057 (lower quartile).

2. Zillow Rent Estimate: we use the research data set provided by Zillow. They provide many different roll-ups of this data, but unfortunately not all of them are available at the level of detail we want. We can pick between neighborhood and zip code level, as well as which types of buildings/units to include (e.g. single family homes, condos, multifamily buildings, bedroom count). Zillow calculates Rent Zestimates using machine learning algorithms and a 3-month rolling average, and then aggregates the Rent Zestimates into the Zillow Rent Index (ZRI) for a specific region.
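To make the aggregation step concrete, here is a toy sketch: it takes per-home monthly rent estimates for one region, takes the median across homes each month, and smooths the result with a trailing 3-month average. This is a simplified assumption for illustration only, not Zillow's actual Rent Zestimate or ZRI methodology; the function names and the sample numbers are hypothetical.

```python
from statistics import median


def monthly_medians(estimates_by_month):
    """Median of the per-home rent estimates for each month, in input order.

    estimates_by_month: list of lists; each inner list holds one month's
    per-home rent estimates (dollars) for a single region (hypothetical data).
    """
    return [median(month) for month in estimates_by_month]


def trailing_rolling_mean(values, window=3):
    """Trailing rolling mean; months without a full window yield None."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window:i + 1]
            out.append(sum(chunk) / window)
    return out


# Hypothetical per-home estimates (dollars/month) for one region over 4 months.
region = [
    [1800, 2100, 2500],
    [1850, 2150, 2450],
    [1900, 2200, 2600],
    [1950, 2250, 2650],
]
index = trailing_rolling_mean(monthly_medians(region))
print(index)  # [None, None, 2150.0, 2200.0]
```

The rolling average trades responsiveness for stability: a single unusual month moves the smoothed value by only a third as much.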

Note: we do want to use the time series data and get as close as we can to the 5-year period covered by the ACS, but we are limited by how far back the Zillow data goes.
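One way to line the Zillow time series up with an ACS 5-year period is to average each region's monthly values over the same years and record how many months were actually available. The sketch below assumes a Zillow-style wide layout where each row is a region and the monthly columns are labelled 'YYYY-MM'; the column names, the 2011-2015 window and the values are assumptions for illustration.

```python
def months_in_window(columns, start_year, end_year):
    """Return the 'YYYY-MM' column labels whose year lies in [start_year, end_year].

    Assumes monthly columns are labelled 'YYYY-MM'; any other columns
    (region name, state, and so on) are skipped.
    """
    selected = []
    for col in columns:
        parts = col.split("-")
        if len(parts) == 2 and parts[0].isdigit() and parts[1].isdigit():
            if start_year <= int(parts[0]) <= end_year:
                selected.append(col)
    return selected


def window_average(row, columns, start_year, end_year):
    """Average one region's monthly values over the window, skipping blanks.

    Returns (average, months_used) so it is visible how much of the ACS
    window the Zillow series actually covers; (None, 0) if nothing is usable.
    """
    values = [
        float(row[col])
        for col in months_in_window(columns, start_year, end_year)
        if row.get(col) not in (None, "")
    ]
    if not values:
        return None, 0
    return sum(values) / len(values), len(values)


# Hypothetical row from a Zillow-style wide file (dollars per month).
columns = ["RegionName", "State", "2010-12", "2011-01", "2011-02", "2015-12", "2016-01"]
row = {"RegionName": "Example", "State": "DC", "2010-12": "1950",
       "2011-01": "2000", "2011-02": "", "2015-12": "2300", "2016-01": "2400"}
print(window_average(row, columns, 2011, 2015))  # (2150.0, 2)
```

Reporting `months_used` alongside the average makes the "how far back does Zillow go" limitation explicit for each region rather than hiding it.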

Successful completion of this ticket:

NealHumphrey commented 7 years ago

@hansak11 you were working on this with Jeanne, right? Where does this stand? Do you guys have a recommendation yet for which dataset from Zillow we should actually use?

hansak11 commented 7 years ago

@NealHumphrey I did look into the zillow data earlier but haven't had a chance to look at it again since we were trying to work with Docker. Do you need this information soon? I can try to work on it this weekend and see if I find a good dataset to use.

NealHumphrey commented 7 years ago

No specific timeline, just want to get this integrated in the next few weeks. I was cleaning up all our tickets and this didn't have you as an assignee so I wanted to make sure you were still on it and it hadn't dropped off the radar. Thanks!

hansak11 commented 7 years ago

Apologies, Neal! I ended up staying late at work and the Red Line was a mess today, so I could not join the work session. I still haven't had a chance to figure out which Zillow dataset will be useful. I should have some time this week to look into it, but if this is urgent, it might work better for someone else to take over this task. I no longer have the admin restrictions on my laptop, so I should be able to work with Docker.

terrysky18 commented 7 years ago

I will take over this issue.

domaley commented 7 years ago

I'm really interested in this data @terrysky18, please let me know once this is pulled together. Also, if there's anything I can do to help, let me know.

NealHumphrey commented 7 years ago

@terrysky18 - We briefly talked about this, but the best first step on this is the bullet point on the 'geographical summary':

After you prepare that, maybe it makes sense to sit down with @domaley and discuss any other considerations on which rent data to use or exclude, or which caveats to display alongside the data?

terrysky18 commented 7 years ago

@NealHumphrey, @domaley - acknowledged.

terrysky18 commented 7 years ago

The meta.json file contains 'zillow_zrisqft_neighbor'. The Zillow Rent Index per square foot file for all homes contains the most DC neighbourhoods; the other files provide only a few DC neighbourhoods or none at all.

The Neighborhood_MedianRentalPrice_1Bedroom.csv file contains data for 10 DC neighbourhoods: Columbia Heights, Capitol Hill, Adams Morgan, Logan Circle, Dupont Circle, Foggy Bottom, Mount Vernon Square, Forest Hills, Woodley Park and Navy Yard. The other neighbourhood-level median rental price files contain no data for DC at all.

The postcode-level median rental price files contain a similar amount of data: the 1-bedroom file has data for 10 postcodes and the 2-bedroom file for 6. The files for units with more bedrooms have no data for DC.
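A coverage survey like the one above can be scripted by counting the distinct DC regions in each file. The sketch below uses only the standard library and assumes each file has 'RegionName' and 'State' columns with DC rows marked as 'DC'; the file names and contents are hypothetical stand-ins for the real downloads.

```python
import csv
import io


def dc_regions(csv_text, state_col="State", name_col="RegionName"):
    """Return the sorted, de-duplicated region names whose state column is 'DC'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sorted({row[name_col] for row in reader if row.get(state_col) == "DC"})


def coverage_report(files):
    """Map each file name to the number of distinct DC regions it contains."""
    return {name: len(dc_regions(text)) for name, text in files.items()}


# Hypothetical file contents; real files would be read with
# open(path, newline="") and passed in the same way.
files = {
    "file_a.csv": "RegionName,State,2017-01\n"
                  "Columbia Heights,DC,2400\n"
                  "Capitol Hill,DC,2600\n"
                  "Clarendon,VA,2300\n",
    "file_b.csv": "RegionName,State,2017-01\n"
                  "Clarendon,VA,2300\n",
}
print(coverage_report(files))  # {'file_a.csv': 2, 'file_b.csv': 0}
```

If the real column names differ, only the `state_col` and `name_col` defaults need to change.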

From a data coverage standpoint, the ZRI-per-square-foot files by neighbourhood and by postcode provide the most data; the file names are Neighborhood_ZriPerSqft_AllHomes.csv and Zip_ZriPerSqft_AllHomes.csv respectively. I will look into how a rent price can be derived from a rent index. Comments and suggestions welcome.

terrysky18 commented 7 years ago

Other files that may be useful are Neighborhood_Zri_AllHomes.csv and Zip_Zri_AllHomes.csv.

Neighborhood_Zri_AllHomes.csv contains the same number of neighbourhoods as the ZRI-per-square-foot file, which is 92. It gives a single monthly rent price for each neighbourhood as a whole.

Zip_Zri_AllHomes.csv gives the monthly rent price by postcode. However, the ZRI by postcode contains only 21 entries for DC.

We could use both Neighborhood_Zri_AllHomes.csv and Neighborhood_ZriPerSqft_AllHomes.csv. The rent index for all homes can provide a quick overview of a neighbourhood, while the rent index per square foot can be used to calculate a more precise rent estimate wherever unit sizes in a specific neighbourhood are available.
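The per-square-foot calculation described above is a single multiplication: the neighbourhood's ZRI per square foot times the unit's floor area. A minimal sketch, assuming the per-square-foot value is in dollars per month and the floor area is known in square feet; the function name and numbers are hypothetical.

```python
def estimate_monthly_rent(zri_per_sqft, unit_sqft):
    """Monthly rent estimate = neighbourhood ZRI per square foot x unit floor area.

    zri_per_sqft: dollars per square foot per month for the neighbourhood.
    unit_sqft: floor area of the unit in square feet (only usable where
    unit sizes are known).
    """
    if zri_per_sqft < 0 or unit_sqft <= 0:
        raise ValueError("expected a non-negative rate and a positive floor area")
    return zri_per_sqft * unit_sqft


# Hypothetical: $2.50 per sq ft per month for an 800 sq ft unit.
print(estimate_monthly_rent(2.5, 800))  # 2000.0
```

Where no unit size is available, the all-homes ZRI remains the fallback figure for the neighbourhood.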

domaley commented 7 years ago

Ideally we would like to be able to map the Zillow data onto the geographic units used by DC-area policymakers: DC Wards and DC Neighborhood Clusters. This website has the boundary information for DC wards and neighborhood clusters: https://www.neighborhoodinfodc.org/nclusters/nclusters.html. If there is a way to match up the data, that'd be great.

(Email reply to Terry Song's comment of Tue, May 30, 2017 at 7:13 PM.)
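One basic way to match data to wards or neighborhood clusters is a point-in-polygon test: for each point (for example an address point or a region's centroid), find the boundary polygon that contains it. The sketch below uses a plain ray-casting test and assumes each boundary is a simple polygon given as a list of (x, y) vertices; the area names and square polygons are hypothetical. Real ward and cluster boundaries would come from a shapefile or GeoJSON, and a GIS library such as shapely would normally do this work.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: is (x, y) inside the polygon?

    polygon: list of (x, y) vertices in order; the ring is closed implicitly.
    Points lying exactly on an edge may be classified either way.
    """
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does a horizontal ray from (x, y) to the right cross edge (j, i)?
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def assign_to_area(x, y, areas):
    """Return the name of the first area whose polygon contains the point, else None."""
    for name, polygon in areas.items():
        if point_in_polygon(x, y, polygon):
            return name
    return None


# Hypothetical side-by-side square "clusters".
areas = {
    "Cluster 1": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "Cluster 2": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
print(assign_to_area(1, 1, areas))  # Cluster 1
print(assign_to_area(3, 1, areas))  # Cluster 2
print(assign_to_area(5, 1, areas))  # None
```

Treating longitude/latitude as planar x/y is usually acceptable at city scale, since the test only asks which side of each edge a point falls on.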

NealHumphrey commented 7 years ago

@domaley That is something I've looked into some. This is the ticket related to calculating this, which is on hold: https://github.com/codefordc/housing-insights/issues/148

We have the boundaries of the Zillow neighborhoods, which are different from any other boundaries we have. To roll them up, we need some way of splitting a Zillow neighborhood between two different neighborhood clusters, for example when it crosses a border. I recommended using the Master Address Repository (MAR) to count the residential units within each overlapping zone and using those counts as weighting factors.
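The weighting idea above can be sketched as follows, assuming the residential unit counts for each (Zillow neighborhood, cluster) overlap have already been tallied, for example from MAR address points. `split_weights` gives the share of each Zillow neighborhood's units that falls in each cluster. Because a rent is a per-unit value, `cluster_rents` averages it (weighted by the units each neighborhood contributes) rather than splitting it; that choice, the function names and the sample counts are all assumptions for illustration.

```python
from collections import defaultdict


def split_weights(overlap_units):
    """Share of each Zillow neighborhood's residential units in each cluster.

    overlap_units: {(zillow_neighborhood, cluster): unit_count} for every
    overlapping zone (hypothetical counts). Weights for one neighborhood sum to 1.
    """
    totals = defaultdict(int)
    for (zillow_nbhd, _cluster), units in overlap_units.items():
        totals[zillow_nbhd] += units
    return {
        key: units / totals[key[0]]
        for key, units in overlap_units.items()
        if totals[key[0]] > 0
    }


def cluster_rents(overlap_units, zillow_rent):
    """Unit-weighted average of Zillow neighborhood rents for each cluster."""
    weighted = defaultdict(float)
    units_in_cluster = defaultdict(int)
    for (zillow_nbhd, cluster), units in overlap_units.items():
        if zillow_nbhd in zillow_rent:
            weighted[cluster] += units * zillow_rent[zillow_nbhd]
            units_in_cluster[cluster] += units
    return {c: weighted[c] / units_in_cluster[c]
            for c in weighted if units_in_cluster[c] > 0}


# Hypothetical: "Zillow A" straddles two clusters; "Zillow B" lies in one.
overlap = {("Zillow A", "Cluster 1"): 300,
           ("Zillow A", "Cluster 2"): 100,
           ("Zillow B", "Cluster 2"): 100}
rents = {"Zillow A": 2000, "Zillow B": 3000}
print(split_weights(overlap))
print(cluster_rents(overlap, rents))  # {'Cluster 1': 2000.0, 'Cluster 2': 2500.0}
```

Counts such as the number of affordable units would instead be split with `split_weights`, multiplying each neighborhood's total by its share in each cluster.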

We will probably delay this, because we have weighting factors to use for all our other data sources already...