Open NealHumphrey opened 7 years ago
@hansak11 you were working on this with Jeanne, right? Where does this stand? Do you guys have a recommendation yet for which dataset from zillow we should actually use?
@NealHumphrey I did look into the zillow data earlier but haven't had a chance to look at it again since we were trying to work with Docker. Do you need this information soon? I can try to work on it this weekend and see if I find a good dataset to use.
No specific timeline, just want to get this integrated in the next few weeks. I was cleaning up all our tickets and this didn't have you as an assignee so I wanted to make sure you were still on it and it hadn't dropped off the radar. Thanks!
Apologies, Neal! I ended up staying late at work and the red line was a mess today, so I could not join the work session. I still haven't had a chance to figure out which Zillow dataset will be useful. I should have some time this week to look into it, but if this is urgent it might work better for someone else to take over the task. I no longer have the admin restrictions on my laptop, so I should be able to work with Docker.
I will take over this issue.
I'm really interested in this data @terrysky18, please let me know once this is pulled together. Also, if there's anything I can do to help, let me know.
@terrysky18 - We briefly talked about this, but the best first step on this is the bullet point on the 'geographical summary':
After you prepare that, maybe it makes sense to sit down with @domaley and discuss any other considerations on which rent data to use / exclude or which caveats to display alongside the data?
@NealHumphrey, @domaley - acknowledged.
The meta.json file contains 'zillow_zrisqft_neighbor'. The Zillow rent index per square foot file for all homes covers the most DC neighbourhoods; the other files cover only a few neighbourhoods in Washington, or none.
The Neighborhood_MedianRentalPrice_1Bedroom.csv file contains data for 10 DC neighbourhoods: Columbia Heights, Capitol Hill, Adams Morgan, Logan Circle, Dupont Circle, Foggy Bottom, Mount Vernon Square, Forest Hills, Woodley Park, Navy Yard. Other median rental price files contain no data for DC at all.
The files for median rent price by postcode contain a similar amount of data. The Zip 1-bedroom file contains data for 10 postcodes and the Zip 2-bedroom file for 6. Files for more bedrooms have no data for DC.
From a data coverage standpoint, the zri-sqft files by neighbourhood and by postcode provide the most data; the file names are Neighborhood_ZriPerSqft_AllHomes.csv and Zip_ZriPerSqft_AllHomes.csv respectively. I will look into how a rent price can be derived from a rent index. Comments and suggestions please.
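For anyone who wants to reproduce the coverage counts above, here is a minimal sketch using only the standard library. The column names `RegionName`, `City` and `State` are assumptions about the CSV layout and should be checked against the actual file headers; the sample rows and their values are placeholders, not real Zillow figures.

```python
import csv
import io

def dc_regions(csv_file, city="Washington", state="DC"):
    """Return the RegionName of every row located in the given city and state.

    Assumes the file has 'RegionName', 'City' and 'State' columns
    (hypothetical layout; verify against the real header row).
    """
    reader = csv.DictReader(csv_file)
    return [
        row["RegionName"]
        for row in reader
        if row.get("City") == city and row.get("State") == state
    ]

# Placeholder rows for illustration only; the numbers are made up.
SAMPLE = """RegionName,City,State,2017-03,2017-04
Columbia Heights,Washington,DC,2.0,2.1
Capitol Hill,Washington,DC,2.2,2.3
Capitol Hill,Seattle,WA,1.9,2.0
"""

regions = dc_regions(io.StringIO(SAMPLE))
print(len(regions), regions)
```

With a real file, `open(path, newline="")` can be passed in place of the `io.StringIO` object. Filtering on both city and state matters because neighbourhood names such as Capitol Hill are not unique to DC.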
Other files that may be useful are Neighborhood_Zri_AllHomes.csv and Zip_Zri_AllHomes.csv.
Neighborhood_Zri_AllHomes.csv contains the same number of neighbourhoods as zri-sqft, which is 92. It gives a monthly rent price for the whole neighbourhood.
Zip_Zri_AllHomes.csv gives the monthly rent price by postcode. However, the zri by postcode only contains 21 entries for DC.
We could use both Neighborhood_Zri_AllHomes.csv and Neighborhood_ZriPerSqft_AllHomes.csv. The rent index for all homes can provide a quick overview of a neighbourhood. The rent index per square foot can be used to calculate a more precise rent estimate when the room size in a specific neighbourhood is available.
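One way to read the per-square-foot idea is as a simple multiplication of the index by a floor area. The sketch below assumes the index is expressed in dollars per square foot per month; the function name and the example numbers are hypothetical.

```python
def estimate_monthly_rent(zri_per_sqft, floor_area_sqft):
    """Estimate monthly rent as (rent index per sqft) x (floor area in sqft).

    Assumes zri_per_sqft is in dollars per square foot per month.
    """
    if zri_per_sqft < 0 or floor_area_sqft < 0:
        raise ValueError("index and floor area must be non-negative")
    return zri_per_sqft * floor_area_sqft

# Made-up inputs: an index of 2.5 and an 800 sqft unit.
example_rent = estimate_monthly_rent(2.5, 800)
print(example_rent)
```

This treats the index as uniform across unit sizes within a neighbourhood, which is a simplification.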
Ideally we would like to be able to map the Zillow data onto the geographic units that are used by DC area policymakers: DC Wards and DC Neighborhood Clusters. This DC gov website has the boundaries of both (https://www.neighborhoodinfodc.org/nclusters/nclusters.html). If there is a way to match up the data, that'd be great.
@domaley That is something I've looked into some. This is the ticket related to calculating this, which is on hold: https://github.com/codefordc/housing-insights/issues/148
We have the boundaries of the Zillow neighborhoods, which are different from any other boundaries we have. To roll them up we need some way of splitting a Zillow neighborhood between two different neighborhood clusters when, for example, it crosses a border. I recommended using the Master Address Repository, from which we could calculate the number of residential units within the overlapping zones and use that to make weighting factors.
We will probably delay this, because we have weighting factors to use for all our other data sources already...
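To make the weighting idea concrete, here is a rough sketch of how unit counts in overlapping zones could become weighting factors and a cluster-level figure. It assumes the unit counts per overlap have already been derived (for example from the Master Address Repository); the function names, region labels and numbers are all hypothetical.

```python
from collections import defaultdict

def overlap_weights(unit_counts):
    """Share of each Zillow neighborhood's residential units in each cluster.

    unit_counts maps (zillow_neighborhood, cluster) to the number of
    residential units in that overlap zone.
    """
    totals = defaultdict(int)
    for (neighborhood, _cluster), units in unit_counts.items():
        totals[neighborhood] += units
    return {
        key: units / totals[key[0]]
        for key, units in unit_counts.items()
        if totals[key[0]] > 0
    }

def cluster_rent(unit_counts, rent_by_neighborhood):
    """Unit-weighted average rent per cluster."""
    weighted_sum = defaultdict(float)
    unit_total = defaultdict(float)
    for (neighborhood, cluster), units in unit_counts.items():
        if neighborhood in rent_by_neighborhood:
            weighted_sum[cluster] += units * rent_by_neighborhood[neighborhood]
            unit_total[cluster] += units
    return {c: weighted_sum[c] / unit_total[c] for c in weighted_sum if unit_total[c] > 0}

# Made-up example: neighborhood "A" straddles two clusters.
counts = {("A", "Cluster 1"): 300, ("A", "Cluster 2"): 100, ("B", "Cluster 2"): 300}
rents = {"A": 2000.0, "B": 1000.0}
weights = overlap_weights(counts)
by_cluster = cluster_rent(counts, rents)
print(weights)
print(by_cluster)
```

Averaging a rent index by unit counts is an approximation, since the index for a whole Zillow neighborhood may not describe the part that falls inside a given cluster.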
We need to figure out which portion of our rent data to use to create a meaningful estimate of 'market rent'.
We have two primary sources of rent data:
American Community Survey: Tables B25058 (median), B25059 (upper quartile) and B25057 (lower quartile).
Zillow Rent Estimate: We use the research data set provided by Zillow. They provide lots of different roll-ups of this data, but unfortunately not all of them are available at the level of detail we want. We can pick between neighborhood and zip code level, as well as which types of buildings/units to include (e.g. single family homes, condos, multifamily buildings, bedroom count). Zillow uses machine learning algorithms and a 3-month rolling average to calculate Rent Zestimates, and then aggregates the Rent Zestimates to form the Zillow Rent Index for a specific region.
Note, we do want to use the time series data, and get as close as we can to the 5 year time period of ACS, but we are limited by how far back the data on Zillow goes.
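As a rough illustration of lining the monthly series up with an ACS period, the sketch below averages whatever months fall inside a year window and reports how many months were actually available. The "YYYY-MM" column labels, the 2011 to 2015 window and the values are assumptions for the example only.

```python
def window_average(monthly, start_year, end_year):
    """Average the monthly values whose year falls in [start_year, end_year].

    monthly maps "YYYY-MM" labels to a value, or None when data is missing.
    Returns (mean, months_used); mean is None when no month is available.
    """
    values = [
        value
        for month, value in monthly.items()
        if value is not None and start_year <= int(month[:4]) <= end_year
    ]
    if not values:
        return None, 0
    return sum(values) / len(values), len(values)

# Made-up series; only two months fall inside the window.
series = {"2010-12": 1800.0, "2011-01": 1900.0, "2011-02": 2100.0, "2016-01": 2500.0}
mean_rent, months_used = window_average(series, 2011, 2015)
print(mean_rent, months_used)
```

Returning the month count alongside the mean makes it visible when the series covers only part of the 5 year period, which could be shown as a caveat next to the data.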
Successful completion of this ticket: