focusconsulting / housing-insights

Bringing open data to affordable housing decision makers in Washington DC. A D3/Javascript based website to visualize data related to affordable housing in Washington DC. Data processing with Python.
http://housinginsights.org
MIT License

Scrape the DHCD DFD Pipeline public database and add it to our ingestion pipeline #136

Open NealHumphrey opened 7 years ago

NealHumphrey commented 7 years ago

DHCD provides a public dashboard of the projects it is funding, along with each project's current phase of development. This is a Quickbase application - browsable version here. From our interview with Chris Dickerson-Prokopp, it sounds like this dashboard is probably pulling from the most up-to-date data (we will want to verify this eventually).

Quickbase provides API methods to access the data; responses come back as XML. Unfortunately, the database format does not appear to be documented by DHCD (based on a quick look - also something to verify). We can probably get this information from DHCD by reaching out; or, we can guess the appropriate API calls by using the online browsable version.

This API call, for example, returns one building record with its loan amount and status: https://octo.quickbase.com/db/bit4krbdh?a=API_DoQuery&query={3.EX.1}

The generic quickbase API_DoQuery is documented here.
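As a rough sketch of how the ingestion side might consume a call like the one above: the snippet below builds the API_DoQuery URL and flattens each `<record>` element of the XML response into a dict. The host and table ID come from the example URL in this issue; the function names, the `loan_amount` and `status` field tags in the sample, and the assumption that the response wraps rows in `<record>` elements under a `<qdbapi>` root with an `<errcode>` are illustrative and should be checked against a real response.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Host and table ID taken from the example URL in this issue.
BASE_URL = "https://octo.quickbase.com/db"
TABLE_ID = "bit4krbdh"


def build_doquery_url(table_id, query):
    """Build an API_DoQuery URL; the query (e.g. '{3.EX.1}') is percent-encoded."""
    return f"{BASE_URL}/{table_id}?a=API_DoQuery&query={quote(query)}"


def fetch_xml(url, timeout=30):
    """Download the raw XML text (hypothetical helper; needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_records(xml_text):
    """Turn each <record> element into a {tag: text} dict.

    Assumes a non-zero <errcode> signals a failed call.
    """
    root = ET.fromstring(xml_text)
    errcode = root.findtext("errcode")
    if errcode not in (None, "0"):
        raise ValueError(f"Quickbase error {errcode}: {root.findtext('errtext')}")
    return [
        {child.tag: (child.text or "").strip() for child in record}
        for record in root.iter("record")
    ]


# Made-up sample response; the field tags are placeholders, not DHCD's real ones.
SAMPLE_XML = (
    "<qdbapi><action>API_DoQuery</action><errcode>0</errcode>"
    "<errtext>No error</errtext>"
    "<record><loan_amount>1000000</loan_amount><status>Closed</status></record>"
    "</qdbapi>"
)

records = parse_records(SAMPLE_XML)
```

Against the live endpoint, this would be used as `parse_records(fetch_xml(build_doquery_url(TABLE_ID, "{3.EX.1}")))`, assuming the table is still publicly accessible.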

Phase 1:

Phase 2:

Improvements:

NealHumphrey commented 7 years ago

@ehenry2172 so I have talked to both Marie Whittaker at DMPED and Chris D.P. at DHCD. I think the API is OK to use as-is. But, I have one outstanding question to resolve with Urban Institute, who maintains the Preservation Catalog that this data set will interface with, to understand which parts of this might overlap with data we already have (though certainly not all of it is available). I have created a new ticket #194 which we should do first (once this issue is resolved), and then come back to this ticket to supplement with data not available in the first dataset.

For information on the opendata.dc.gov dataset, see my comments on issue #137. It has data starting in 2015, and pulls from the DHCD database (among others). However, it only includes building address and total number of units, so we will want more data (particularly loan data) from the Quickbase data set.

DHCD's Quickbase dataset has, according to Chris DP, about "80% of buildings they worked on 2013-2015, 50% of those from 2011-2013, and sporadic before that".

Notes on the tables:

Property: land units related to funding.

Project: each individual funding application. Over time, a property could have multiple applications, and they can be accepted/rejected.

Loan: project to loan can be one to many; one project/application can result in multiple loans (from different sources).

Units: splits all units in the building (regardless of funding) down by AMI requirements and/or bedroom count. Some properties have 'null' entries for bedroom - in this case, there would be ~3 records, one for each AMI level. If bedroom is not blank, there would be ~9-10 records, one for each combo of AMI and bedroom count.

8609 and 8610: correspond to LIHTC, but newish so incomplete.
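To make the table relationships described above concrete, here is a minimal sketch of how they might be modelled once loaded. All class and field names are hypothetical (the real Quickbase field names are not documented in this thread); only the cardinalities follow the notes: a property has many projects, a project has many loans, and unit records are keyed by AMI level with an optional bedroom count.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Loan:
    source: str   # hypothetical: which funding source issued the loan
    amount: float


@dataclass
class Project:
    status: str   # hypothetical: e.g. "accepted" or "rejected"
    loans: List[Loan] = field(default_factory=list)  # one project -> many loans


@dataclass
class UnitRecord:
    ami_level: str            # hypothetical label for the AMI band
    bedrooms: Optional[int]   # None when the bedroom entry is null
    count: int


@dataclass
class Property:
    address: str
    projects: List[Project] = field(default_factory=list)  # one property -> many projects
    units: List[UnitRecord] = field(default_factory=list)


def total_loan_amount(prop):
    """Sum every loan across all of a property's projects."""
    return sum(loan.amount for project in prop.projects for loan in project.loans)


# Made-up example: one accepted project funded by two loans from different sources.
example = Property(
    address="123 Example St NW",
    projects=[Project("accepted", [Loan("Source A", 500000.0), Loan("Source B", 250000.0)])],
    units=[UnitRecord("30% AMI", None, 10)],
)
```

Keeping loans nested under projects rather than directly under the property preserves the distinction between an application and the funding it produced, which matters if rejected applications are ingested too.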

A list of all the tables, each with a 'get all' query link and a note on whether or not the table is publicly accessible: dhcd-api-notes.xlsx

NealHumphrey commented 7 years ago

@ehenry2172 - I talked to Urban as well. My question was about the subsidies in the current Preservation Catalog marked as subsidy source "DC/AFFPIPELINE". Turns out they are not currently pulling in new buildings from this source, so we are good to go on adding this datasource. As noted above, we should do #194 and then come back to this ticket so that we can supplement the data from #194 with stuff like loan amounts from the quickbase database.