focusconsulting / housing-insights

Bringing open data to affordable housing decision makers in Washington DC. A D3/Javascript based website to visualize data related to affordable housing in Washington DC. Data processing with Python.
http://housinginsights.org
MIT License
59 stars 110 forks source link

DHCD addresses not found in MAR #603

Open NealHumphrey opened 6 years ago

NealHumphrey commented 6 years ago

Some of the addresses in the DHCD quickbase database that are being ingested into Housing Insights are not found in the Master Address Repository; typically this is because of an invalid address.

Solution steps:

Addresses: (Note - these have been pre-precessed by our code to split up different addresses so will be slightly different in the DHCD database)

Addresses Mar_ids Notes
'1649 WEST STREET SE', '1641 WEST STREET SE', '1643 WEST STREET SE', '1645 WEST STREET SE', '1647 WEST STREET SE', '2312 GREEN STREET SE', '2304 GREEN STREET SE', '2306 GREEN STREET SE', '2308 GREEN STREET SE', '2310 GREEN STREET SE' None, None, None, None, None, '148158', '150350', '48988', '148157', '48744' Is this supposed to be W St SE?
'7th R', 'Rhode Island NW' None, None This was written as an intersection
'28th Place SE' None No street number
'2300 30th Place NE' None  
'HAYES STREET & ANACOSTIA AVENUE NORTH EAST' None Intersection - need street address
'8TH & "H" STREET NE' None Intersection - need street address
'10th Street & Rhode Island Avenue NE' None Intersection - need street address
'Central Place NE' None No street number
'2519, 2521, 2552, 2554, AND 2548 UNIVERSITY PLACE2639 CONNECTICUT AVE NW #113' None mashed up addresses
'Robinson Place SE' None No street number
'1715 1/2 7TH ST NW' None  
'1715 1/2 7TH ST NW' None  
ChrisDickersinProkopp commented 6 years ago

This list is a good sampling of different reasons addresses might not match the MAR. A few examples:

  1. '28th Place' and '10th & Rhode Island.' - each of these a development that occurred (will occur) on a vacant, previously undeveloped lot with know address. They just have an SSL at this point. The assignment of an address happens late in our process (or after financial closing?) and we haven't been updating the data at that point, since we usually enter/import an address once when a project record is created and never touch it again. We can be better about updating it though.
  2. Almost all of these locations are old projects that there is no institutional knowledge of (for the most part). I spent some time years ago trying To match old projects to addresses, but it entailed browsing or querying nearby properties to see which ones were owned by the District of Columbia or by the LLC that our loan financed. Not sure how to bring legacy projects up to date except through this manual labor approach. It will be hard to commit staff time to updating the old projects but we can definitely try to prevent these issues on new, active projects.
  3. 'Hayes Street &..." is an example (I think) of a large site plan (several blocks) that eventually became dozens of single family home addresses and possibly other land uses. What is the best way to identify a project like this, so we don't end up using the address of one person's home just because it is the first on the block?
  4. The MAR used to plot intersections. If that's still the case, Is it better to place a project at the intersection to give users a sense of general location or to omit it entirely from the map? I see pros and cons on both sides - you may have thought through this already.

Also: through our online application system we now collect information in a table about every address/SSL covered by the project. If I recall correctly, the first address listed is what gets pushed into the Pipeline App when a project is selected and moves from the Financing Application app (private) into the Pipeline App (Public). That seems like it should work fine for the purposes of this tool, but if you have ideas about how we could better take advantage of that data we are collecting we'd be happy to hear it.

NealHumphrey commented 6 years ago

@ChrisDickersinProkopp Thanks for the info - sounds like these are more complicated than I thought it would be for these cases.

The biggest way this is used currently is de-duplication of projects that are already in the PresCat for whatever reason. If we can use the table of every address/SSL that would definitely be an improvement. It would be good to get more info on where to access that data table.

For now let's mostly save this response for a more comprehensive review of the matching. A few quick notes though:

  1. If it's a new development and doesn't have an address, we can use the intersection address as is. Would be great to get it updated when it does come through, though, so we can take advantage of the extra data that comes with matching w/ the mar id

  2. It probably makes sense to add a 'won't fix' flag to anything that we think is too much effort to fix. Will think about where best to put this.

  3. We also have a one-to-many project to address table, so the cleanest way is to get the list of all the addresses that ended up being created. But might be worth discussing if this is feasible.

  4. We opted not to use the intersection-based MAR address lookup because it returned too many false positives. If there was a mistyped address or other error, it would say it was close enough and return a (wrong) MAR ID anyways. Using just the street address version was safer. If there are many cases that need intersections, we would need to add code in our side that could correctly identify an intersection-style address and only send those ones to the intersection-based MAR lookup.