CityofPortland / pdxdata

Place for discussion about City of Portland data
15 stars 2 forks source link

Add address_id and x,y to permit data #5

Open caged opened 8 years ago

caged commented 8 years ago

@ksmpdx Is there a way to get the address_id, and potentially the x,y from the city's address data added to the building permits data? I'd like to see the geo location of the permits, but the data currently doesn't include this information. I've been trying to sync these up for a better part of a day by comparing the full addresses to each other, however I've run into multiple issues.

The first challenge was a set of small issues that I bumped into before actually trying to compare the addresses.

The next challenge in trying to sync these up manually is that there are 615,752 permits in the permit dataset and 416,424 address points. For each permit, we need to potentially scan every address to find one that matches, which is 615752 * 416424. That's a potential 256,413,910,848 iterations in a worst-case scenario.

There are ways to reduce the number of comparisons that have to be made, such as caching the address for the permit permit_case_number, since there are multiple entries for a single permit case number as the particular property has to go through multiple inspections.

Now the problem moves on to comparing the actual addresses with each other. The major problem here is that it's a very finicky process and using a tool like csvjoin requires an exact match. I'm currently in this part of the process and I've hit a couple of snags pretty quickly.

There are double spaces in the full address strings. This can be difficult to spot and doesn't result in an exact match. It's easy enough to collapse spaces for the entire file. After doing this I ended up getting matches for roughly 40k permits, or about 9% of addresses and 6% of permits.

After getting the first batch of matches, I noticed that the reason more addresses aren't matching is because most of the permit data addresses end with " PORTLAND," however the address data file doesn't follow this pattern.

And, that's where I left things. I kinda felt like I went pretty far down the rabbit hole already, and maybe I should stop to see if I could get someone at the city to add this information if it's possible.

Thanks!

ksmpdx commented 8 years ago

Yes, I feel your pain. The info coming out of the City's current permitting DB leaves much to be desired. Good news is we're moving to a new system. Bad news is that it will be 2018 until it's fully up-and-running.

In the interim, a couple of options. If you only care about permits that create new residential units, we have all of those mapped going back to 1996 (current through the end of July). Happy to pass along. I can give you a point shapefile.

If you want all permits, we have an x/y for all those that reconcile to an existing taxlot via the taxlot's State ID. This is the information you see in PortlandMaps. Unfortunately, any that don't reconcile -- because the State ID no longer exists, it was typed incorrectly (I know), etc. -- are lost. Could be as many as 20% of all permits. We are currently building out a state ID history table will be used to better match permits to taxlots. This should have a much higher success rate than the current method. Hopefully we'll use this method to map all permits, develop a workflow for keeping it current, and then put the data out there for people to use sometime in the next 3-6 months. Going to involve some manual cleanup to get to the initial dataset. Working on demolitions first, as there's a lot of interest and not much info available.

Bottom line, address is a dead-end with the current system. The taxlot State ID is the best way to tie permits to a known location, in this case the X/Y centroid of the lot.

Hope that helps!

caged commented 8 years ago

@ksmpdx Thanks for the quick and detailed response. You're pretty much my favorite person right now! :)

In the interim, a couple of options. If you only care about permits that create new residential units, we have all of those mapped going back to 1996 (current through the end of July). Happy to pass along. I can give you a point shapefile.

This would be perfect given my main interest is residential construction.

An update on my original post...after stripping " PORTLAND" from the permit addresses, I was able to match a little over 539k permits to official addresses. Some permits appear to be city-wide or large areas, so they don't contain a full address. I suspect that's where a bulk of the missing permits fall. Happy to provide the results if you think it'll be useful in any way.

Thanks again. If I haven't mentioned it before, the ability to have this type of dialog with you and others on the other side of the equation is greatly appreciated.

ksmpdx commented 8 years ago

@Caged Sorry for the slow follow-up after the initial reply... just had a chance to remote in and track down the data. Posted here:

https://dl.dropboxusercontent.com/u/2836837/residential_permits_pdx_150907.zip

It's been updated through 8/31/15. Metadata does a reasonable job of explaining what's what, but let me know if you have questions. Farther back you go, less accurate the data is (especially true pre-2000).

Kudos on the address match! I'm impressed. We may want to compare work when we finish the taxlot reconcile. Should not be this hard. :-)

Agree, the dialog is great and beneficial to all. Thanks, and keep it coming.

caged commented 8 years ago

@ksmpdx Thanks for the data. It looks to be in pretty good shape sans missing issuedate for 2007 data. For whatever reason, that appears to be the only year missing (and one in 1998).

construction=# select year, count(*) from residential_permits where issuedate is null group by 1;
 year | count 
------+-------
 1998 |     1
 2007 |   188
ksmpdx commented 8 years ago

Weird. Maybe something went wrong when I exported. Here's the summary I'm pulling from the data, by year/# of permits/# of units built:

YEAR,Count_Permits,Count_Units 1995,1341,2069 1996,1414,2739 1997,1563,3903 1998,1575,4202 1999,1142,2093 2000,1311,3459 2001,1337,2321 2002,1479,3365 2003,1591,5084 2004,1476,3194 2005,1633,3994 2006,1509,3503 2007,1514,4395 2008,853,3312 2009,509,1040 2010,534,1174 2011,598,1499 2012,839,2682 2013,1006,3951 2014,1093,5446 2015,853,3738

Let me know if you continue to see a discrepancy.... I can re-post. Also, forgot to mention "Is_ADU". We manually flag all Accessory Dwelling Unit (ADU) permits using some standardized search criteria, since our permitting DB does not consistently track. Could be useful if ADU development is something you are interested in.

Check out the average number of new units per permit. Interesting -- if not surprising -- trend.

caged commented 8 years ago

@ksmpdx I get the same results as you if I group on the year columns, however 2007 data is missing the issuedate field.

year permits units
1995 1341 2069
1996 1414 2739
1997 1563 3903
1998 1575 4202
1999 1136 2093
2000 1310 3459
2001 1336 2321
2002 1479 3365
2003 1591 5084
2004 1474 3194
2005 1633 3994
2006 1507 3503
2007 1514 4395
2008 853 3312
2009 509 1040
2010 534 1174
2011 597 1499
2012 838 2682
2013 1004 3951
2014 1075 5446
2015 853 3738

Check out the average number of new units per permit. Interesting -- if not surprising -- trend.

It's interesting to look at the 90th percentile for units.

year percentile_90
2015 78
2014 112
2013 76
2012 64
2011 50
2010 47
2009 89
2008 70
2007 31
2006 21
2005 10
2004 12
2003 16
2002 15
2001 12
2000 39
1999 6
1998 17
1997 22
1996 12
1995 12

Units > 1

year median_units
2015 3.00
2014 6.00
2013 7.00
2012 10.00
2011 4.00
2010 2.00
2009 2.00
2008 2.00
2007 2.00
2006 2.00
2005 3.00
2004 4.00
2003 3.00
2002 4.00
2001 4.00
2000 4.00
1999 2.00
1998 4.00
1997 3.00
1996 4.00
1995 2.00

Lot's of interesting stuff in here.

ksmpdx commented 8 years ago

@caged Great to meet you last week. Let me know if you have any more questions about the permit data or you need anything else.

I'll also hijack this issue to put out this shameless plug: http://agency.governmentjobs.com/portlandor/default.cfm?action=viewJob&jobID=1234230

If you, @maxogden or @ngoldman know anyone who might be interested, please send it their way. Thanks!

ungoldman commented 8 years ago

@ksmpdx I'll pass the info around. The posting is for a job working with Rick Nixon's team, right? Is that BTS?

ksmpdx commented 8 years ago

Correct! This is Rick's team, in BTS. Thanks for passing around.

ungoldman commented 8 years ago

:+1: https://twitter.com/CodeforPortland/status/646737879918481408