Open stevevance opened 7 years ago
This is what I'm using now:
stname_pattern = (\S*[a-z]\S*\s){1,6}
sttype_pattern = (ave|blvd|cres|ct|dr|hwy|ln|pkwy|pl|plz|rd|row|sq|st|ter|way|broadway|market|o)
That "o" in there is to catch the street name "Avenue O". It's imperfect but it works for now. It won't catch the other Avenue [letter] street names, though.
Want to make a PR?
Any regex should catch the following numbered addresses:
1105-1111 E 95th St to 95th St
instead. This regex pattern captures addresses 2-5 only (although I believe it can be simplified):
((?<!-)\b\d{1,5}(-\d{1,5})?\s([.0-9a-z]*\s){1,5}(avenue [a-z]|ave|blvd|cres|ct|dr|hwy|ln|pkwy|pl|plz|rd|row|sq|st|ter|way|broadway|market))
The intersection pattern fails on "Zoning Reclassification Map No. 22-F at W 87th St and S State St and E 88th St and S Lafayette Ave"
((?<=\sat\s)(\S*[a-z]\S*\s){1,6}(avenue [a-z]|ave|blvd|cres|ct|dr|hwy|ln|pkwy|pl|plz|rd|row|sq|st|ter|way|broadway|market)([ ,-.]|\b)\s?and\s?(\S*[a-z]\S*\s){1,6}(avenue [a-z]|ave|blvd|cres|ct|dr|hwy|ln|pkwy|pl|plz|rd|row|sq|st|ter|way|broadway|market)([ ,-.]|\b))
Here's the full pattern:
/(((?<!-)\b\d{1,5}(-\d{1,5})?\s(\S*[a-z]\S*\s){1,6}(avenue [a-z]|ave|blvd|cres|ct|dr|hwy|ln|pkwy|pl|plz|rd|row|sq|st|ter|way|broadway|market)([ ,-.]|\b))|((?<=\sat\s)(\S*[a-z]\S*\s){1,6}(avenue [a-z]|ave|blvd|cres|ct|dr|hwy|ln|pkwy|pl|plz|rd|row|sq|st|ter|way|broadway|market)([ ,-.]|\b)\s?and\s?(\S*[a-z]\S*\s){1,6}(avenue [a-z]|ave|blvd|cres|ct|dr|hwy|ln|pkwy|pl|plz|rd|row|sq|st|ter|way|broadway|market)([ ,-.]|\b)))/i
This seems to work for all 7 tests given. It will fail on the following example (though that may be irrelevant to Chicago): "42 Webster Dr. and 101 Forest Dr."
((?<!-)\b\d{1,5}(-\d{1,5})?\s([.0-9a-z]*\s){1,5}(avenue [a-z]|ave\.?|blvd\.?|cres\.?|ct\.?|dr\.?|hwy\.?|ln\.?|pkwy\.?|pl\.?|plz\.?|rd\.?|row\.?|sq\.?|st\.?|ter\.?|way\.?|broadway|market)(?:\s|$))
@herbiemarkwort That pattern fails on the 1st address, "Sale of City-owned property at 1105-1111 E 95th St to 95th St Building LLC", which should match "1105-1111 E 95th St" but instead matches "1105-1111 E 95th St to 95th St".
But it catches all the rest, including the 6th and 7th addresses, which my pattern couldn't match.
And, "42 Webster Dr. and 101 Forest Dr." doesn't contain valid addresses in Chicago because they don't have a cardinal direction (NSEW).
((?<!-)\b\d{1,5}(?:-\d{1,5})?\s(?:(?:n|north|w|west|s|south|e|east)\s(?:[.0-9a-z]*\s)?(?:[.a-z]*\s){0,4}(?:avenue [a-z]|ave\.?|blvd\.?|cres\.?|ct\.?|dr\.?|hwy\.?|ln\.?|pkwy\.?|pl\.?|plz\.?|rd\.?|row\.?|sq\.?|st\.?|ter\.?|way\.?|broadway|market))(?:\s|$))
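As a rough check of the cardinal-direction pattern above, here is a minimal Python sketch. It assumes Python's `re` module (the thread does not say which regex engine is in use), writes the word-count quantifier in its explicit `{0,4}` form, and uses a hypothetical helper name, `find_addresses`. The trailing `(?:\s|$)` is part of the match, so the helper strips whitespace from each result.

```python
import re

# Suffix alternation as posted in the thread.
SUFFIX = (r"(?:avenue [a-z]|ave\.?|blvd\.?|cres\.?|ct\.?|dr\.?|hwy\.?|ln\.?|pkwy\.?"
          r"|pl\.?|plz\.?|rd\.?|row\.?|sq\.?|st\.?|ter\.?|way\.?|broadway|market)")

# Number (or number range), a required cardinal direction, an optional
# numbered word such as "95th", up to four more name words, then the suffix.
ADDRESS = re.compile(
    r"(?<!-)\b\d{1,5}(?:-\d{1,5})?\s"
    r"(?:n|north|w|west|s|south|e|east)\s"
    r"(?:[.0-9a-z]*\s)?"
    r"(?:[.a-z]*\s){0,4}"
    + SUFFIX + r"(?:\s|$)",
    re.IGNORECASE,
)


def find_addresses(title):
    """Return every address-like match in an ordinance title (hypothetical helper)."""
    return [m.group(0).strip() for m in ADDRESS.finditer(title)]


first = find_addresses(
    "Sale of City-owned property at 1105-1111 E 95th St to 95th St Building LLC"
)
no_direction = find_addresses("42 Webster Dr. and 101 Forest Dr.")
broadway = find_addresses("6145-6149 N Broadway")
```

Under these assumptions, the first title should yield only "1105-1111 E 95th St", the Webster/Forest example should yield nothing because neither address has a direction, and "6145-6149 N Broadway" should match through the `broadway` alternative.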
intersection
Ordinance: https://chicago.councilmatic.org/legislation/o-2017-2090/ Location: E 71st St and S Stony Island Ave
Interpretation: E 71st St and S St (this gets cut off at "St" in "Stony"), and thus it's geocoded wrong and the map shows the wrong location.
Regex pattern for addresses:
(\S*[a-z]\S*\s){1,4}?
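The "St" in "Stony" truncation can be reproduced with a small Python sketch. This is an illustration, not the project's actual intersection pattern: it assumes Python's `re` module, combines the lazy name group with a plain suffix list, and the names `LOOSE` and `STRICT` are hypothetical. One possible fix, shown as `STRICT`, is a word boundary (`\b`) after each suffix so that a suffix cannot match the start of a longer word.

```python
import re

SUFFIX = r"(?:ave|blvd|cres|ct|dr|hwy|ln|pkwy|pl|plz|rd|row|sq|st|ter|way)"
# Lazy name group: takes as few words as possible before trying the suffix.
NAME = r"(?:\S*[a-z]\S*\s){1,4}?"

# No boundary after the suffix: "st" can match the first two letters of "Stony".
LOOSE = re.compile(NAME + SUFFIX + r"\sand\s" + NAME + SUFFIX, re.IGNORECASE)

# Assumed fix: require a word boundary after each suffix.
STRICT = re.compile(
    NAME + SUFFIX + r"\b\sand\s" + NAME + SUFFIX + r"\b", re.IGNORECASE
)

location = "E 71st St and S Stony Island Ave"
loose_match = LOOSE.search(location).group(0)
strict_match = STRICT.search(location).group(0)
```

Here `loose_match` should come out as "E 71st St and S St", matching the bad interpretation reported above, while `strict_match` should cover the whole location string.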
address without suffix/type
Ordinance: https://chicago.councilmatic.org/legislation/o-2017-2210/ Location: 6145-6149 N Broadway
Interpretation: No address found because it doesn't have a suffix/type ("St", "Ave", etc.)
If you change the address pattern to
((?<!-)\b\d{1,5}(-\d{1,5})?\s(\S*[a-z]\S*\s){1,4}(ave|blvd|cres|ct|dr|hwy|ln|pkwy|pl|plz|rd|row|sq|st|ter|way|broadway))
it should capture both edge cases.
address with unlisted suffix/type
Zoning Reclassification Map No. 1-G at 1107 W Fulton Market - App No. 18139T1
In this case, "Market" is the same as "Avenue", but the city doesn't abbreviate it.
Add "market" to the list of suffixes in the address pattern capture group.
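A minimal Python sketch of the effect of adding "market" to the suffix list, assuming Python's `re` module. `BASE_SUFFIXES` and `build_address_pattern` are hypothetical names introduced for illustration; the pattern body follows the address pattern quoted in this issue.

```python
import re

# Suffix list from the address pattern in this issue, without "market".
BASE_SUFFIXES = ["ave", "blvd", "cres", "ct", "dr", "hwy", "ln", "pkwy", "pl",
                 "plz", "rd", "row", "sq", "st", "ter", "way", "broadway"]


def build_address_pattern(suffixes, max_words=4):
    """Compile the address pattern for a given suffix list (hypothetical helper)."""
    alternation = "|".join(suffixes)
    return re.compile(
        r"(?<!-)\b\d{1,5}(?:-\d{1,5})?\s(?:\S*[a-z]\S*\s){1,%d}(?:%s)"
        % (max_words, alternation),
        re.IGNORECASE,
    )


title = "Zoning Reclassification Map No. 1-G at 1107 W Fulton Market - App No. 18139T1"

# Without "market" nothing in the title ends in a known suffix.
without_market = build_address_pattern(BASE_SUFFIXES).search(title)

# With "market" appended, the Fulton Market address is found.
with_market = build_address_pattern(BASE_SUFFIXES + ["market"]).search(title)
found = with_market.group(0) if with_market else None
```

With these assumptions, `without_market` should be `None` and `found` should be "1107 W Fulton Market".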
Long addresses
Take this ordinance title:
Zoning Reclassification Map No. 16-E at 6311 S Calumet Ave, 6301-6335 S Calumet Ave, 343-365 E 63rd St and 6300-6334 S Dr. Martin Luther King Jr Dr
The regex will capture all but the last address (on King Drive) because it only allows for up to 4 words to be captured. It needs to be increased to 6 words to be able to capture as far as "Jr".
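The word-limit change can be compared side by side with a short Python sketch. It assumes Python's `re` module and a hypothetical `address_pattern` helper that takes the maximum number of street-name words; the suffix list is the one from this issue with "market" added.

```python
import re

SUFFIX = r"(?:ave|blvd|cres|ct|dr|hwy|ln|pkwy|pl|plz|rd|row|sq|st|ter|way|broadway|market)"


def address_pattern(max_words):
    """Address pattern allowing up to max_words words before the suffix (hypothetical helper)."""
    return re.compile(
        r"(?<!-)\b\d{1,5}(?:-\d{1,5})?\s(?:\S*[a-z]\S*\s){1,%d}%s" % (max_words, SUFFIX),
        re.IGNORECASE,
    )


title = ("Zoning Reclassification Map No. 16-E at 6311 S Calumet Ave, "
         "6301-6335 S Calumet Ave, 343-365 E 63rd St and "
         "6300-6334 S Dr. Martin Luther King Jr Dr")
king_drive = "6300-6334 S Dr. Martin Luther King Jr Dr"

four_words = [m.group(0) for m in address_pattern(4).finditer(title)]
six_words = [m.group(0) for m in address_pattern(6).finditer(title)]
```

In this sketch the six-word variant should return all four addresses, including the full King Drive one, while the four-word variant does not return the complete King Drive address.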