datamade / usaddress

A Python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License

Intersection address types with non-numeric building names parsing wrong #230

Open macie-korte opened 6 years ago


"post road and old country lane, building M"

is parsed as:

| Address part | Tag |
| --- | --- |
| post | StreetName |
| road | StreetNamePostType |
| and | IntersectionSeparator |
| old country | SecondStreetName |
| lane | SecondStreetNamePostType |
| building | PlaceName |
| M | StateName |

I would expect the last two parts to be tagged like this instead:

| Address part | Tag |
| --- | --- |
| building | SubaddressType |
| M | SubaddressIdentifier |
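To make the discrepancy concrete, here is a small sketch that compares the reported labels with the expected ones. It works on (token, label) pairs, the shape that `usaddress.parse` returns, but the rows are transcribed from the parses shown in this issue rather than produced by running the parser. The helper name `label_mismatches` is hypothetical.

```python
from typing import List, Tuple

# (token, label) pairs, the same shape usaddress.parse returns.
# The rows below are transcribed from this issue, not produced by running the parser.
Pair = Tuple[str, str]


def label_mismatches(actual: List[Pair], expected: List[Pair]) -> List[Tuple[str, str, str]]:
    """Hypothetical helper: list (token, actual_label, expected_label) where labels differ."""
    if [t for t, _ in actual] != [t for t, _ in expected]:
        raise ValueError("token sequences differ; labels cannot be compared")
    return [
        (token, got, want)
        for (token, got), (_, want) in zip(actual, expected)
        if got != want
    ]


# Reported parse of "post road and old country lane, building M"
actual = [
    ("post", "StreetName"),
    ("road", "StreetNamePostType"),
    ("and", "IntersectionSeparator"),
    ("old country", "SecondStreetName"),
    ("lane", "SecondStreetNamePostType"),
    ("building", "PlaceName"),
    ("M", "StateName"),
]

# Same tokens, with the labels the reporter expects for the last two
expected = actual[:5] + [("building", "SubaddressType"), ("M", "SubaddressIdentifier")]

for token, got, want in label_mismatches(actual, expected):
    print(f"{token!r}: got {got}, expected {want}")
# prints:
# 'building': got PlaceName, expected SubaddressType
# 'M': got StateName, expected SubaddressIdentifier
```

With usaddress installed, `actual` could come from `usaddress.parse(...)` instead, though its token boundaries (for example "old" and "country" as separate tokens, or trailing commas) may differ from the grouped rows used here.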

However, "post road and old country lane, building M, apt 4" is parsed correctly:

| Address part | Tag |
| --- | --- |
| post | StreetName |
| road | StreetNamePostType |
| and | IntersectionSeparator |
| old country | SecondStreetName |
| lane | SecondStreetNamePostType |
| building | SubaddressType |
| M | SubaddressIdentifier |
| apt | OccupancyType |
| 4 | OccupancyIdentifier |

"post road and old country lane, building 3" is also parsed correctly.


Problems also appear when I explicitly add a city and state name.

"post road and old country lane, building M, tampa, FL" parses as:

| Address part | Tag |
| --- | --- |
| post | StreetName |
| road | StreetNamePostType |
| and | IntersectionSeparator |
| old country | SecondStreetName |
| lane | SecondStreetNamePostType |
| building M, tampa | PlaceName |
| FL | StateName |
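Until the parser handles these inputs, one possible stopgap is to post-process the (token, label) pairs. The sketch below is a hypothetical workaround, not part of usaddress: the function `relabel_building_subaddress` and the keyword set `SUBADDRESS_KEYWORDS` are illustrative names, and the example input splits the issue's grouped rows (such as "building M, tampa") into word tokens, which is an assumption about how the parser tokenizes.

```python
from typing import List, Tuple

Pair = Tuple[str, str]

# Assumption for this sketch: words treated as a subaddress type.
# Not a list taken from usaddress; extend with care.
SUBADDRESS_KEYWORDS = {"building"}


def relabel_building_subaddress(tokens: List[Pair]) -> List[Pair]:
    """Hypothetical post-processing step for (token, label) pairs.

    When a keyword such as "building" was not labeled SubaddressType,
    relabel it as SubaddressType and the single token after it as
    SubaddressIdentifier. All other labels are left unchanged.
    """
    fixed = list(tokens)
    for i in range(len(fixed) - 1):
        token, label = fixed[i]
        if token.lower().rstrip(",") in SUBADDRESS_KEYWORDS and label != "SubaddressType":
            next_token, _ = fixed[i + 1]
            fixed[i] = (token, "SubaddressType")
            fixed[i + 1] = (next_token, "SubaddressIdentifier")
    return fixed


# Reported parse of "post road and old country lane, building M, tampa, FL",
# split into word tokens (an assumption; the issue shows grouped rows).
tampa = [
    ("post", "StreetName"),
    ("road", "StreetNamePostType"),
    ("and", "IntersectionSeparator"),
    ("old", "SecondStreetName"),
    ("country", "SecondStreetName"),
    ("lane", "SecondStreetNamePostType"),
    ("building", "PlaceName"),
    ("M", "PlaceName"),
    ("tampa", "PlaceName"),
    ("FL", "StateName"),
]

for token, label in relabel_building_subaddress(tampa)[6:]:
    print(token, label)
# prints:
# building SubaddressType
# M SubaddressIdentifier
# tampa PlaceName
# FL StateName
```

Because it keys on a single word and always takes exactly one identifier token, this would also rewrite a street or place name that genuinely contains "building", so it only papers over cases like the ones reported here rather than replacing a fix in the parser.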