datamade / usaddress

:us: a python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License

Parsing city incorrectly from a dataframe with a city column #280

Open bleckley opened 4 years ago

bleckley commented 4 years ago

Hi folks, thanks for this great tool. I'm using it to parse a dataframe that has Address_1, Address_2, City, State, and Zip columns.

After parsing, some of the records with "Ann Arbor" as the City come back with "Arbor" as the PlaceName. This happens when the Address_1 value does not have a traditional street suffix: "Ann" is then labeled as the StreetNameSuffix, leaving only "Arbor" for PlaceName.

Examples:
123 Main St. --> StreetNameSuffix = St; PlaceName = Ann Arbor
456 Boulder Pond --> StreetNameSuffix = Ann; PlaceName = Arbor
789 Parkview --> StreetNameSuffix = Ann; PlaceName = Arbor

It is possible this would impact other multi-word cities like Las Vegas, Los Angeles, San Diego, etc.
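To gauge how many records are affected, one option is to compare the parsed PlaceName against the City column that is already trusted. The following is only a rough sketch: the function name and the row layout (plain dicts holding both the original City and the parsed PlaceName) are assumptions made for illustration, not part of usaddress.

```python
def find_placename_mismatches(rows):
    """Return rows whose parsed PlaceName disagrees with the known City.

    `rows` is an iterable of dicts with the (assumed) keys "City", the
    trusted source column, and "PlaceName", the parser's output.
    """
    mismatches = []
    for row in rows:
        # Compare case-insensitively and ignore surrounding whitespace.
        city = (row.get("City") or "").strip().lower()
        place = (row.get("PlaceName") or "").strip().lower()
        if city and place != city:
            mismatches.append(row)
    return mismatches


# Illustrative records mirroring the examples in this issue.
rows = [
    {"Address_1": "123 Main St.", "City": "Ann Arbor", "PlaceName": "Ann Arbor"},
    {"Address_1": "456 Boulder Pond", "City": "Ann Arbor", "PlaceName": "Arbor"},
    {"Address_1": "789 Parkview", "City": "Ann Arbor", "PlaceName": "Arbor"},
]
flagged = find_placename_mismatches(rows)
```

Rows flagged this way would be the ones where a leading word of the city was likely absorbed into the street portion of the parse.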

Is there a way to set the parameters to avoid this? In my case, I can use the City column instead of PlaceName, but StreetNameSuffix will still contain errors.
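One possible workaround, sketched here under assumptions rather than as a confirmed fix: since City, State, and Zip are already separate columns, only the street columns could be handed to the parser, so that "Ann" never appears in the string being tagged. The helper name `tag_street_only` and the dict-shaped row are hypothetical. The `tagger` argument is expected to behave like `usaddress.tag`, taking a string and returning a `(labels, address_type)` pair. Whether the parser labels suffix-less streets such as "456 Boulder Pond" well without the trailing city is not verified here.

```python
def tag_street_only(row, tagger):
    """Tag only the street columns, then attach the known City/State/Zip.

    `row` is assumed to be a dict with Address_1, Address_2, City, State,
    and Zip keys. `tagger` is any callable shaped like usaddress.tag:
    it takes a string and returns (mapping_of_labels, address_type).
    """
    # Join the non-empty street columns; the city is deliberately left out.
    street = " ".join(
        str(row[col]).strip()
        for col in ("Address_1", "Address_2")
        if row.get(col)
    )
    tagged, _address_type = tagger(street)

    result = dict(tagged)
    # Use the trusted columns for the labels the parser would otherwise guess.
    result["PlaceName"] = row.get("City")
    result["StateName"] = row.get("State")
    result["ZipCode"] = row.get("Zip")
    return result


# With usaddress installed, the call would look like:
#     import usaddress
#     tag_street_only(row, usaddress.tag)
```

With a pandas dataframe, missing values may arrive as NaN rather than empty strings, so they would need to be filled (for example with empty strings) before calling a helper like this.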

Thanks! David