datamade / usaddress

:us: a python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License

Error parsing La in city name (e.g. La Quinta) as Louisiana using .tag #358

Open michaeljclausen opened 7 months ago

michaeljclausen commented 7 months ago

I've had some addresses work fine such as...

"49000 Calle Flora, La Quinta, CA 92253 United States"

(OrderedDict([('AddressNumber', '49000'), ('StreetName', 'Calle Flora'), ('PlaceName', 'La Quinta'), ('StateName', 'CA'), ('ZipCode', '92253'), ('CountryName', 'United States')]), 'Street Address')

whereas "8100 Peary Place, La Quinta, California 92253 United States" results in...

ERROR: Unable to tag this string because more than one area of the string has the same label

ORIGINAL STRING: 8100 Peary Place, La Quinta, California 92253 United States

PARSED TOKENS: [('8100', 'AddressNumber'), ('Peary', 'StreetName'), ('Place,', 'StreetNamePostType'), ('La', 'StateName'), ('Quinta,', 'PlaceName'), ('California', 'StateName'), ('92253', 'ZipCode'), ('United', 'CountryName'), ('States', 'CountryName')]

UNCERTAIN LABEL: StateName

After testing, it seems that a street address ending in 'Place' trips up the parser: 'La' is then labeled StateName instead of being part of the PlaceName 'La Quinta'.
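To make the error message concrete, here is a minimal sketch of the kind of check it describes: a label that shows up in two separate areas of the token sequence. It works on the PARSED TOKENS list copied from the output above and does not call usaddress itself. The helper name `repeated_labels` is made up for illustration, and this is not necessarily how the library implements the check internally.

```python
from itertools import groupby


def repeated_labels(parsed_tokens):
    """Return labels that occur in more than one non-adjacent run of tokens.

    `parsed_tokens` is a list of (token, label) pairs, like the
    PARSED TOKENS list in the error output. Hypothetical helper, not
    part of usaddress.
    """
    # Collapse consecutive tokens that share a label into a single run,
    # so ('United', 'CountryName'), ('States', 'CountryName') counts once.
    runs = [label for label, _ in groupby(parsed_tokens, key=lambda pair: pair[1])]

    seen = set()
    repeated = []
    for label in runs:
        # A label seen in an earlier, separate run is a repeat.
        if label in seen and label not in repeated:
            repeated.append(label)
        seen.add(label)
    return repeated


# Token/label pairs copied from the failing parse reported above.
failing = [
    ('8100', 'AddressNumber'), ('Peary', 'StreetName'),
    ('Place,', 'StreetNamePostType'), ('La', 'StateName'),
    ('Quinta,', 'PlaceName'), ('California', 'StateName'),
    ('92253', 'ZipCode'), ('United', 'CountryName'),
    ('States', 'CountryName'),
]

print(repeated_labels(failing))  # ['StateName']
```

Here 'La' and 'California' are both labeled StateName but are separated by 'Quinta,' (PlaceName), so StateName covers two areas of the string. The two adjacent CountryName tokens form one run and are not flagged.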