datamade / usaddress

:us: a python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License
1.53k stars, 303 forks

RepeatedLabelError, "LA" in two-part city name confused with "Louisiana" abbrev. #205

Open eidietrich opened 6 years ago

eidietrich commented 6 years ago

Apologies if this duplicates an earlier report (I skimmed the existing issues and didn't see anything quite like it), but I've run into a RepeatedLabelError where the "LA" in the two-part California city name "LA HABRA" is interpreted as the state abbreviation for Louisiana.

Here's the full error message, coming out of a usaddress.tag() call:

ORIGINAL STRING:  1311 DEBWOOD PLACE LA HABRA CA 90631
PARSED TOKENS:    [(u'1311', 'AddressNumber'), (u'DEBWOOD', 'StreetName'), (u'PLACE', 'PlaceName'), (u'LA', 'StateName'), (u'HABRA', 'PlaceName'), (u'CA', 'StateName'), (u'90631', 'ZipCode')]
UNCERTAIN LABEL:  PlaceName
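
For anyone reading along, here is a rough sketch of why this token sequence ends up flagged. It assumes the tagger groups consecutive tokens that share a label and complains when a label shows up again after a different one. The helper name `first_repeated_label` is made up for illustration and is not part of usaddress. The token/label pairs are copied from the PARSED TOKENS line above.

```python
from collections import OrderedDict

# Token/label pairs copied from the PARSED TOKENS line of the error message.
PARSED_TOKENS = [
    ("1311", "AddressNumber"),
    ("DEBWOOD", "StreetName"),
    ("PLACE", "PlaceName"),
    ("LA", "StateName"),
    ("HABRA", "PlaceName"),
    ("CA", "StateName"),
    ("90631", "ZipCode"),
]


def first_repeated_label(pairs):
    """Hypothetical helper: return the first label that reappears after a
    different label has intervened, or None if every label is contiguous."""
    grouped = OrderedDict()
    last_label = None
    for token, label in pairs:
        if label == last_label:
            # Same label as the previous token: extend the current group.
            grouped[label].append(token)
        elif label not in grouped:
            # First time this label is seen: start a new group.
            grouped[label] = [token]
        else:
            # Label was seen earlier, then interrupted: non-contiguous repeat.
            return label
        last_label = label
    return None


print(first_repeated_label(PARSED_TOKENS))  # PlaceName
```

"PLACE" gets PlaceName, "LA" gets StateName, and then "HABRA" gets PlaceName again, so PlaceName is the first label to repeat non-contiguously, which lines up with the UNCERTAIN LABEL reported above.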

Also, wonderful library — thanks for your work with it!