datamade / usaddress

:us: a python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License

Impossible to parse the address with tag_mapping #318

Open asmhack opened 2 years ago

asmhack commented 2 years ago

Hi there, I tried to use the library on this example: 10 Forest Ave Apt 10 Old Greenwich CT 06870, and it won't parse it. But with Lake Forest instead of Old Greenwich it works perfectly.

The actual code is below:

import usaddress

t = usaddress.tag('10 Forest Ave Apt 10 Old Greenwich CT 06870', tag_mapping={
   'Recipient': 'recipient',
   'AddressNumber': 'address1',
   'AddressNumberPrefix': 'address1',
   'AddressNumberSuffix': 'address1',
   'StreetName': 'address1',
   'StreetNamePreDirectional': 'address1',
   'StreetNamePreModifier': 'address1',
   'StreetNamePreType': 'address1',
   'StreetNamePostDirectional': 'address1',
   'StreetNamePostModifier': 'address1',
   'StreetNamePostType': 'address1',
   'CornerOf': 'address1',
   'IntersectionSeparator': 'address1',
   'LandmarkName': 'address1',
   'USPSBoxGroupID': 'address1',
   'USPSBoxGroupType': 'address1',
   'USPSBoxID': 'address1',
   'USPSBoxType': 'address1',
   'BuildingName': 'address2',
   'OccupancyType': 'address2',
   'OccupancyIdentifier': 'address2',
   'SubaddressIdentifier': 'address2',
   'SubaddressType': 'address2',
   'PlaceName': 'city',
   'StateName': 'state',
   'ZipCode': 'zip_code',
})

And the response is:

Traceback (most recent call last):
  File "/scratches/scratch_156.py", line 10, in <module>
    t = usaddress.tag('10 Forest Ave Apt 10 Old Greenwich CT 06870', tag_mapping={
  File "/usr/local/lib/python3.8/site-packages/usaddress/__init__.py", line 177, in tag
    raise RepeatedLabelError(address_string, parse(address_string),
usaddress.RepeatedLabelError: 
ERROR: Unable to tag this string because more than one area of the string has the same label

ORIGINAL STRING:  10 Forest Ave Apt 10 Old Greenwich CT 06870
PARSED TOKENS:    [('10', 'AddressNumber'), ('Forest', 'StreetName'), ('Ave', 'StreetNamePostType'), ('Apt', 'OccupancyType'), ('10', 'OccupancyIdentifier'), ('Old', 'StreetNamePreModifier'), ('Greenwich', 'PlaceName'), ('CT', 'StateName'), ('06870', 'ZipCode')]
UNCERTAIN LABEL:  address1

When this error is raised, it's likely that either (1) the string is not a valid person/corporation name or (2) some tokens were labeled incorrectly
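
For anyone hitting the same error: the PARSED TOKENS line shows that 'Old' was labeled StreetNamePreModifier, which the mapping sends to address1. Because address2 tokens ('Apt', '10') sit between it and the earlier address1 tokens, address1 ends up in two separate areas of the string, which is what the error reports. Below is a minimal sketch of a possible workaround, not the library's own behavior: it works on the (token, label) pairs directly and joins tokens per mapped label even when they are not adjacent. The helper names `repeated_labels` and `merge_by_label` are hypothetical, and the token list is copied from the traceback rather than produced by calling the library.

```python
# Tokens copied from the PARSED TOKENS line of the traceback above.
PARSED = [
    ('10', 'AddressNumber'), ('Forest', 'StreetName'),
    ('Ave', 'StreetNamePostType'), ('Apt', 'OccupancyType'),
    ('10', 'OccupancyIdentifier'), ('Old', 'StreetNamePreModifier'),
    ('Greenwich', 'PlaceName'), ('CT', 'StateName'), ('06870', 'ZipCode'),
]

# Only the tag_mapping entries this example needs.
TAG_MAPPING = {
    'AddressNumber': 'address1',
    'StreetName': 'address1',
    'StreetNamePreModifier': 'address1',
    'StreetNamePostType': 'address1',
    'OccupancyType': 'address2',
    'OccupancyIdentifier': 'address2',
    'PlaceName': 'city',
    'StateName': 'state',
    'ZipCode': 'zip_code',
}


def repeated_labels(parsed, tag_mapping):
    """Hypothetical helper: list mapped labels that occur in more than
    one separate run of consecutive tokens."""
    seen, repeated, previous = set(), [], None
    for _, label in parsed:
        mapped = tag_mapping.get(label, label)
        if mapped != previous:
            # A new run starts here; flag it if this label already had one.
            if mapped in seen and mapped not in repeated:
                repeated.append(mapped)
            seen.add(mapped)
        previous = mapped
    return repeated


def merge_by_label(parsed, tag_mapping):
    """Hypothetical helper: join tokens per mapped label, in order of
    appearance, regardless of whether they are adjacent."""
    merged = {}
    for token, label in parsed:
        merged.setdefault(tag_mapping.get(label, label), []).append(token)
    return {label: ' '.join(tokens) for label, tokens in merged.items()}


print(repeated_labels(PARSED, TAG_MAPPING))
print(merge_by_label(PARSED, TAG_MAPPING))
```

In practice the pairs would come from usaddress.parse(address_string), which is the call shown in the traceback. Note that this only avoids the exception; it does not correct the labeling itself, so 'Old' still lands in address1 instead of city.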