datamade / usaddress

🇺🇸 A Python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License

ERROR: Unable to tag this string because more than one area of the string has the same label #354

Open: cvsanthosh opened this issue 1 year ago

cvsanthosh commented 1 year ago

My dataset contains an address like this one, which of course is not well formed (it packs two addresses into one field, separated by a semicolon): "333 Wilkerson Ave., Stes. B & C, Perris, CA; 4262 Riverfield Ct., Riverside, CA ".

I have the following Python code:

    import usaddress

    # row is one record of the dataset; row['ADDRESS'] holds the raw address string
    tagged_address, address_type = usaddress.tag(
        row['ADDRESS'],
        tag_mapping={
            'Recipient': 'recipient',
            'AddressNumber': 'address1',
            'AddressNumberPrefix': 'address1',
            'AddressNumberSuffix': 'address1',
            'StreetName': 'address1',
            'StreetNamePreDirectional': 'address1',
            'StreetNamePreModifier': 'address1',
            'StreetNamePreType': 'address1',
            'StreetNamePostDirectional': 'address1',
            'StreetNamePostModifier': 'address1',
            'StreetNamePostType': 'address1',
            'CornerOf': 'address1',
            'IntersectionSeparator': 'address1',
            'LandmarkName': 'address1',
            'USPSBoxGroupID': 'address1',
            'USPSBoxGroupType': 'address1',
            'USPSBoxID': 'address1',
            'USPSBoxType': 'address1',
            'BuildingName': 'address2',
            'OccupancyType': 'address2',
            'OccupancyIdentifier': 'address2',
            'SubaddressIdentifier': 'address2',
            'SubaddressType': 'address2',
            'PlaceName': 'city',
            'StateName': 'state',
            'ZipCode': 'zip_code',
        },
    )
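As far as I understand usaddress.tag, it renames each token's label through tag_mapping first, then joins consecutive tokens that share the same (renamed) label. If a label shows up again after a different label has intervened, it raises RepeatedLabelError. The sketch below is a simplified pure-Python model of that rule, not the library's code; RepeatedLabel and group_tokens are hypothetical names, and MAPPING is only the subset of the tag_mapping above that these tokens use.

```python
from typing import Dict, List, Optional, Tuple


class RepeatedLabel(Exception):
    """Hypothetical stand-in for usaddress.RepeatedLabelError."""

    def __init__(self, label: str) -> None:
        super().__init__(f"label {label!r} appears in more than one area")
        self.label = label


def group_tokens(
    tokens: List[Tuple[str, str]],
    tag_mapping: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Simplified model of how tag() merges (token, label) pairs into fields."""
    grouped: Dict[str, List[str]] = {}
    last_label: Optional[str] = None
    for token, label in tokens:
        # Rename first, so several distinct labels can collapse into one key.
        if tag_mapping and label in tag_mapping:
            label = tag_mapping[label]
        if label == last_label:
            grouped[label].append(token)  # the current run continues
        elif label not in grouped:
            grouped[label] = [token]  # first run of this label
        else:
            raise RepeatedLabel(label)  # a second, separate run of the label
        last_label = label
    # Join each run and trim separator punctuation from the ends.
    return {label: " ".join(parts).strip(" ,;") for label, parts in grouped.items()}


# Parsed tokens copied from the error output in this issue.
TOKENS = [
    ("333", "AddressNumber"), ("Wilkerson", "StreetName"), ("Ave.,", "StreetNamePostType"),
    ("Stes.", "PlaceName"), ("B", "PlaceName"), ("&", "PlaceName"), ("C,", "PlaceName"),
    ("Perris,", "PlaceName"), ("CA;", "StateName"),
    ("4262", "AddressNumber"), ("Riverfield", "StreetName"), ("Ct.,", "StreetNamePostType"),
    ("Riverside,", "PlaceName"), ("CA", "StateName"),
]

MAPPING = {
    "AddressNumber": "address1",
    "StreetName": "address1",
    "StreetNamePostType": "address1",
    "PlaceName": "city",
    "StateName": "state",
}
```

On the first nine tokens this returns one value each for address1, city and state. On the full list, 4262 maps to address1 again after city and state have intervened, so the sketch raises on address1, the same UNCERTAIN LABEL the real error reports.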

The code fails with this error:

RepeatedLabelError: ERROR: Unable to tag this string because more than one area of the string has the same label

ORIGINAL STRING: 333 Wilkerson Ave., Stes. B & C, Perris, CA; 4262 Riverfield Ct., Riverside, CA
PARSED TOKENS: [('333', 'AddressNumber'), ('Wilkerson', 'StreetName'), ('Ave.,', 'StreetNamePostType'), ('Stes.', 'PlaceName'), ('B', 'PlaceName'), ('&', 'PlaceName'), ('C,', 'PlaceName'), ('Perris,', 'PlaceName'), ('CA;', 'StateName'), ('4262', 'AddressNumber'), ('Riverfield', 'StreetName'), ('Ct.,', 'StreetNamePostType'), ('Riverside,', 'PlaceName'), ('CA', 'StateName')]
UNCERTAIN LABEL: address1

When this error is raised, it's likely that either (1) the string is not a valid person/corporation name or (2) some tokens were labeled incorrectly

To report an error in labeling a valid name, open an issue at https://github.com/datamade/usaddress/issues/new - it'll help us continue to improve probablepeople!

For more information, see the documentation at https://usaddress.readthedocs.io/
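The parsed tokens above suggest the field holds two addresses joined by ";": the second one starts at 4262 and reuses AddressNumber, StreetName and the other address1 labels. One possible workaround, offered as a sketch and not something the usaddress docs prescribe, is to split the field on ";" and tag each part separately. split_multi_address and tag_each are hypothetical helpers, and it is an assumption that ";" only ever separates whole addresses in this dataset. The tagger is passed in so the sketch runs without usaddress installed.

```python
from typing import Any, Callable, List, Tuple, Type


def split_multi_address(raw: str, sep: str = ";") -> List[str]:
    """Split one field that holds several addresses joined by `sep`.

    Assumes `sep` only separates whole addresses, as in this issue's string.
    """
    return [part.strip() for part in raw.split(sep) if part.strip()]


def tag_each(
    raw: str,
    tagger: Callable[[str], Any],
    error_type: Type[Exception] = Exception,
) -> List[Tuple[str, Any]]:
    """Tag every sub-address separately.

    Returns (part, result) pairs. If `tagger` raises `error_type` for a part,
    the exception object is stored as that part's result instead.
    """
    results: List[Tuple[str, Any]] = []
    for part in split_multi_address(raw):
        try:
            results.append((part, tagger(part)))
        except error_type as exc:
            results.append((part, exc))
    return results


# Possible wiring with usaddress (not executed here), reusing the issue's tag_mapping:
#   import functools
#   import usaddress
#   tagger = functools.partial(usaddress.tag, tag_mapping=tag_mapping)
#   for part, result in tag_each(row['ADDRESS'], tagger, usaddress.RepeatedLabelError):
#       ...
```

Splitting only targets the repeated label. In the token list above, "Stes. B & C," was labeled PlaceName rather than as a suite or occupancy, so even a single address may come back with that text in city. That corresponds to reason (2) in the error message.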