I ran into the "UNCERTAIN LABEL" error on a case where I do not understand why it is choking, and per the guidance, I am submitting an issue. (Never submitted an issue, so if I am totally doing this wrong, please let me know!)
Two addresses:
1) RT 2 BX 565 CRYSTAL SPRINGS MS 390590000 (I know the trailing zeros are not great, still just experimenting.)
2) RT 4 BX 9 WESSON MS 391910000
The first address goes through .parse and .tag just fine:
parse = "[('RT', 'USPSBoxGroupType'), ('2', 'USPSBoxGroupID'), ('BX', 'USPSBoxType'), ('565,', 'USPSBoxID'), ('CRYSTAL', 'PlaceName'), ('SPRINGS,', 'PlaceName'), ('MS,', 'StateName'), ('390590000', 'ZipCode')]"
tag = "(OrderedDict([('USPSBoxGroupType', 'RT'), ('USPSBoxGroupID', '2'), ('USPSBoxType', 'BX'), ('USPSBoxID', '565'), ('PlaceName', 'CRYSTAL SPRINGS'), ('StateName', 'MS'), ('ZipCode', '390590000')]), 'PO Box')"
The second address appears to parse just fine, but throws the UNCERTAIN LABEL error on .tag:
parse = "[('RT', 'USPSBoxGroupType'), ('4', 'USPSBoxGroupID'), ('BX', 'USPSBoxGroupType'), ('9,', 'USPSBoxID'), ('WESSON,', 'PlaceName'), ('MS,', 'StateName'), ('391910000', 'ZipCode')]"
tag = "ERROR: Unable to tag this string because more than one area of the string has the same label
ORIGINAL STRING: RT 4 BX 9, WESSON, MS, 391910000
PARSED TOKENS: [('RT', 'USPSBoxGroupType'), ('4', 'USPSBoxGroupID'), ('BX', 'USPSBoxGroupType'), ('9,', 'USPSBoxID'), ('WESSON,', 'PlaceName'), ('MS,', 'StateName'), ('391910000', 'ZipCode')]
UNCERTAIN LABEL: USPSBoxGroupType
When this error is raised, it's likely that either (1) the string is not a valid person/corporation name or (2) some tokens were labeled incorrectly
To report an error in labeling a valid name, open an issue at https://github.com/datamade/usaddress/issues/new - it'll help us continue to improve probablepeople!"
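Comparing the two parse outputs by eye, the only difference I can spot is that 'BX' is labeled USPSBoxType in the first address but USPSBoxGroupType in the second, which is the same label 'RT' already has and matches the UNCERTAIN LABEL in the message. Here is a minimal sketch, in plain Python, of the check I used to confirm that on the pasted token lists. The helper name `repeated_labels` is my own, and this is only my guess at the kind of condition that triggers the error, not usaddress's actual implementation.

```python
from itertools import groupby


def repeated_labels(parsed):
    """Return labels that show up in more than one separate run of tokens.

    `parsed` is a list of (token, label) pairs, as printed by .parse above.
    Hypothetical helper for illustration; not part of usaddress.
    """
    # Collapse adjacent tokens that share a label into a single run,
    # so 'CRYSTAL' + 'SPRINGS,' count as one PlaceName area.
    runs = [label for label, _ in groupby(parsed, key=lambda pair: pair[1])]

    seen = set()
    repeated = []
    for label in runs:
        if label in seen and label not in repeated:
            repeated.append(label)
        seen.add(label)
    return repeated


# Token lists copied from the .parse output in this issue.
first = [('RT', 'USPSBoxGroupType'), ('2', 'USPSBoxGroupID'),
         ('BX', 'USPSBoxType'), ('565,', 'USPSBoxID'),
         ('CRYSTAL', 'PlaceName'), ('SPRINGS,', 'PlaceName'),
         ('MS,', 'StateName'), ('390590000', 'ZipCode')]

second = [('RT', 'USPSBoxGroupType'), ('4', 'USPSBoxGroupID'),
          ('BX', 'USPSBoxGroupType'), ('9,', 'USPSBoxID'),
          ('WESSON,', 'PlaceName'), ('MS,', 'StateName'),
          ('391910000', 'ZipCode')]

print(repeated_labels(first))   # []
print(repeated_labels(second))  # ['USPSBoxGroupType']
```

So the second address has two separate areas labeled USPSBoxGroupType ('RT' and 'BX'), while the first has none, which at least lines up with one tagging and the other not.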
I can supply several other instances of pairs like this in the same "RT . . . BX" format.
This really is not an issue for my use case, but I saw the request to report errors in labeling (and wasn't entirely sure why one was tagging and the other was not) so I thought I would submit it, just in case it helps you guys out.
Let me know if you need more info/data.