datamade / usaddress

A Python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License
1.51k stars, 303 forks

Invalid pre modifier parsing #308

Open: ssbb opened this issue 3 years ago

ssbb commented 3 years ago

I see an issue with some street-name pre-modifiers such as OLD and LITTLE: the parser labels them as StreetName even though they are not the street name.
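One possible client-side workaround, sketched under assumptions: `usaddress.parse` returns a list of `(token, label)` pairs, so those pairs can be post-processed before building components. The word list `PRE_MODIFIERS` and the function `relabel_pre_modifiers` are hypothetical helpers, not part of usaddress, and the relabeling rule is a heuristic; `StreetNamePreModifier` and `StreetNamePreDirectional` are existing usaddress labels. The sample tokens are the PARSED TOKENS from this report's error output (the address reads as Little West 12th Street).

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (token, label), the shape returned by usaddress.parse

# Hypothetical list of words that can precede a street name as a modifier.
PRE_MODIFIERS = {"OLD", "LITTLE"}


def relabel_pre_modifiers(tokens: List[Token]) -> List[Token]:
    """Relabel a pre-modifier that was mistaken for the street name.

    Heuristic (an assumption): if a pre-modifier word is labeled StreetName
    and another StreetName token follows it, the word is a
    StreetNamePreModifier, and any post-directional between the two is
    really a pre-directional.
    """
    out = list(tokens)  # copy so the caller's list is not mutated
    for i, (word, label) in enumerate(out):
        if label != "StreetName" or word.upper() not in PRE_MODIFIERS:
            continue
        # Positions of later StreetName tokens; the first is the real name.
        later = [j for j in range(i + 1, len(out)) if out[j][1] == "StreetName"]
        if not later:
            continue  # the word is the whole street name, e.g. "OLD RD"
        out[i] = (word, "StreetNamePreModifier")
        for j in range(i + 1, later[0]):
            if out[j][1] == "StreetNamePostDirectional":
                out[j] = (out[j][0], "StreetNamePreDirectional")
        break
    return out


parsed = [('53', 'AddressNumber'), ('LITTLE', 'StreetName'),
          ('W', 'StreetNamePostDirectional'), ('12', 'StreetName'),
          ('ST', 'StreetNamePostType')]
for token, label in relabel_pre_modifiers(parsed):
    print(token, label)
```

An address such as `100 OLD RD`, where OLD is the entire street name, has no second StreetName token and is returned unchanged.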

Example address: 53 LITTLE W 12 ST

When trying to tag addresses like this one, an exception is raised:

ERROR: Unable to tag this string because more than one area of the string has the same label

ORIGINAL STRING:  53 LITTLE W 12 ST
PARSED TOKENS:    [('53', 'AddressNumber'), ('LITTLE', 'StreetName'), ('W', 'StreetNamePostDirectional'), ('12', 'StreetName'), ('ST', 'StreetNamePostType')]
UNCERTAIN LABEL:  StreetName

When this error is raised, it's likely that either (1) the string is not a valid person/corporation name or (2) some tokens were labeled incorrectly

To report an error in labeling a valid name, open an issue at https://github.com/datamade/usaddress/issues/new - it'll help us continue to improve probablepeople!

For more information, see the documentation at https://usaddress.readthedocs.io/
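The message means that one label was assigned to two separate, non-adjacent runs of tokens, which `usaddress.tag` cannot merge into a single component. The check can be sketched roughly as follows; `find_repeated_label` is a hypothetical helper and a simplification that omits details of the real `tag`, not usaddress's own code. Because `usaddress.parse` returns the raw labels without this check, it is a convenient way to inspect strings that make `tag` fail.

```python
from typing import List, Optional, Tuple


def find_repeated_label(tokens: List[Tuple[str, str]]) -> Optional[str]:
    """Return the first label that appears in two separate runs, else None.

    Consecutive tokens with the same label (e.g. a multi-word street name)
    form one run and are fine; a label that reappears after a different
    label has intervened is the "more than one area" case.
    """
    seen = set()
    previous = None
    for _, label in tokens:
        if label != previous and label in seen:
            return label
        seen.add(label)
        previous = label
    return None


parsed = [('53', 'AddressNumber'), ('LITTLE', 'StreetName'),
          ('W', 'StreetNamePostDirectional'), ('12', 'StreetName'),
          ('ST', 'StreetNamePostType')]
print(find_repeated_label(parsed))
```

Run on the PARSED TOKENS above, it returns `'StreetName'`, matching the UNCERTAIN LABEL line.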