datamade / usaddress

:us: A Python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License
1.53k stars, 303 forks

Parsing things that are not addresses #262

Open Prismacolor opened 5 years ago

Prismacolor commented 5 years ago

I had a test PDF file containing names, addresses, phony Social Security numbers, and phone numbers, along with miscellaneous text. On some entries, the parser incorrectly labeled a number in the format ###-##-#### as a street number and then parsed the fields after it as an address, even though they were not one. Strangely, the behavior was inconsistent: it correctly treated some of the SSNs as non-address text but misparsed others.
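For anyone hitting the same thing, one possible workaround is to strip tokens shaped like ###-##-#### before the text reaches the parser. The following is only a minimal sketch under that assumption, not something the library provides: `SSN_LIKE`, `looks_like_ssn`, `mask_ssn_like`, and `parse_if_plausible` are hypothetical helper names, and the `parse` argument stands in for whatever parsing call you use (for example `usaddress.parse`). It does not make the remaining text an address; it only keeps the SSN-shaped token from being read as a street number.

```python
import re

# Hypothetical pattern: the ###-##-#### shape from the report, not embedded
# in a longer run of digits. Phone numbers (###-###-####) and ZIP+4 codes
# (#####-####) do not match it.
SSN_LIKE = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")


def looks_like_ssn(text):
    """Return True if the text contains an SSN-shaped token."""
    return SSN_LIKE.search(text) is not None


def mask_ssn_like(text):
    """Drop SSN-shaped tokens and collapse the leftover whitespace."""
    return " ".join(SSN_LIKE.sub(" ", text).split())


def parse_if_plausible(text, parse):
    """Mask SSN-shaped tokens, then call `parse` on whatever remains.

    `parse` is any callable taking a string (an assumption of this sketch).
    Returns None when nothing is left after masking.
    """
    cleaned = mask_ssn_like(text)
    if not cleaned:
        return None
    return parse(cleaned)
```

With the library installed, this could be called as `parse_if_plausible(line, usaddress.parse)` for each line of extracted text. Whether masking or skipping the whole line is the better choice depends on whether SSNs and real addresses ever share a line in your data.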