datamade / usaddress

:us: a python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License

City not Parsed Correctly #287

Open benngarcia opened 3 years ago

benngarcia commented 3 years ago

Input: "San Francisco, CA 00000" (Also tested with comma between CA and 00000)

Outputs the PlaceName as "Francisco"
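For anyone who wants to reproduce this, here is a minimal sketch. It assumes the `usaddress` package is installed and uses its `parse` function, which returns a list of (token, label) tuples. `place_name` is a hypothetical helper written for this example and is not part of the library. The second input is the comma variant mentioned above.

```python
# Reproduction sketch. Assumes the third-party usaddress package is installed
# (pip install usaddress); without it, the script only reports that it is missing.
try:
    import usaddress
except ImportError:
    usaddress = None


def place_name(pairs):
    """Join every token labeled PlaceName from a list of (token, label) pairs.

    Hypothetical helper for this example; not part of usaddress.
    """
    return " ".join(
        token.rstrip(",") for token, label in pairs if label == "PlaceName"
    )


INPUTS = ["San Francisco, CA 00000", "San Francisco, CA, 00000"]

if usaddress is None:
    print("usaddress is not installed")
else:
    for text in INPUTS:
        pairs = usaddress.parse(text)  # list of (token, label) tuples
        print(text, "->", pairs, "| PlaceName:", repr(place_name(pairs)))
```

Printing the full list of pairs also shows which label the first word of the city received, which the PlaceName value alone does not tell you.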

benngarcia commented 3 years ago

Does the same thing for New York, NY

arvindram11 commented 3 years ago

I am seeing a similar issue with some other addresses. The addresses below are fictional (to avoid PII issues).

12345 North Point Laguna Hills CA 12345 <-- This one parses correctly:

12345 | AddressNumber
North | StreetName
Point | StreetNamePostType
Laguna Hills | PlaceName
CA | StateName
12345 | ZipCode

12345 South Point Laguna Hills CA 12345 <-- But this one parses incorrectly:

12345 | AddressNumber
South | StreetNamePreDirectional
Point Laguna | StreetName
Hills | PlaceName
CA | StateName
12345 | ZipCode

It looks like it recognizes South as a StreetNamePreDirectional but treats North as just a StreetName (it is possible that there is no valid North Point in Laguna Hills, CA). But I thought the parser was agnostic to whether an address is valid. Is that not so?
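To make the difference between the two parses easier to see, here is a small sketch that compares them token by token. The pairs are copied from the two results reported in this comment. Splitting "Laguna Hills" and "Point Laguna" into one label per token is an assumption based on the merged rows shown there. `merge_runs` and `label_diff` are hypothetical helpers, not part of usaddress.

```python
from itertools import groupby

# Token/label pairs taken from the two parses reported in this comment.
# One label per token is an assumption; the comment shows merged rows.
NORTH = [
    ("12345", "AddressNumber"), ("North", "StreetName"),
    ("Point", "StreetNamePostType"), ("Laguna", "PlaceName"),
    ("Hills", "PlaceName"), ("CA", "StateName"), ("12345", "ZipCode"),
]
SOUTH = [
    ("12345", "AddressNumber"), ("South", "StreetNamePreDirectional"),
    ("Point", "StreetName"), ("Laguna", "StreetName"),
    ("Hills", "PlaceName"), ("CA", "StateName"), ("12345", "ZipCode"),
]


def merge_runs(pairs):
    """Collapse consecutive tokens that share a label into (text, label) rows."""
    return [
        (" ".join(token for token, _ in run), label)
        for label, run in groupby(pairs, key=lambda pair: pair[1])
    ]


def label_diff(a, b):
    """Return (position, (token_a, label_a), (token_b, label_b)) where labels differ."""
    return [
        (i, pair_a, pair_b)
        for i, (pair_a, pair_b) in enumerate(zip(a, b))
        if pair_a[1] != pair_b[1]
    ]


for text, label in merge_runs(SOUTH):
    print(f"{text} | {label}")

for position, north_pair, south_pair in label_diff(NORTH, SOUTH):
    print(position, north_pair, "vs", south_pair)
```

The two label sequences differ at three consecutive positions, starting at the token that changed, so swapping North for South affects the labels of the tokens after it as well.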

hainesdata commented 3 years ago

I'm having the same issue as well. New York, NY parsed the city as just 'YORK'.

I tried adding commas between address elements to see whether that would make the elements easier to classify, but that didn't work.

FirebornX commented 3 years ago

Ran into this with Key West, FL as well.