datamade / usaddress

:us: a python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License

"NE MLK Jr Blvd" is parsed incorrectly #317

Open ezheidtmann opened 2 years ago

ezheidtmann commented 2 years ago

The short_name of this OSM way is "NE MLK Jr Blvd" -- https://www.openstreetmap.org/way/418650741

It's short for "Northeast Martin Luther King Junior Boulevard".

usaddress handles the long form well, but in the short form it appears to tag "MLK" as a StreetNamePreType, leaving only "Jr" as the StreetName:

13:37 $ ipython
Python 3.6.10 (default, Sep 15 2020, 13:20:10) 
Type 'copyright', 'credits' or 'license' for more information
IPython 7.16.1 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import usaddress

In [2]: usaddress.tag("Northeast Martin Luther King Junior Boulevard")
Out[2]: 
(OrderedDict([('StreetNamePreDirectional', 'Northeast'),
              ('StreetName', 'Martin Luther King Junior'),
              ('StreetNamePostType', 'Boulevard')]),
 'Ambiguous')

In [3]: usaddress.tag("NE MLK Jr Blvd")
Out[3]: 
(OrderedDict([('StreetNamePreDirectional', 'NE'),
              ('StreetNamePreType', 'MLK'),
              ('StreetName', 'Jr'),
              ('StreetNamePostType', 'Blvd')]),
 'Ambiguous')

Dozens of US cities have a street named "MLK Jr" and I imagine this abbreviation is common. Let me know what I can do to help improve results here.
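
Until the model handles this, one possible workaround is to expand the abbreviation before tagging. The snippet below is only a sketch: `ABBREVIATIONS` and `expand_abbreviations` are hypothetical helpers, not part of usaddress, and the table would need more entries for real data. It assumes that the expanded string tags more like the long form shown above, which I have not verified for every variant.

```python
import re

# Hypothetical lookup table; not part of usaddress. Extend as needed.
ABBREVIATIONS = {
    "MLK": "Martin Luther King",
}

# Match any known abbreviation as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(key) for key in ABBREVIATIONS) + r")\b",
    flags=re.IGNORECASE,
)


def expand_abbreviations(address: str) -> str:
    """Replace known whole-word abbreviations with their long forms."""
    return _PATTERN.sub(lambda m: ABBREVIATIONS[m.group(1).upper()], address)


if __name__ == "__main__":
    expanded = expand_abbreviations("NE MLK Jr Blvd")
    print(expanded)  # NE Martin Luther King Jr Blvd

    # Tag the expanded string only if usaddress is installed.
    try:
        import usaddress
    except ImportError:
        pass
    else:
        print(usaddress.tag(expanded))
```

This only sidesteps the problem for callers who can preprocess their input; adding labeled "MLK Jr" examples to the training data is probably the proper fix.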