datamade / usaddress

:us: a python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License

Parsing error for StreetName PostDirectional #211

Open yashodhan19 opened 6 years ago

yashodhan19 commented 6 years ago

Addresses that contain a StreetNamePostDirectional are parsed incorrectly: the directional gets tagged as part of the PlaceName.

Sample cases:

4984 HWY 6  NORTH HOUSTON TX 77084
1509 IH 35  NORTH NEW BRAUNFELS TX 78130
1500 PADRE BLVD SOUTH PADRE ISLE TX 78597

Obtained output for the first case - ('AddressNumber', u'4984'), ('StreetNamePreType', u'HWY'), ('StreetName', u'6'), ('PlaceName', u'NORTH HOUSTON'), ('StateName', u'TX'), ('ZipCode', u'77084')

Expected output - ('AddressNumber', u'4984'), ('StreetNamePreType', u'HWY'), ('StreetName', u'6'), ('StreetNamePostDirectional', u'NORTH'), ('PlaceName', u'HOUSTON'), ('StateName', u'TX'), ('ZipCode', u'77084')

For the third example, the expected output for PlaceName is 'SOUTH PADRE ISLE'.

Obtained output - (u'1500', 'AddressNumber'), (u'PADRE', 'StreetName'), (u'BLVD', 'StreetNamePostType'), (u'SOUTH', 'StreetNamePostDirectional'), (u'PADRE', 'PlaceName'), (u'ISLE', 'PlaceName'), (u'TX', 'StateName'), (u'78597', 'ZipCode')
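
To make the mismatch in the third example easier to check, here is a minimal sketch that merges consecutive tokens sharing a label. It assumes the parser output is a list of (token, label) pairs, as in the obtained output for that example. The helper name `group_labels` is hypothetical, and the input list is copied by hand from this issue rather than produced by calling the library.

```python
from itertools import groupby


def group_labels(pairs):
    """Merge runs of consecutive (token, label) pairs that share a label."""
    return [
        (label, " ".join(token for token, _ in run))
        for label, run in groupby(pairs, key=lambda pair: pair[1])
    ]


# Obtained output for "1500 PADRE BLVD SOUTH PADRE ISLE TX 78597",
# copied by hand from the issue text (not generated here).
obtained = [
    ("1500", "AddressNumber"),
    ("PADRE", "StreetName"),
    ("BLVD", "StreetNamePostType"),
    ("SOUTH", "StreetNamePostDirectional"),
    ("PADRE", "PlaceName"),
    ("ISLE", "PlaceName"),
    ("TX", "StateName"),
    ("78597", "ZipCode"),
]

# Each label occurs in a single run here, so a dict loses nothing.
grouped = dict(group_labels(obtained))
print(grouped["PlaceName"])  # PADRE ISLE, whereas SOUTH PADRE ISLE was expected
```

Grouping this way puts the obtained PlaceName next to the expected one, so the stray StreetNamePostDirectional is easy to spot.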

jeancochrane commented 6 years ago

Another great edge case. Again, we can either include this as training data ourselves or accept a PR with new training data.
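
For anyone picking this up, here is a rough sketch of turning the expected labeling from the first example into a labeled XML record. It assumes the training files use an `AddressCollection` root with one `AddressString` per address and one child element per token. Please check that layout against the existing training files in the repository before opening a PR. The helper name `to_training_xml` is hypothetical and not part of usaddress.

```python
import xml.etree.ElementTree as ET


def to_training_xml(labeled_tokens):
    """Build one labeled address record from (token, label) pairs.

    The AddressCollection/AddressString layout is an assumption about
    the training file format, not something confirmed in this thread.
    """
    collection = ET.Element("AddressCollection")
    address = ET.SubElement(collection, "AddressString")
    for token, label in labeled_tokens:
        element = ET.SubElement(address, label)
        element.text = token
        element.tail = " "  # single space between tokens
    if len(address):
        address[-1].tail = None  # no trailing space after the last token
    return ET.tostring(collection, encoding="unicode")


# Expected labeling for "4984 HWY 6 NORTH HOUSTON TX 77084", per the issue.
expected = [
    ("4984", "AddressNumber"),
    ("HWY", "StreetNamePreType"),
    ("6", "StreetName"),
    ("NORTH", "StreetNamePostDirectional"),
    ("HOUSTON", "PlaceName"),
    ("TX", "StateName"),
    ("77084", "ZipCode"),
]

xml_text = to_training_xml(expected)
print(xml_text)
```

The other two sample addresses would need their own hand-labeled records in the same way.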