datamade / usaddress

:us: a python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License

StreetNamePost type in City Name #212

Open yashodhan19 opened 6 years ago

yashodhan19 commented 6 years ago

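The first word of a PlaceName gets tagged as StreetNamePostType when that word looks like a street type (for example, FORT or PORT), and the real street type before it is then tagged as part of the StreetName.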

Sample strings that were parsed:

5113 OLD GRANBURY ROAD FORT WORTH TX 76133
115 LONGHORN ROAD FORT WORTH TX 76179
12317 DAVIS BULEVARD FORT MYERS, FL 33905
418 S ALISTER STREET PORT ARANSAS TX 78373
1600 N STATE STREET FORT DAVIS TX 79734
7001 S FREEWAY FORT WORTH TX 76134

Obtained output (first sample string):

(u'5113', 'AddressNumber'), (u'OLD', 'StreetName'), (u'GRANBURY', 'StreetName'), (u'ROAD', 'StreetName'), (u'FORT', 'StreetNamePostType'), (u'WORTH', 'PlaceName'), (u'TX', 'StateName'), (u'76133', 'ZipCode')

Expected output:

(u'5113', 'AddressNumber'), (u'OLD', 'StreetName'), (u'GRANBURY', 'StreetName'), (u'ROAD', 'StreetNamePostType'), (u'FORT', 'PlaceName'), (u'WORTH', 'PlaceName'), (u'TX', 'StateName'), (u'76133', 'ZipCode')
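To make the difference between the two outputs explicit, here is a minimal sketch that compares them token by token. The helper name `label_mismatches` is hypothetical and not part of usaddress; the two lists are copied from the outputs above.

```python
def label_mismatches(obtained, expected):
    """Return (token, obtained_label, expected_label) for each token whose labels differ.

    Hypothetical helper for illustration; it is not part of usaddress.
    """
    if [tok for tok, _ in obtained] != [tok for tok, _ in expected]:
        raise ValueError("token sequences differ; cannot compare labels")
    return [
        (tok, got, want)
        for (tok, got), (_, want) in zip(obtained, expected)
        if got != want
    ]


# Labels reported in this issue for '5113 OLD GRANBURY ROAD FORT WORTH TX 76133'.
obtained = [
    ("5113", "AddressNumber"), ("OLD", "StreetName"), ("GRANBURY", "StreetName"),
    ("ROAD", "StreetName"), ("FORT", "StreetNamePostType"), ("WORTH", "PlaceName"),
    ("TX", "StateName"), ("76133", "ZipCode"),
]
expected = [
    ("5113", "AddressNumber"), ("OLD", "StreetName"), ("GRANBURY", "StreetName"),
    ("ROAD", "StreetNamePostType"), ("FORT", "PlaceName"), ("WORTH", "PlaceName"),
    ("TX", "StateName"), ("76133", "ZipCode"),
]

mismatches = label_mismatches(obtained, expected)
for token, got, want in mismatches:
    print(f"{token}: got {got}, expected {want}")
```

Only ROAD and FORT differ, so the error is a one-token shift of the street type label. When reproducing locally, the `obtained` list could instead come from `usaddress.parse(address_string)`, which returns a list of (token, label) pairs.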

The parser works well when the StreetNamePostType is abbreviated, and in some cases when it is spelled correctly (e.g. 'Boulevard' or 'BLVD' instead of 'Bulevard').

If the address string is '5113 OLD GRANBURY RD FORT WORTH TX 76133', the output is as expected:

(u'5113', 'AddressNumber'), (u'OLD', 'StreetName'), (u'GRANBURY', 'StreetName'), (u'RD', 'StreetNamePostType'), (u'FORT', 'PlaceName'), (u'WORTH', 'PlaceName'), (u'TX', 'StateName'), (u'76133', 'ZipCode')

jeancochrane commented 6 years ago

That's an interesting edge case @yashodhan19! We can drop these in as training data for the next release. I'd also accept a PR if you want to help improve the model yourself.
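For anyone preparing such a PR, the sketch below turns the expected labels from this issue into one XML-tagged training line. The element name `AddressString`, the one-child-element-per-token layout, and the helper `to_training_xml` are assumptions about the training data format, not something confirmed in this thread; check the training files in the repository before relying on it.

```python
import xml.etree.ElementTree as ET


def to_training_xml(labeled_tokens, parent_tag="AddressString"):
    """Serialize (token, label) pairs as one XML element per token.

    Hypothetical helper; the parent tag and layout are assumed, not confirmed.
    """
    address = ET.Element(parent_tag)
    for token, label in labeled_tokens:
        child = ET.SubElement(address, label)
        child.text = token
        child.tail = " "  # keep a space between consecutive tokens
    if len(address):
        address[-1].tail = None  # no trailing space after the last token
    return ET.tostring(address, encoding="unicode")


# Expected labels for the first sample string in this issue.
expected = [
    ("5113", "AddressNumber"), ("OLD", "StreetName"), ("GRANBURY", "StreetName"),
    ("ROAD", "StreetNamePostType"), ("FORT", "PlaceName"), ("WORTH", "PlaceName"),
    ("TX", "StateName"), ("76133", "ZipCode"),
]

xml_line = to_training_xml(expected)
print(xml_line)
```

Each of the six sample strings would need its own hand-corrected label list; the parser's current output is not a reliable source for them, since that is the behavior being fixed.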