datamade / usaddress

A Python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License

Address is along multiple highways #206

Open · dfiorino opened this issue 6 years ago

dfiorino commented 6 years ago

My example: "11098 US Hwy 15 501 N Chapel Hill NC" is located on US Highways 15 and 501, which share the road at this point. The parser tags both 11098 and 501 as AddressNumber.

input: usaddress.tag("11098 US Hwy 15 501 N Chapel Hill NC")

output: RepeatedLabelError: ERROR: Unable to tag this string because more than one area of the string has the same label

ORIGINAL STRING: 11098 US Hwy 15 501 N Chapel Hill NC
PARSED TOKENS: [('11098', 'AddressNumber'), ('US', 'StreetNamePreType'), ('Hwy', 'StreetNamePreType'), ('15', 'StreetName'), ('501', 'AddressNumber'), ('N', 'StreetNamePreDirectional'), ('Chapel', 'StreetName'), ('Hill', 'StreetName'), ('NC', 'StreetNamePostType')]
UNCERTAIN LABEL: AddressNumber
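According to the error message, the failure is triggered when a label occupies more than one area of the string. The sketch below is a plain-Python approximation of that check, not usaddress's own code; the function name `repeated_labels` is hypothetical. Run on the parsed tokens above, it shows that two labels are split, not just the one the error reports:

```python
from itertools import groupby


def repeated_labels(tokens):
    """Return labels that appear in more than one separate run of tokens.

    tokens: list of (token, label) pairs, in string order.
    Labels are returned in the order in which their second run is found.
    """
    seen = set()
    repeated = []
    # groupby collapses consecutive pairs that share a label into one run
    for label, _run in groupby(tokens, key=lambda pair: pair[1]):
        if label in seen and label not in repeated:
            repeated.append(label)
        seen.add(label)
    return repeated


# PARSED TOKENS copied from the error message in this issue
parsed = [
    ('11098', 'AddressNumber'), ('US', 'StreetNamePreType'),
    ('Hwy', 'StreetNamePreType'), ('15', 'StreetName'),
    ('501', 'AddressNumber'), ('N', 'StreetNamePreDirectional'),
    ('Chapel', 'StreetName'), ('Hill', 'StreetName'),
    ('NC', 'StreetNamePostType'),
]

print(repeated_labels(parsed))  # ['AddressNumber', 'StreetName']
```

The error names only AddressNumber, presumably because the check stops at the first conflict it finds, but StreetName is also split between '15' and 'Chapel Hill'.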

fgregg commented 6 years ago

Hi @dfiorino, how should this be tagged? As

[('11098', 'AddressNumber'), ('US', 'StreetNamePreType'), ('Hwy', 'StreetNamePreType'), ('15', 'StreetName'), ('501', 'StreetName'), ('N', 'StreetNamePostDirectional'), ('Chapel', 'PlaceName'), ('Hill', 'PlaceName'), ('NC', 'State')]?
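With the proposed labeling, every label forms a single contiguous run, so the repeated-label check would pass. The following is a simplified stand-in for the grouping step, not usaddress's implementation; the name `to_components` is hypothetical, and it assumes tokens in a run are joined with single spaces. It shows what the proposed tags would collapse to:

```python
from collections import OrderedDict
from itertools import groupby


def to_components(tokens):
    """Join consecutive same-label tokens into one string per label.

    Raises ValueError if a label reappears in a later, separate run.
    """
    components = OrderedDict()
    # each group is a maximal run of consecutive tokens with the same label
    for label, run in groupby(tokens, key=lambda pair: pair[1]):
        if label in components:
            raise ValueError(f"label {label!r} appears in more than one area")
        components[label] = " ".join(token for token, _ in run)
    return components


# Labeling proposed in the comment above
proposed = [
    ('11098', 'AddressNumber'), ('US', 'StreetNamePreType'),
    ('Hwy', 'StreetNamePreType'), ('15', 'StreetName'),
    ('501', 'StreetName'), ('N', 'StreetNamePostDirectional'),
    ('Chapel', 'PlaceName'), ('Hill', 'PlaceName'), ('NC', 'State'),
]

for label, value in to_components(proposed).items():
    print(f"{label}: {value}")
# AddressNumber: 11098
# StreetNamePreType: US Hwy
# StreetName: 15 501
# StreetNamePostDirectional: N
# PlaceName: Chapel Hill
# State: NC
```

Under this labeling, both route numbers end up together in StreetName ("15 501"), which matches the fact that the address sits on both highways.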

dfiorino commented 6 years ago

Hi @fgregg, yes, that's what I think, anyway. Highway addresses seem to be formatted in many different ways, so I'm not sure which convention is commonly accepted.