datamade / usaddress

A Python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License

Street name mixed up for directional #179

Status: Open (opened by alastairmatheson 7 years ago)

alastairmatheson commented 7 years ago

ORIGINAL STRING: 2410 N ST NE,APT B,AUBURN,WA,98002
PARSED TOKENS: [('2410', 'AddressNumber'), ('N', 'StreetNamePostDirectional'), ('ST', 'StreetNamePostType'), ('NE,', 'StreetNamePostDirectional'), ('APT', 'OccupancyType'), ('B,', 'OccupancyIdentifier'), ('AUBURN,', 'PlaceName'), ('WA,', 'StateName'), ('98002', 'ZipCode')]
UNCERTAIN LABEL: StreetNamePostDirectional

The same mislabeling occurs with W ST, but not with E ST or S ST.
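The UNCERTAIN LABEL line points at the underlying problem: the street name N is labeled StreetNamePostDirectional, so that label shows up in two separate places (N and NE,) and the tokens cannot be assembled into one clean set of address components. Below is a minimal sketch of that kind of "repeated label" check, written from scratch for illustration (it is not usaddress's own implementation). It runs on the parsed tokens copied from the report. The helper name `first_repeated_label` is hypothetical.

```python
from typing import List, Optional, Tuple

# Token/label pairs copied from the PARSED TOKENS line in the report.
PARSED_TOKENS: List[Tuple[str, str]] = [
    ('2410', 'AddressNumber'),
    ('N', 'StreetNamePostDirectional'),
    ('ST', 'StreetNamePostType'),
    ('NE,', 'StreetNamePostDirectional'),
    ('APT', 'OccupancyType'),
    ('B,', 'OccupancyIdentifier'),
    ('AUBURN,', 'PlaceName'),
    ('WA,', 'StateName'),
    ('98002', 'ZipCode'),
]


def first_repeated_label(tokens: List[Tuple[str, str]]) -> Optional[str]:
    """Return the first label that reappears after a different label intervened.

    Consecutive tokens sharing a label are treated as one component
    (e.g. a multi-word PlaceName), so only non-adjacent repeats count.
    """
    seen = set()
    previous = None
    for _token, label in tokens:
        if label != previous:
            if label in seen:
                return label  # label split across two separate spans
            seen.add(label)
        previous = label
    return None


print(first_repeated_label(PARSED_TOKENS))  # StreetNamePostDirectional
```

If N were labeled StreetName instead, every label would occur in a single contiguous span and the check would return None.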

jeancochrane commented 7 years ago

Thanks for filing this @alastairmatheson! Do you have any more examples of failing addresses that fit this pattern (streets named N or W) that you'd be willing to share with us? I'd love to bring them in for the next round of training data.

alastairmatheson commented 7 years ago

I haven't come across any more yet, but I have ~48K addresses to work through so I'm sure I'll find more.
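When working through a large batch like this, one way to find more examples is to pre-filter for addresses whose street name is the single letter N or W followed by ST. The sketch below uses a hypothetical regular-expression heuristic, `LETTER_STREET`, built from the pattern described in this thread. It is not part of usaddress. Only the first sample string comes from the report, and the others are invented for illustration.

```python
import re
from typing import Iterable, List

# Hypothetical heuristic: a house number, then a one-letter street name
# N or W, then the suffix ST (as in "2410 N ST NE"). Case-insensitive.
# The \b after ST keeps names like "N STATE ST" from matching.
LETTER_STREET = re.compile(r"^\s*\d+\s+([NW])\s+ST\b", re.IGNORECASE)


def candidate_addresses(addresses: Iterable[str]) -> List[str]:
    """Return the addresses that look like the 'street named N or W' pattern."""
    return [a for a in addresses if LETTER_STREET.match(a)]


sample = [
    "2410 N ST NE,APT B,AUBURN,WA,98002",  # from the report
    "100 W ST,AUBURN,WA,98002",            # invented
    "100 E ST,AUBURN,WA,98002",            # invented; E ST reportedly parses fine
    "500 N MAIN ST,AUBURN,WA,98002",       # invented; N is a real directional here
]
print(candidate_addresses(sample))
```

The flagged strings could then be passed through `usaddress.tag`, keeping those that raise `usaddress.RepeatedLabelError` as candidates to share for training.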