datamade / usaddress

:us: a python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License

Parse addresses with no address number and two words as street name #258

Open bbharathrao opened 5 years ago

bbharathrao commented 5 years ago

Training for addresses that have no address number and a two-word street name.

Example 1:

How it is currently parsed:

address1 = usaddress.tag("Mid Island Street HICKSVILLE NY 11801")
pprint(address1)
(OrderedDict([('AddressNumber', 'Mid'), ('StreetName', 'Island'), ('StreetNamePostType', 'Street'), ('PlaceName', 'Hicksville'), ('StateName', 'NY'), ('ZipCode', '11801')]), 'Street Address')

How it should be parsed:

address1 = usaddress.tag("Mid Island Street HICKSVILLE NY 11801")
pprint(address1)
(OrderedDict([('StreetName', 'Mid Island'), ('StreetNamePostType', 'Street'), ('PlaceName', 'Hicksville'), ('StateName', 'NY'), ('ZipCode', '11801')]), 'Street Address')

Example 2:

How it is currently parsed:

address1 = usaddress.tag("New Park Rd West Hartford CT 16110")
pprint(address1)
(OrderedDict([('AddressNumber', 'New'), ('StreetName', 'Park'), ('StreetNamePostType', 'Rd'), ('PlaceName', 'West Hartford'), ('StateName', 'CT'), ('ZipCode', '16110')]), 'Street Address')

How it should be parsed:

address1 = usaddress.tag("New Park Rd West Hartford CT 16110")
pprint(address1)
(OrderedDict([('StreetName', 'New Park'), ('StreetNamePostType', 'Rd'), ('PlaceName', 'West Hartford'), ('StateName', 'CT'), ('ZipCode', '16110')]), 'Street Address')
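
The two examples could be turned into a small regression check. The sketch below is not part of usaddress: `EXPECTED` and `find_mismatches` are hypothetical names introduced here for illustration, and the tagger is passed in as a parameter so the snippet runs without the library installed. It assumes the tagger behaves like `usaddress.tag`, returning a `(tagged_address, address_type)` tuple as shown in the output above.

```python
# Desired labels for the two addresses in this issue (hypothetical fixture).
EXPECTED = {
    "Mid Island Street HICKSVILLE NY 11801": {
        "StreetName": "Mid Island",
        "StreetNamePostType": "Street",
    },
    "New Park Rd West Hartford CT 16110": {
        "StreetName": "New Park",
        "StreetNamePostType": "Rd",
    },
}


def find_mismatches(tag, expected=EXPECTED):
    """Return {address: {label: (wanted, got)}} for every label that differs.

    `tag` is any callable shaped like usaddress.tag, i.e. it returns a
    (mapping_of_labels, address_type) tuple.
    """
    mismatches = {}
    for address, wanted in expected.items():
        tagged, _address_type = tag(address)
        diffs = {}
        for label, value in wanted.items():
            if tagged.get(label) != value:
                diffs[label] = (value, tagged.get(label))
        # None of these addresses has a house number, so any
        # AddressNumber label is reported as a mismatch.
        if "AddressNumber" in tagged:
            diffs["AddressNumber"] = (None, tagged["AddressNumber"])
        if diffs:
            mismatches[address] = diffs
    return mismatches
```

With the library installed, `find_mismatches(usaddress.tag)` would return an empty dict once both addresses are parsed as requested.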

Training XML located at: usaddress/training/double_street_name.xml

Testing XML located at: usaddress/measure_performance/test_data/test_double_street_name.xml
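
For illustration, a labeled entry for Example 1 might be generated as sketched below. This assumes the XML files follow an `<AddressCollection>` / `<AddressString>` layout in which each token is wrapped in an element named after its label; that layout is an assumption here, not something stated in this issue, and `labeled_address` is a hypothetical helper rather than a usaddress function.

```python
import xml.etree.ElementTree as ET


def labeled_address(tokens):
    """Build one <AddressString> element from (token, label) pairs."""
    address = ET.Element("AddressString")
    for index, (token, label) in enumerate(tokens):
        child = ET.SubElement(address, label)
        child.text = token
        # Separate tokens with a single space, except after the last one.
        if index < len(tokens) - 1:
            child.tail = " "
    return address


collection = ET.Element("AddressCollection")
collection.append(
    labeled_address(
        [
            ("Mid", "StreetName"),
            ("Island", "StreetName"),
            ("Street", "StreetNamePostType"),
            ("HICKSVILLE", "PlaceName"),
            ("NY", "StateName"),
            ("11801", "ZipCode"),
        ]
    )
)

# Serialize to a string; both street-name tokens carry the StreetName
# label and no AddressNumber element is emitted.
xml_text = ET.tostring(collection, encoding="unicode")
print(xml_text)
```

Labeling both "Mid" and "Island" as `StreetName` is what lets the desired output above join them into a single 'Mid Island' value.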