datamade / usaddress

:us: a python library for parsing unstructured United States address strings into address components
https://parserator.datamade.us/usaddress
MIT License

Parse addresses with 'County Road xx' names #251

Open rptetzloff opened 5 years ago

rptetzloff commented 5 years ago

Adds training data for multi-letter County Road names, e.g. "County Road XX".

Before this change, "County Road D" already parsed correctly:

  >>> import usaddress
  >>> from pprint import pprint
  >>> address1 = usaddress.tag("1234 COUNTY ROAD D, FRANKLIN, WI 54567")
  >>> pprint(address1)
  (OrderedDict([('AddressNumber', '1234'),
                ('StreetNamePreType', 'COUNTY ROAD'),
                ('StreetName', 'D'),
                ('PlaceName', 'FRANKLIN'),
                ('StateName', 'WI'),
                ('ZipCode', '54567')]),
   'Street Address')

"County Road DD" did not parse correctly: "COUNTY ROAD" was tagged as StreetName and "DD" as StreetNamePostType:

  >>> address2 = usaddress.tag("1234 COUNTY ROAD DD, FRANKLIN, WI 54567")
  >>> pprint(address2)
  (OrderedDict([('AddressNumber', '1234'),
                ('StreetName', 'COUNTY ROAD'),
                ('StreetNamePostType', 'DD'),
                ('PlaceName', 'FRANKLIN'),
                ('StateName', 'WI'),
                ('ZipCode', '54567')]),
   'Street Address')
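
The difference between the two results comes down to which label "COUNTY ROAD" receives. Below is a minimal sketch of a check for that, written against the (tagged_address, address_type) tuple that usaddress.tag returns. The helper name county_road_is_pre_type is hypothetical and not part of usaddress, and the two results are copied in as literals from the output above so the sketch runs without the library installed.

```python
from collections import OrderedDict


def county_road_is_pre_type(tag_result):
    """Return True if 'COUNTY ROAD' was labeled StreetNamePreType.

    tag_result is a (tagged_address, address_type) tuple, the shape
    returned by usaddress.tag.
    """
    tagged, _address_type = tag_result
    return tagged.get("StreetNamePreType", "").upper() == "COUNTY ROAD"


# Result reported above for "1234 COUNTY ROAD D, FRANKLIN, WI 54567".
parsed_d = (
    OrderedDict([("AddressNumber", "1234"),
                 ("StreetNamePreType", "COUNTY ROAD"),
                 ("StreetName", "D"),
                 ("PlaceName", "FRANKLIN"),
                 ("StateName", "WI"),
                 ("ZipCode", "54567")]),
    "Street Address",
)

# Result reported above for "1234 COUNTY ROAD DD, FRANKLIN, WI 54567".
parsed_dd = (
    OrderedDict([("AddressNumber", "1234"),
                 ("StreetName", "COUNTY ROAD"),
                 ("StreetNamePostType", "DD"),
                 ("PlaceName", "FRANKLIN"),
                 ("StateName", "WI"),
                 ("ZipCode", "54567")]),
    "Street Address",
)

print(county_road_is_pre_type(parsed_d))   # True
print(county_road_is_pre_type(parsed_dd))  # False
```

With usaddress installed, the same helper can be called directly on a live result, e.g. county_road_is_pre_type(usaddress.tag("1234 COUNTY ROAD DD, FRANKLIN, WI 54567")).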

Training XML is located at:

  usaddress/training/county_road_xx.xml

Test XML is located at:

  usaddress/measure_performance/test_data/county_road_xx.xml

Each file contains 6 addresses.
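
For readers unfamiliar with labeled training data, here is a rough sketch of how one labeled address could be serialized. It assumes the layout I understand usaddress training files to use: an AddressCollection root, one AddressString per address, and one child element per token named after its label, with punctuation kept on the token. The function build_training_xml is hypothetical, and the example address is the one from the description above with the desired labels, not necessarily one of the 6 addresses in the files.

```python
import xml.etree.ElementTree as ET


def build_training_xml(labeled_addresses):
    """Serialize labeled addresses as an AddressCollection XML string.

    labeled_addresses is a list of addresses; each address is a list of
    (token, label) pairs in their original order.
    """
    collection = ET.Element("AddressCollection")
    for tokens in labeled_addresses:
        address = ET.SubElement(collection, "AddressString")
        for index, (token, label) in enumerate(tokens):
            element = ET.SubElement(address, label)
            element.text = token
            # Separate tokens with a single space, except after the last one.
            if index < len(tokens) - 1:
                element.tail = " "
    return ET.tostring(collection, encoding="unicode")


# Desired labeling for the address that was parsed incorrectly
# (assumed token boundaries: split on whitespace, commas kept).
example = [
    ("1234", "AddressNumber"),
    ("COUNTY", "StreetNamePreType"),
    ("ROAD", "StreetNamePreType"),
    ("DD,", "StreetName"),
    ("FRANKLIN,", "PlaceName"),
    ("WI", "StateName"),
    ("54567", "ZipCode"),
]

training_xml = build_training_xml([example])
print(training_xml)
```

Labeling each word separately (COUNTY and ROAD both as StreetNamePreType) reflects that the parser tags token by token; usaddress.tag then merges consecutive tokens that share a label into one value.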

All tests pass:

> nosetests . 
<snip>
----------------------------------------------------------------------
Ran 4929 tests in 1.329s

OK