CityOfLosAngeles / data-workflows-101

A workshop and training for data workflows that we use at the City.
Apache License 2.0

Resolve address record inconsistencies #16

Open jannasmith opened 6 years ago

jannasmith commented 6 years ago

Addresses and intersections entered into records from written reports use inconsistent naming conventions. For example, some entries spell out "Avenue" while others write "Ave.", and some abbreviate "MLK Blvd" while others spell out "Martin Luther King Jr Blvd". As a result, records that should be combined are split into separate categories.

igotcharts commented 6 years ago

I think this can be solved using a fuzzy matching algorithm. Do we have a master list of "true" street names?

How incorrect are some of these names? Is it just a matter of "ave" vs. "avenue", or is it possible we would see something like "Maryin Lugher King Street"?
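
As a rough illustration of the fuzzy-matching idea above, here is a minimal sketch that uses `difflib.get_close_matches` from the Python standard library. The `MASTER_STREETS` list, the `match_street` helper, and the `0.8` cutoff are all assumptions made up for this example; the real master list and a sensible threshold would need to come from the City's data.

```python
import difflib

# Hypothetical master list; the real one would be the official street names.
MASTER_STREETS = [
    "Martin Luther King Jr Blvd",
    "Figueroa St",
    "Vermont Ave",
]


def match_street(raw, master=MASTER_STREETS, cutoff=0.8):
    """Return the closest official street name, or None if nothing is close enough.

    The cutoff (0 to 1) is an assumed value and would need tuning on real records.
    """
    # Compare case-insensitively, but return the official spelling.
    lookup = {name.lower(): name for name in master}
    hits = difflib.get_close_matches(
        raw.strip().lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None


# A one-letter typo still resolves to the official name.
print(match_street("Figeroa St"))
# A street that is not in the master list yields None rather than a bad guess.
print(match_street("Sunset Blvd"))
```

Returning `None` below the cutoff leaves unmatched records available for manual review instead of merging them into the wrong street.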

jannasmith commented 6 years ago

Yes, we can designate a master list of official street names. Major misspellings are very rare, but there are some common mistakes that we can identify.
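
If most of the inconsistencies are a known set of common variants, a simple normalization pass may cover the bulk of them before any fuzzy matching is needed. The sketch below is only an illustration: the `SUFFIXES` and `ALIASES` tables and the `normalize_street` helper are hypothetical, and the actual list of common mistakes would have to be compiled from the records.

```python
import re

# Hypothetical lookup tables; real entries would come from reviewing the records.
SUFFIXES = {
    "avenue": "Ave",
    "ave": "Ave",
    "boulevard": "Blvd",
    "blvd": "Blvd",
    "street": "St",
    "st": "St",
}
ALIASES = {"mlk blvd": "Martin Luther King Jr Blvd"}


def normalize_street(raw):
    """Map a raw street name to one consistent spelling."""
    # Drop periods and commas, lowercase, and split on whitespace.
    tokens = re.sub(r"[.,]", "", raw).lower().split()
    if not tokens:
        return ""
    # Capitalize each word; standardize only the final token as a suffix,
    # so a leading "St" (as in a "Saint" name) is left alone.
    words = [t.capitalize() for t in tokens[:-1]]
    words.append(SUFFIXES.get(tokens[-1], tokens[-1].capitalize()))
    name = " ".join(words)
    # Expand known whole-name abbreviations to the official name.
    return ALIASES.get(name.lower(), name)


print(normalize_street("Vermont Avenue"))
print(normalize_street("MLK Blvd"))
```

Names that still do not appear in the master list after this pass could then be flagged for review.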