geomarker-io / addr

Clean, Parse, Harmonize, Match, and Geocode Messy Real-World Addresses
https://geomarker.io/addr/

search "nearby" zipcodes for addresses without matches in exactly matched zipcode addresses #26

Open cole-brokamp opened 1 week ago

cole-brokamp commented 1 week ago

addr_match() works by grouping both the input and the reference addresses by their tagged five digit zipcode and matching within each of these groups.
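This approach reduces RAM usage and decreases overall matching time by eliminating, early on, any candidate addresses whose ZIP codes do not match exactly.
The tradeoff is that an input address with a wrong or mistyped ZIP code can never be matched, which is why searching "nearby" ZIP codes would help.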
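
A minimal Python sketch of the grouping idea described above, not the actual addr_match() implementation: `group_by_zip` and `match_within_zip` are hypothetical names, the addresses are made up, and plain string equality stands in for the real address comparison.

```python
from collections import defaultdict

def group_by_zip(addresses):
    """Group (street, zip_code) tuples into {zip_code: [street, ...]}."""
    groups = defaultdict(list)
    for street, zip_code in addresses:
        groups[zip_code].append(street)
    return groups

def match_within_zip(inputs, reference):
    """Return {input address: matched reference street, or None}.

    String equality is a stand-in for the real address comparison.
    """
    ref_groups = group_by_zip(reference)
    results = {}
    for street, zip_code in inputs:
        # Only reference addresses sharing the input's ZIP code are candidates.
        candidates = ref_groups.get(zip_code, [])
        results[(street, zip_code)] = street if street in candidates else None
    return results

# Made-up example data.
reference = [("123 EXAMPLE ST", "45229"), ("1 SAMPLE AVE", "45202")]
inputs = [
    ("123 EXAMPLE ST", "45229"),  # correct ZIP code
    ("123 EXAMPLE ST", "45220"),  # same street, mistyped ZIP code
]
results = match_within_zip(inputs, reference)
```

In this toy data the second input never matches, even though the street is in the reference set, because its ZIP code group contains no reference addresses.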

To measure similarities between ZIP codes, we could:

  1. create a geographic distance matrix of ZCTA centroids
  2. consider a ZIP code to be a regional match if the first three digits match
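
Both options could be sketched roughly as follows (Python, for illustration only). The function names are hypothetical, the haversine formula is one reasonable choice of distance rather than a decided one, and the coordinates are placeholders, not real ZCTA centroids.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))

def centroid_distance_matrix(centroids):
    """Option 1: pairwise distances, keyed by (zip_a, zip_b)."""
    zips = sorted(centroids)
    return {
        (a, b): haversine_km(*centroids[a], *centroids[b])
        for a in zips
        for b in zips
    }

def same_region(zip_a, zip_b):
    """Option 2: regional match if the first three digits agree."""
    return zip_a[:3] == zip_b[:3]

# Placeholder coordinates for illustration, NOT real ZCTA centroids.
centroids = {
    "45220": (39.15, -84.52),
    "45229": (39.15, -84.49),
    "41011": (39.07, -84.53),
}
distances = centroid_distance_matrix(centroids)
```

The distance matrix grows with the square of the number of ZIP codes, while the three-digit rule needs no lookup table at all.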

In either case, we could add an option for address matching that specifies the ZIP code matching level (exact, city, region).

To save computation, nearby ZIP codes should be searched only when exact ZIP code matching fails.
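
One way the fallback could look, as a Python sketch under assumptions: `match_with_fallback` and the `nearby` lookup table are hypothetical, the data is made up, and string equality again stands in for the real address comparison.

```python
from collections import defaultdict

def match_with_fallback(inputs, reference, nearby):
    """Try the exact ZIP code first; search nearby ZIP codes only on failure.

    Returns (street, input_zip, matched_zip) tuples; matched_zip is None
    when no match is found anywhere.
    """
    ref_groups = defaultdict(set)
    for street, zip_code in reference:
        ref_groups[zip_code].add(street)

    results = []
    for street, zip_code in inputs:
        matched_zip = None
        if street in ref_groups.get(zip_code, ()):
            matched_zip = zip_code
        else:
            # Fallback: only reached when the exact ZIP code group has no match.
            for other in nearby.get(zip_code, ()):
                if street in ref_groups.get(other, ()):
                    matched_zip = other
                    break
        results.append((street, zip_code, matched_zip))
    return results

# Made-up example data; `nearby` could come from either similarity measure.
reference = [("123 EXAMPLE ST", "45229")]
nearby = {"45220": ["45219", "45229"]}
inputs = [
    ("123 EXAMPLE ST", "45229"),
    ("123 EXAMPLE ST", "45220"),
    ("999 MISSING RD", "45220"),
]
results = match_with_fallback(inputs, reference, nearby)
```

Returning both the input ZIP code and the matched ZIP code keeps it visible when a match came from the fallback.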

cole-brokamp commented 1 week ago

Would it blow up RAM too much if we matched within groups defined by the first three digits of the ZIP code (split), instead of the exact ZIP code? Then we could report both the matched and the actual ZIP code.
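
A rough way to size the cost before trying it, sketched in Python: count the input-by-reference comparisons each grouping would generate. This is only a proxy for RAM use, `candidate_pairs` is a hypothetical helper, and the ZIP code counts are made up.

```python
from collections import Counter

def exact_key(zip_code):
    """Group by the full five-digit ZIP code."""
    return zip_code

def prefix3_key(zip_code):
    """Group by the first three digits of the ZIP code."""
    return zip_code[:3]

def candidate_pairs(input_zips, reference_zips, key):
    """Number of input-by-reference comparisons when grouping by key()."""
    in_counts = Counter(key(z) for z in input_zips)
    ref_counts = Counter(key(z) for z in reference_zips)
    # Counter returns 0 for groups that have no reference addresses.
    return sum(n * ref_counts[k] for k, n in in_counts.items())

# Made-up counts for illustration.
input_zips = ["45229", "45229", "45220"]
reference_zips = ["45229"] * 3 + ["45220"] * 2 + ["45202"] * 4

exact_pairs = candidate_pairs(input_zips, reference_zips, exact_key)     # 8
prefix_pairs = candidate_pairs(input_zips, reference_zips, prefix3_key)  # 27
```

Running the same count on real input and reference ZIP codes would show how much larger the three-digit groups get.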