addr_match() works by grouping both the input and the reference addresses by their tagged five-digit ZIP code and matching within each of these groups.
This approach reduces RAM usage and overall matching time because addresses whose ZIP codes do not match exactly are eliminated early.
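As a rough illustration of the grouping idea, here is a minimal Python sketch. It is not the actual addr_match() implementation: the helper names, the (street, ZIP) tuple layout, and the exact string comparison standing in for the real matching step are all assumptions made for the example.

```python
from collections import defaultdict


def group_by_zip(addresses):
    """Group (street, zip) tuples by their five-digit ZIP code."""
    groups = defaultdict(list)
    for street, zipcode in addresses:
        groups[zipcode].append(street)
    return groups


def match_within_zip(inputs, reference):
    """Compare each input address only with reference addresses in the same ZIP.

    Exact string equality is a stand-in for the real matching logic (assumption).
    """
    ref_groups = group_by_zip(reference)
    matches = {}
    for street, zipcode in inputs:
        # Reference addresses in other ZIP codes are never looked at.
        candidates = ref_groups.get(zipcode, [])
        matches[(street, zipcode)] = [c for c in candidates if c == street]
    return matches


# Made-up example addresses.
example_reference = [("1 MAIN ST", "45229"), ("2 OAK AVE", "45229"), ("1 MAIN ST", "45220")]
example_inputs = [("1 MAIN ST", "45229"), ("9 ELM RD", "45220")]
example_matches = match_within_zip(example_inputs, example_reference)
```

Because candidates are looked up per ZIP code, the number of comparisons depends on group sizes rather than on the full size of the reference set.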
To measure similarities between ZIP codes, we could:
create a geographic distance similarity matrix of ZCTA centroids
consider a ZIP code to be a regional match if the first three digits match
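Both similarity measures could look roughly like the Python sketch below. The function names are hypothetical, the haversine formula is one possible choice of geographic distance, and the centroid coordinates are placeholders rather than real ZCTA centroids.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def centroid_distance_matrix(centroids):
    """Pairwise distances (km) for a dict of {zip_code: (lat, lon)}."""
    return {
        (a, b): haversine_km(*centroids[a], *centroids[b])
        for a in centroids
        for b in centroids
    }


def same_zip3(zip_a, zip_b):
    """Regional match: the first three digits of both ZIP codes agree."""
    return zip_a[:3] == zip_b[:3]


# Placeholder coordinates for illustration only; not real ZCTA centroids.
example_centroids = {"00001": (0.0, 0.0), "00002": (1.0, 0.0), "99999": (0.0, 1.0)}
distances = centroid_distance_matrix(example_centroids)
```

The distance matrix grows quadratically with the number of ZIP codes, while the three-digit comparison needs no lookup table at all, so the two options trade precision against cost.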
In either case, we could add an option for address matching that specifies the ZIP code matching level (exact, city, region).
To save computation, nearby ZIP codes should only be matched if exact matching fails.
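One way the matching level and the exact-first fallback might fit together is sketched below in Python. Everything here is hypothetical: the function names, the `levels` argument, and the reference layout (a dict from ZIP code to a set of street strings). Only the "exact" and "region" levels are shown, with "region" taken to mean a shared three-digit prefix; "city" is left out because how it would be defined is still open.

```python
def candidate_zips(zipcode, reference_zips, level="exact"):
    """Return the reference ZIP codes eligible for matching at the given level."""
    if level == "exact":
        return [z for z in reference_zips if z == zipcode]
    if level == "region":
        # Assumption: a regional match means the first three digits agree.
        return [z for z in reference_zips if z[:3] == zipcode[:3]]
    raise ValueError(f"unsupported level: {level!r}")


def match_with_fallback(street, zipcode, reference, levels=("exact", "region")):
    """Try each level in order and stop at the first one that yields a match.

    `reference` maps ZIP code -> set of street strings (hypothetical layout);
    exact string membership stands in for the real matching step.
    """
    for level in levels:
        for ref_zip in candidate_zips(zipcode, reference, level):
            if street in reference[ref_zip]:
                return {"level": level, "matched_zip": ref_zip, "actual_zip": zipcode}
    return None


# Made-up example data.
example_ref = {"45229": {"1 MAIN ST"}, "45220": {"2 OAK AVE"}}
exact_hit = match_with_fallback("1 MAIN ST", "45229", example_ref)
region_hit = match_with_fallback("2 OAK AVE", "45229", example_ref)
no_hit = match_with_fallback("9 ELM RD", "45229", example_ref)
```

Because the broader level is only tried after the exact level comes back empty, addresses with a correct ZIP code never pay for the wider search, and the result records both the matched and the actual ZIP code.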
Would it blow up RAM too much if we matched within groups that share the first three ZIP code digits (split) instead of the exact ZIP code? We could then report both the matched and the actual ZIP code.
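A quick way to get a feel for the cost is to count how many input/reference pairs each grouping would have to consider. The Python sketch below does that for a grouping key of any length. The function name and the example ZIP codes are made up, and pair counts are only a proxy for memory use, not a measurement.

```python
from collections import Counter


def candidate_pairs(input_zips, reference_zips, key_len=5):
    """Count input/reference pairs that share the first `key_len` ZIP digits."""
    in_counts = Counter(z[:key_len] for z in input_zips)
    ref_counts = Counter(z[:key_len] for z in reference_zips)
    # A Counter returns 0 for keys it has not seen, so unmatched groups add nothing.
    return sum(n * ref_counts[key] for key, n in in_counts.items())


# Made-up example ZIP codes.
example_input_zips = ["45229", "45229", "45220"]
example_reference_zips = ["45229", "45220", "45220", "45206"]

pairs_zip5 = candidate_pairs(example_input_zips, example_reference_zips, key_len=5)
pairs_zip3 = candidate_pairs(example_input_zips, example_reference_zips, key_len=3)
```

Running the same count on real input and reference data would show how much larger the three-digit groups are before committing to the change.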