geomarker-io / addr

Clean, Parse, Harmonize, Match, and Geocode Messy Real-World Addresses
https://geomarker.io/addr/
Other

cagis_parcel_id list col #11

Closed erikarasnick closed 4 months ago

erikarasnick commented 4 months ago

Some addresses in cagis_addr have multiple parcel IDs per row, and the same address can appear in more than one row. See the example below.

addr::cagis_addr |> 
   dplyr::filter(cagis_address == "1549 MEREDITH DR SPRINGFIELD TOWNSHIP, OH 45231") |> 
   dplyr::select(cagis_address, cagis_parcel_id)
#> # A tibble: 4 × 2
#>   cagis_address                                   cagis_parcel_id
#>   <chr>                                           <list>         
#> 1 1549 MEREDITH DR SPRINGFIELD TOWNSHIP, OH 45231 <chr [31]>     
#> 2 1549 MEREDITH DR SPRINGFIELD TOWNSHIP, OH 45231 <chr [1]>      
#> 3 1549 MEREDITH DR SPRINGFIELD TOWNSHIP, OH 45231 <chr [1]>      
#> 4 1549 MEREDITH DR SPRINGFIELD TOWNSHIP, OH 45231 <chr [1]>

Created on 2024-07-05 with reprex v2.1.0

cole-brokamp commented 4 months ago

Should we enforce that matching addresses in CAGIS are unique?

cole-brokamp commented 4 months ago

Here, the CAGIS data is nested by all used variables except for the parcel ID: https://github.com/cole-brokamp/addr/blob/65830c64873a65de2dd54de856c58a982549e911/data-raw/make_cagis_addr.R#L40-L44

So, if CAGIS address records differ on any one of these features (cagis_addr, cagis_address, cagis_address_place, cagis_is_condo, cagis_address_type, cagis_s2), the same address will appear in more than one row, with each row still holding distinct parcel identifiers.

The records in the example above have different geographic coordinates (and thus different s2 cell identifiers) and different "place names", so they occupy separate rows in the address dataset.
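
To make the nesting behavior concrete, here is a minimal sketch on made-up toy data (not the real CAGIS table). It assumes a `group_by()` plus `summarise()` step as a stand-in for the nesting; the actual code in make_cagis_addr.R may be written differently. Only three of the columns are used, and all values are hypothetical.

```r
library(dplyr)

# Hypothetical toy records (made-up values, not real CAGIS data):
# one address string, two distinct place/s2 combinations, three parcels.
toy <- tibble::tibble(
  cagis_address       = rep("1 MAIN ST CINCINNATI, OH 45202", 3),
  cagis_address_place = c("PLACE A", "PLACE A", "PLACE B"),
  cagis_s2            = c("s2_cell_1", "s2_cell_1", "s2_cell_2"),
  cagis_parcel_id     = c("P001", "P002", "P003")
)

# Collapse parcel IDs into a list column, grouping by every other variable.
# Rows that differ in place or s2 stay separate even though the address matches.
nested <- toy |>
  group_by(cagis_address, cagis_address_place, cagis_s2) |>
  summarise(cagis_parcel_id = list(cagis_parcel_id), .groups = "drop")

# Rows per address string; a count above one means the address is not unique.
rows_per_address <- count(nested, cagis_address)

# Grouping by the address alone would force exactly one row per address.
unique_by_address <- toy |>
  group_by(cagis_address) |>
  summarise(cagis_parcel_id = list(unique(cagis_parcel_id)), .groups = "drop")
```

In this sketch `nested` keeps two rows for the single address, while `unique_by_address` has one row holding all three parcel IDs. Note that the second approach drops the place and s2 columns, so enforcing uniqueness this way would require deciding which place name and coordinates to keep.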