DS4PS / cpp-529-fall-2020

http://ds4ps.org/cpp-529-fall-2020/

Lab 5 - Matching columns in the map #25

Open JasonSills opened 3 years ago

JasonSills commented 3 years ago

Hi @lecy ,

I'm having difficulty matching the GEOID and tractid columns to construct my map. The two are not in the same format: the GEOID in the data I used to construct my map looks like a FIPS code, but all numeric, while tractid is in the standard FIPS format. I'm assuming this is why the two will not merge, so I've tried to pull the numeric values out of tractid with the regex below. The problem is that I cannot seem to get rid of the "-". Can you tell me what I'm missing? Or is there a better way to sync the two data sets?

regexp1 <- sub('.*-([0-9]+).*?','\\1',d$tractid)
d$geoid1 <- str_extract(d$tractid, regexp1)
d$geoid1


JasonSills commented 3 years ago

Okay, I got scrappy and did this, but I'd love to know what regex would pull this out without the additional steps below.

regexp1 <- sub('^[^-]*-([0-9]+).*?','\\1',d$tractid)
d$geoid1 <- str_extract(d$tractid, regexp1)
step1 <- substr(d$geoid1, start = 1, stop = 2)
step2 <- substr(d$geoid1, start = 4, stop = 6)
step3 <- substr(d$geoid1, start = 8, stop = 13)
d$geoid <- paste0(step1,step2,step3)

lecy commented 3 years ago

How about something like:

x <- head( d$tractid )
x
[1] "fips-01-001-020100" "fips-01-001-020200" "fips-01-001-020300"
[4] "fips-01-001-020400" "fips-01-001-020500" "fips-01-001-020600"

x <- gsub( "fips", "", x )
x <- gsub( "-", "", x )

x
[1] "01001020100" "01001020200" "01001020300" "01001020400" "01001020500"
[6] "01001020600"

Or more succinctly, use the regular expression for "remove everything but numbers":

x <- head( d$tractid )
gsub( "[^0-9]", "",  x )   # remove every character that is not a digit
[1] "01001020100" "01001020200" "01001020300" "01001020400" "01001020500"
[6] "01001020600"
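
For the original question about doing this in a single regex step, a sketch with capture groups might look like the following. It assumes every ID follows the "fips-SS-CCC-TTTTTT" layout shown in the output above; the sample vector and the `pattern` name are made up for illustration.

```r
# Sample tract IDs in the format shown in this thread (illustrative values)
tractid <- c("fips-01-001-020100", "fips-01-001-020200")

# One capture group per FIPS component: state (2), county (3), tract (6)
pattern <- "^fips-([0-9]{2})-([0-9]{3})-([0-9]{6})$"

# Stop early if any ID does not follow the assumed layout
stopifnot(all(grepl(pattern, tractid)))

# Keep only the three captured groups, dropping the prefix and dashes
geoid <- sub(pattern, "\\1\\2\\3", tractid)
geoid
```

Because sub() returns non-matching strings unchanged, the grepl() check is there to surface malformed IDs instead of letting them pass through silently.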

lecy commented 3 years ago

Remember to convert both IDs to numeric before merging to avoid the leading-zero problem!
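
A minimal sketch of that conversion and the merge, using toy data: `d` mirrors the data frame from this thread, while `shp`, its columns, and all values are hypothetical stand-ins for the map's attribute table.

```r
# Hypothetical toy data; only the tractid format comes from this thread
d <- data.frame(
  tractid = c("fips-01-001-020100", "fips-01-001-020200"),
  pop = c(1912, 2170),
  stringsAsFactors = FALSE
)
shp <- data.frame(
  GEOID = c(1001020100, 1001020200),   # all-numeric ID, leading zero already lost
  name = c("Tract 201", "Tract 202"),
  stringsAsFactors = FALSE
)

# Strip non-digits, then convert both keys to numeric so they share a type.
# as.numeric() is used rather than as.integer() because 11-digit IDs can
# exceed R's integer range.
d$geoid <- as.numeric(gsub("[^0-9]", "", d$tractid))
shp$geoid <- as.numeric(shp$GEOID)

merged <- merge(shp, d, by = "geoid")
nrow(merged)
```

An alternative would be to keep both keys as character strings and pad the numeric one back to 11 digits, but converting both to numeric follows the advice above.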