Watts-College / cpp-529-spr-2022

https://watts-college.github.io/cpp-529-spr-2022/

Trouble with variable selection #27

Closed ramacdo1 closed 2 years ago

ramacdo1 commented 2 years ago

I am trying to add some more variables to the "lab 5" portion of the dataset creation, but I get an error saying it can't select columns that don't exist, even though they do exist in the dataset. It is almost as if, when d1, d2, and md were merged, the merge masked variables from d2. Any advice?
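One way to test the masking idea: base R's merge() does not silently drop non-key columns. When both inputs share a column name other than the key, it keeps both copies and appends the suffixes .x and .y (the default suffixes argument), which is how a name like pop00.x can appear and why year is dropped before merging. The toy data frames below are hypothetical stand-ins, not the LTDB files.

```r
# Hypothetical toy stand-ins for d1 (2000) and d2 (2010); not the LTDB data
d1 <- data.frame( tractid = c( "a", "b" ), year = 2000, pop00 = c( 10, 20 ) )
d2 <- data.frame( tractid = c( "a", "b" ), year = 2010, pop00 = c( 11, 21 ),
                  pnhblk12 = c( 0.1, 0.2 ) )

# Shared non-key columns (year, pop00) are kept twice, with .x / .y suffixes
d <- merge( d1, d2, by = "tractid" )
print( names( d ) )

# No column from d2 is lost: pnhblk12 survives the merge under its own name
print( "pnhblk12" %in% names( d ) )
```

So if a column truly came from d2, it is still in the merged data, possibly with a suffix; a "doesn't exist" error more likely points to a name that was never there.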

URL1 <- "https://github.com/DS4PS/cpp-529-fall-2020/raw/main/LABS/data/rodeo/LTDB-2000.rds"
d1 <- readRDS( gzcon( url( URL1 ) ) )

URL2 <- "https://github.com/DS4PS/cpp-529-fall-2020/raw/main/LABS/data/rodeo/LTDB-2010.rds"
d2 <- readRDS( gzcon( url( URL2 ) ) )

URLmd <- "https://github.com/DS4PS/cpp-529-fall-2020/raw/main/LABS/data/rodeo/LTDB-META-DATA.rds"
md <- readRDS( gzcon( url( URLmd ) ) )

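library( dplyr )   # provides select()

# drop year from both files so merge() does not create year.x / year.y
d1 <- select( d1, - year )
d2 <- select( d2, - year )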

d <- merge( d1, d2, by="tractid" )
d <- merge( d, md, by="tractid" )

# STANDARDIZE GEO IDs

# note the current geoid format for the LTDB census data: 
# FIPS-STATE-COUNTY-TRACT:  fips-01-001-020100  

x <- d$tractid 
head( x )
# [1] "fips-01-001-020100" "fips-01-001-020200" "fips-01-001-020300"
# [4] "fips-01-001-020400" "fips-01-001-020500" "fips-01-001-020600"
# remove non-numeric strings 
x <- gsub( "fips", "", x )
x <- gsub( "-", "", x )
head( x )
# [1] "01001020100" "01001020200" "01001020300" "01001020400" "01001020500"
# [6] "01001020600"
# drop leading zeros 
x <- as.numeric( x )

# remember to add the variable back to the census dataset
d$tractid2 <- x 

# join on the standardized numeric ID, not the original "fips-..." string
las.map <- merge( las_dorling, d, by.x="GEOID", by.y="tractid2", all.x=TRUE )
d3 <- select( d, tractid, 
             mhmval00, mhmval12, 
             hinc00, 
             hu00, vac00, own00, rent00, h30old00,
             empclf00, clf00, unemp00, prof00,  
             dpov00, npov00,
             ag25up00, hs00, col00, 
             pop00.x, nhwht00, nhblk00, hisp00, asian00,
             cbsa, cbsaname, 
             pnhwht12, pnhblack12, phisp12, pntv12, 
             pfb12, polang12, phs12, pcol12, 
             punemp12, pflabf12, pprof12, pmanuf12, 
             pvet12, psemp12, 
             hinc12, incpc12, ppov12, pown12, pvac12, pmulti12, 
             mrent12, p30old12, p10yrs12, 
             p18und12, p60up12, p75up12, 
             pmar12, pwds12, pfhh12 )
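The ID clean-up steps above can be wrapped in a small helper. This is a minimal sketch, assuming every tractid follows the fips-SS-CCC-TTTTTT pattern shown in the comments and that the map's GEOID is numeric; the function name standardize_tractid and the toy tracts/geo data frames are hypothetical. It also illustrates why the las.map merge should join on the numeric tractid2: a GEOID such as 1001020100 will never equal a string like "fips-01-001-020100".

```r
# Hypothetical helper: "fips-01-001-020100" -> 1001020100
standardize_tractid <- function( x )
{
  x <- gsub( "fips", "", x )   # drop the text prefix
  x <- gsub( "-", "", x )      # drop the separators
  as.numeric( x )              # numeric conversion drops leading zeros
}

# Toy stand-ins (hypothetical): census rows keyed by the LTDB tractid,
# map rows keyed by a numeric GEOID
tracts <- data.frame( tractid = c( "fips-01-001-020100", "fips-01-001-020200" ),
                      pnhblk12 = c( 0.15, 0.40 ) )
geo <- data.frame( GEOID = c( 1001020100, 1001020200 ) )

tracts$tractid2 <- standardize_tractid( tracts$tractid )

# Joining on the standardized key matches every map row
m <- merge( geo, tracts, by.x = "GEOID", by.y = "tractid2", all.x = TRUE )
print( m )
```

With all.x = TRUE, any map row whose GEOID has no census match would show NA in the census columns, which makes a key mismatch easy to spot.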

Above is the code I am trying to run. This is the error I get when I run the last portion (the creation of d3):

Error: Can't subset columns that don't exist.
x Column pnhblack12 doesn't exist.
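Before a long select() call, it can help to list exactly which requested names are absent. A minimal base-R sketch; the vars vector and the toy data frame d are hypothetical stand-ins for the real column list and merged dataset.

```r
# Toy stand-in for the merged dataset (hypothetical columns)
d <- data.frame( tractid = "a", pnhwht12 = 0.5, pnhblk12 = 0.2, phisp12 = 0.3 )

# Columns we intend to keep, including one misspelled name
vars <- c( "tractid", "pnhwht12", "pnhblack12", "phisp12" )

# Names requested but not present in the data
absent <- setdiff( vars, names( d ) )
print( absent )

# Keep only the columns that exist (a tolerant base-R subset)
d3 <- d[ , intersect( vars, names( d ) ), drop = FALSE ]
print( names( d3 ) )
```

Inside select(), wrapping the vector in any_of() (from tidyselect, re-exported by dplyr) likewise skips absent names instead of erroring, but that would quietly hide a typo like this one, so checking setdiff() first is usually safer.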

JasonSills commented 2 years ago

@ramacdo1

pnhblack12 doesn't exist; I think you are looking for pnhblk12.
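When a guessed name is rejected, searching the column names usually turns up the right spelling. A small sketch on a hypothetical vector of names; in practice you would search names( d ) instead.

```r
# Hypothetical subset of column names from the merged dataset
cols <- c( "pnhwht12", "pnhblk12", "phisp12", "pntv12", "pfb12" )

# Exact-prefix search: every non-Hispanic share variable
prefix_hits <- grep( "^pnh", cols, value = TRUE )
print( prefix_hits )

# Approximate match against the misspelled name (up to 2 edits)
close_hits <- agrep( "pnhblack12", cols, max.distance = 2, value = TRUE )
print( close_hits )
```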

ramacdo1 commented 2 years ago

Oh perfect, thank you!