Watts-College / paf-516-template

https://watts-college.github.io/paf-516-template/

Mod 4 data dictionary #8

Open Rachel-Judd opened 11 months ago

Rachel-Judd commented 11 months ago

Hi @antjam-howell, for Module 4, should I have added the data dictionary before creating the initial groups in Part 1? I tried running the code without the data.dictionary assignment from Part 2, but I got an error saying data.dictionary was not found. When I include the data.dictionary chunk, the groups seem incorrect: "Black, non-hispanic" is listed twice and there are many conflicting results. Did anyone else come across this?

AntJam-Howell commented 11 months ago

Did you have the following?

data.dictionary <- data.frame(
  LABEL = c(
    "tractid", "pnhwht12", "pnhblk12", "phisp12", "pntv12", "pfb12",
    "polang12", "phs12", "pcol12", "punemp12", "pflabf12", "pprof12",
    "pmanuf12", "pvet12", "psemp12", "hinc12", "incpc12", "ppov12",
    "pown12", "pvac12", "pmulti12", "mrent12", "mhmval12", "p30old12",
    "p10yrs12", "p18und12", "p60up12", "p75up12", "pmar12", "pwds12",
    "pfhh12"
  ),
  VARIABLE = c(
    "GEOID", "Percent white, non-Hispanic", "Percent black, non-Hispanic",
    "Percent Hispanic", "Percent Native American race", "Percent foreign born",
    "Percent speaking other language at home, age 5 plus",
    "Percent with high school degree or less", "Percent with 4-year college degree or more",
    "Percent unemployed", "Percent female labor force participation",
    "Percent professional employees", "Percent manufacturing employees",
    "Percent veteran", "Percent self-employed", "Median HH income, total",
    "Per capita income", "Percent in poverty, total", "Percent owner-occupied units",
    "Percent vacant units", "Percent multi-family units", "Median rent",
    "Median home value", "Percent structures more than 30 years old",
    "Percent HH in neighborhood 10 years or less", "Percent 17 and under, total",
    "Percent 60 and older, total", "Percent 75 and older, total",
    "Percent currently married, not separated", "Percent widowed, divorced and separated",
    "Percent female-headed families with children"
  )
)

df.pct <- sapply( d2, ntile, 100 )
d4 <- as.data.frame( df.pct )
d4$cluster <- as.factor( paste0("GROUP-",fit$classification) )

num.groups <- length( unique( fit$classification ) )

stats <- 
  d4 %>% 
  group_by( cluster ) %>% 
  summarise_each( funs(mean) )

t <- data.frame( t(stats), stringsAsFactors=F )
names(t) <- paste0( "GROUP.", 1:num.groups )
t <- t[-1,]

for( i in 1:num.groups )
{
  z <- t[,i]
  plot( rep(1,30), 1:30, bty="n", xlim=c(-75,100), 
        type="n", xaxt="n", yaxt="n",
        xlab="Percentile", ylab="",
        main=paste("GROUP",i) )
  abline( v=seq(0,100,25), lty=3, lwd=1.5, col="gray90" )
  segments( y0=1:30, x0=0, x1=100, col="gray70", lwd=2 )
  text( -0.2, 1:30, data.dictionary$VARIABLE[-1], cex=0.85, pos=2 )
  points( z, 1:30, pch=19, col="firebrick", cex=1.5 )
  axis( side=1, at=c(0,50,100), col.axis="gray", col="gray" )
}
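
The plotting loop above attaches labels by position (data.dictionary$VARIABLE[-1]), so it assumes the columns of d2 appear in the same order as the dictionary rows. If the labels look misaligned or repeated, one thing worth checking is whether that assumption holds. The sketch below is only an illustration on toy data: dict and dat are hypothetical stand-ins for data.dictionary and d2, and the real objects may differ.

```r
# Hypothetical stand-ins: a small dictionary and a data frame whose
# columns are deliberately in a different order than the dictionary rows.
dict <- data.frame(
  LABEL    = c( "tractid", "pnhwht12", "pnhblk12", "phisp12" ),
  VARIABLE = c( "GEOID", "Percent white, non-Hispanic",
                "Percent black, non-Hispanic", "Percent Hispanic" ),
  stringsAsFactors = FALSE
)
dat <- data.frame( phisp12  = c( 10, 20 ),
                   pnhwht12 = c( 60, 50 ),
                   pnhblk12 = c( 30, 30 ) )

# 1. Descriptions that appear more than once in the dictionary
dup.labels <- dict$VARIABLE[ duplicated( dict$VARIABLE ) ]

# 2. Look up each column's description by name rather than by position
row.labels <- dict$VARIABLE[ match( names(dat), dict$LABEL ) ]

# 3. Columns that have no dictionary entry come back as NA
missing.vars <- names(dat)[ is.na( row.labels ) ]

print( dup.labels )
print( row.labels )
print( missing.vars )
```

If a check like this shows duplicated or missing labels in the real objects, passing the name-matched labels to text() instead of data.dictionary$VARIABLE[-1] would keep each row's label tied to its own variable.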

Rachel-Judd commented 10 months ago

Thank you. Yes, when I include everything above, the results are still off.