TCohen3764 / QNT745GoCigna

U o H, QNT 745, Group Project, Go Cigna Group
First, pull the data out of NHANES. I'm including Gender in case we want to do something with it later.

library(NHANES)
data1 <- NHANES[1:10000, c('DaysMentHlthBad', 'BMI', 'Gender')]

Remove records where zero bad mental health days were reported

library(dplyr)
data1 <- subset(data1, DaysMentHlthBad > 0)

View a summary of your data to verify it looks right

summary(data1)

Notice there are no NAs for DaysMentHlthBad: subset() keeps only rows where the condition is TRUE, so records with a missing DaysMentHlthBad were dropped along with the zeros
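As a quick check of that behaviour, here is a minimal sketch on a made-up data frame (the name `toy` and its values are illustrative, not taken from NHANES). It shows that filtering on DaysMentHlthBad removes missing values in that column only, which is why BMI still needs imputing.

```r
# Toy data frame with made-up values: one zero, one NA in each column
toy <- data.frame(DaysMentHlthBad = c(0, 3, NA, 10),
                  BMI = c(22.1, NA, 30.4, 27.8))

# subset() treats an NA condition as FALSE, so the NA row is dropped too
kept <- subset(toy, DaysMentHlthBad > 0)

nrow(kept)                    # 2: the zero row and the NA row are both gone
anyNA(kept$DaysMentHlthBad)   # FALSE
anyNA(kept$BMI)               # TRUE: BMI can still be missing
```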

Now let's impute BMI so that we don't have any NA values in it

library(mice)
imp <- mice(data1, m = 5)

And now let's complete the imputation and save it as impdata1

impdata1 <- complete(imp)

Let's verify by pulling a summary of impdata1

summary(impdata1)

Notice there are no more NAs
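For a self-contained way to try the same impute-then-complete steps, the sketch below uses the small `nhanes` example data that ships with the mice package (a different, much smaller data set than the NHANES package data used here). The `seed` value and the names `toy`, `imp_toy`, and `filled` are assumptions added for illustration. Note that complete() called without an `action` argument returns only the first of the m imputed data sets.

```r
library(mice)

# Small example data bundled with mice (not the NHANES package data)
toy <- mice::nhanes

colSums(is.na(toy))       # missing values per column before imputation

# seed makes the random imputations reproducible;
# printFlag = FALSE silences the iteration log
imp_toy <- mice(toy, m = 5, seed = 123, printFlag = FALSE)

# With no action argument, complete() returns the first imputed data set
filled <- complete(imp_toy)

colSums(is.na(filled))    # missing values per column after imputation
```

Setting a seed in the main analysis would make impdata1 the same on every run, which may be useful when comparing plots across sessions.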

Let's run a plot and see what it looks like

library(ggplot2)
ggplot(impdata1, aes(x = DaysMentHlthBad, y = BMI)) + geom_bin2d()

It appears that some outlier BMI values make the plot hard to read

Let's try binning the BMI variable to see if that helps the plot

impdata1 <- mutate(impdata1, BMI_Group = cut(BMI, breaks = c(0, 20, 25, 30, 35, 40, 100)))
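To see how cut() assigns values with these breaks, here is a small base R sketch on made-up BMI values (`bmi_values` is illustrative only). By default the intervals are open on the left and closed on the right, so a BMI of exactly 20 lands in (0,20], and any value above the last break of 100 would become NA.

```r
breaks <- c(0, 20, 25, 30, 35, 40, 100)
bmi_values <- c(18.5, 20, 27.3, 45)   # made-up values for illustration

groups <- cut(bmi_values, breaks = breaks)

levels(groups)
# "(0,20]" "(20,25]" "(25,30]" "(30,35]" "(35,40]" "(40,100]"

as.character(groups)
# "(0,20]" "(0,20]" "(25,30]" "(40,100]"
```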

Now let's rerun the plot using BMI_Group

ggplot(impdata1, aes(x = DaysMentHlthBad, y = BMI_Group)) + geom_bin2d()

That looks better

Now we just need to clean up the visual