Cleaning Data - Githubissues

Kgable / Soc-413-Research-Project

MIT License


Good morning @AaronGullickson

I hope all is well. I apologize for potentially blowing up your email. I tried to send messages yesterday and this morning but forgot to link you, and then the formatting wasn't readable on my end, so I am retyping here for clarity. Again, I apologize for any issues it may have caused.

I dropped the belief in afterlife variable due to the lack of data. I took the three remaining variables for religiosity (belief in god, religious attendance, and prayer frequency) and first saved them as new variables, then collapsed and converted them into qualitative variables. Then I began standardizing them. When I did this, I noticed my means are now 0, but the standard deviations are not 1. In some cases they are quite far from 1. Any idea why this occurs or how to fix it?

I intend to use factor analysis to create my religiosity variable as a numeric variable. I have included that code, along with code checking the factor loadings, but I have not tested that code yet. I wanted to make sure my standardization is done correctly first. My intention behind keeping the religiosity measures as both a collective numerical scale and individual qualitative variables is to expand the options for analysis in the next section. I think I am really close to done with building my analytical dataset. As always, I have committed and pushed all code up to git. You should be able to see it.

Thank you, Kevin Gable

They look fine to me. Remember that the summary command does not report the standard deviation, so you need to check it directly with sd():

gss$temp <- standardize(gss$attend)
mean(gss$temp, na.rm=TRUE)
sd(gss$temp, na.rm=TRUE)

That produces a mean of 0 and an sd of 1.
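The same check can be tried without the project data. Below is a minimal sketch in base R, assuming that standardize() does the usual thing (subtract the mean, divide by the sample standard deviation). The helper standardize_sketch() and the attend_demo values are made up for illustration and are not the function or data used in this project.

```r
# Hypothetical stand-in for standardize() (an assumption, not the actual
# function used in the project): centre on the mean, divide by the sample SD.
standardize_sketch <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Made-up attendance-style scores with one missing value, illustration only.
attend_demo <- c(0, 1, 2, 4, 4, 6, 8, NA)
z <- standardize_sketch(attend_demo)

summary(z)              # min, quartiles, mean, max, NA count; no SD
mean(z, na.rm = TRUE)   # zero, up to floating-point error
sd(z, na.rm = TRUE)     # one, up to floating-point error
```

If sd() comes back as something other than 1 on the real variable, one thing worth checking is whether the variable was changed again after it was standardized.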

I did a correlation matrix on your three variables and it looks like pray is reverse coded:

cor(gss[, c("attend", "pray", "god")], use = "pairwise.complete.obs")
           attend       pray        god
attend  1.0000000 -0.5340731  0.4480790
pray   -0.5340731  1.0000000 -0.6115003
god     0.4480790 -0.6115003  1.0000000

You can see that in the factor loadings as well:

loadings(relfact)

Loadings:
       MR1   
attend  0.626
pray   -0.854
god     0.716

                 MR1
SS loadings    1.633
Proportion Var 0.544

Going back to your categorical coding of the pray variable, I can see that this is true. The highest category of 6 is "never" and the lowest category of 1 is "more than daily." Here is some easy code to reverse that:

gss$pray <- 7 - gss$pray

Just remember to run that after the code that categorizes prayer frequency.
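To see what the reversal does, here is a small sketch with made-up values on the same 1 to 6 scale (1 = "more than daily", 6 = "never"). The vectors pray_demo and god_demo are hypothetical and are not the GSS data.

```r
# Made-up prayer codes on the 1-6 scale described above; not the GSS data.
pray_demo <- c(1, 2, 6, 4, 3, 6, 1, NA)
# Made-up second item where a higher value means more religious.
god_demo <- c(6, 5, 1, 3, 4, 2, 6, 5)

# Subtracting from 7 swaps 1 <-> 6, 2 <-> 5, 3 <-> 4; NA stays NA.
pray_rev <- 7 - pray_demo

# The correlation keeps its size and flips its sign.
r_before <- cor(pray_demo, god_demo, use = "pairwise.complete.obs")
r_after  <- cor(pray_rev, god_demo, use = "pairwise.complete.obs")
r_before  # negative
r_after   # positive, same magnitude
```

Note that running the reversal a second time restores the original coding, so it should appear only once in the script.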

After fixing that reverse coding, I ran alpha(gss[, c("attend", "pray", "god")]) and got an alpha of 0.78 (std.alpha tells you how high it would be if all variables were standardized, or you can standardize first and then run it). That's pretty decent and justifies using a simple scoring approach for your DV. I would double standardize to get that score, i.e. standardize each item, add them up, and then standardize the sum:

gss <- gss |>
  mutate(religiosity = standardize(standardize(attend) + standardize(pray) + standardize(god)))
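The reason for the second standardization is that a sum of standardized items still has a mean of 0, but its SD is generally not 1. Here is a rough sketch in base R using simulated data. The data frame demo, the seed, and the helper standardize_sketch() are all assumptions for illustration, not the project's data or its standardize() function.

```r
# Hypothetical stand-in for standardize() (an assumption): z-score a vector.
standardize_sketch <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Simulated items that share a common factor, so they correlate positively.
set.seed(413)
n <- 200
latent <- rnorm(n)
demo <- data.frame(
  attend = latent + rnorm(n),
  pray   = latent + rnorm(n),
  god    = latent + rnorm(n)
)

# First pass: put each item on the same scale, then add them up.
raw_sum <- standardize_sketch(demo$attend) +
  standardize_sketch(demo$pray) +
  standardize_sketch(demo$god)
sd(raw_sum)        # larger than 1 because the items are positively correlated

# Second pass: rescale the sum so the final score has mean 0 and SD 1.
religiosity <- standardize_sketch(raw_sum)
mean(religiosity)
sd(religiosity)
```

With the final score on a mean 0, SD 1 scale, a one-unit difference reads as one standard deviation of religiosity.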

Cleaning Data #5