Kgable / Soc-413-Research-Project

Cleaning Data #5

Closed Kgable closed 1 year ago

Kgable commented 1 year ago

Good morning @AaronGullickson

I hope all is well. I apologize for potentially blowing up your email. I was trying to send messages yesterday and this morning and forgot to link you and then the format wasn't readable on my end. I am retyping here for clarity. Again, I apologize for any issues it may have caused.

I dropped the belief-in-afterlife variable because of the lack of data. I took the three remaining religiosity variables (belief in god, religious attendance, and prayer frequency), saved them as new variables, and collapsed and converted them into qualitative variables. Then I began standardizing them. When I did this, I noticed that my means are now 0 but the standard deviations are not 1; in some cases they are quite far from 1. Any idea why this occurs or how to fix it?

I intend to use factor analysis to create my religiosity variable as a numeric variable. I have included that code, along with code checking the factor loadings, but I have not tested it yet. I wanted to make sure my standardization is done correctly first. My intention behind keeping the religiosity measures as both a collective numerical scale and individual qualitative variables is to expand the options for analysis in the next section. I think I am really close to done with building my analytical dataset. As always, I have committed and pushed all of my code to git. You should be able to see it.

Thank you, Kevin Gable

AaronGullickson commented 1 year ago

They look fine to me. Remember that the summary command is not giving you SD information.

gss$temp <- standardize(gss$attend)
mean(gss$temp, na.rm=TRUE)
sd(gss$temp, na.rm=TRUE)

That produces a mean of 0 and an sd of 1.
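For anyone reading along, here is a minimal sketch of that check on toy data. The definition of `standardize()` is not shown in this thread, so `standardize_sketch()` below is a hypothetical stand-in that assumes the usual z-score formula, and the vector `x` is made up rather than taken from the GSS.

```r
# Hypothetical stand-in for the project's standardize(); the real definition
# is not shown in this thread. Assumes a plain z-score.
standardize_sketch <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Toy data with one missing value (not the GSS data)
x <- c(0, 1, 2, 4, 8, NA, 3, 5)
z <- standardize_sketch(x)

summary(z)             # min, quartiles, mean, max, NA count; no SD reported
mean(z, na.rm = TRUE)  # zero, up to floating-point error
sd(z, na.rm = TRUE)    # one, up to floating-point error
```

The point is that `summary()` never prints a standard deviation, so the spread has to be checked with `sd()` directly.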

I did a correlation matrix on your three variables and it looks like pray is reverse coded:

cor(gss[, c("attend", "pray", "god")], use = "pairwise.complete")
           attend       pray        god
attend  1.0000000 -0.5340731  0.4480790
pray   -0.5340731  1.0000000 -0.6115003
god     0.4480790 -0.6115003  1.0000000

You can see that in the factor loadings as well:

loadings(relfact)

Loadings:
       MR1   
attend  0.626
pray   -0.854
god     0.716

                 MR1
SS loadings    1.633
Proportion Var 0.544

Going back to your categorical coding of the pray variable, I can see that this is true. The highest category of 6 is "never" and the lowest category of 1 is "more than daily." Here is some easy code to reverse that:

gss$pray <- 7-gss$pray

Just remember to code that after you categorize prayer frequency.
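A small sketch of what the reversal does, using made-up vectors rather than the GSS columns. It assumes the 1 to 6 coding described above; the names `pray_rev` and the toy `attend` values are only for illustration.

```r
# Toy 1-6 prayer codes, where 1 = "more than daily" and 6 = "never"
pray <- c(1, 2, 6, 4, NA, 3, 6, 1)
# Toy attendance scores where higher means more frequent attendance
attend <- c(8, 7, 0, 3, 5, 4, 1, 8)

# Negative before reversing, because the two scales run in opposite directions
cor(attend, pray, use = "pairwise.complete.obs")

# Reverse a 1-6 scale: (max + min) - x = 7 - x; NA stays NA
pray_rev <- 7 - pray
pray_rev

# Same magnitude, opposite sign
cor(attend, pray_rev, use = "pairwise.complete.obs")
```

Note that running the reversal line twice on the same column flips it back, so it should appear exactly once in the script.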

After I fix that reverse coding, I run alpha(gss[, c("attend", "pray", "god")]) and get an alpha of 0.78 (std.alpha tells you how high it would be if all variables were standardized, or you can standardize first and then run it). That's pretty decent and justifies using a simple scoring approach for your DV. I would double standardize to get that score, i.e.:

gss <- gss |>
  mutate(religiosity=standardize(standardize(attend)+standardize(pray)+standardize(god)))
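To illustrate why the outer standardization is there, here is a base R sketch on toy data. The sum of three z-scores still has a mean of 0, but its standard deviation is larger than 1 when the items are positively correlated, so the sum is rescaled once more. `standardize_sketch()` is a hypothetical stand-in for `standardize()`, assumed to be a plain z-score, and the `toy` data frame is invented for the example.

```r
# Hypothetical stand-in for the project's standardize(); assumes a z-score
standardize_sketch <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Toy items, all coded so that higher means more religious (not the GSS data)
toy <- data.frame(
  attend = c(0, 2, 4, 6, 8, 3, 5, 1),
  pray   = c(1, 2, 4, 5, 6, 3, 5, 2),
  god    = c(1, 3, 4, 6, 6, 2, 5, 2)
)

# Inner step: put each item on a common scale, then add them up
raw_sum <- standardize_sketch(toy$attend) +
  standardize_sketch(toy$pray) +
  standardize_sketch(toy$god)
sd(raw_sum)  # greater than 1, since the items are positively correlated

# Outer step: rescale the sum so the final score has mean 0 and SD 1
toy$religiosity <- standardize_sketch(raw_sum)
mean(toy$religiosity)
sd(toy$religiosity)
```

One thing to keep in mind with real data: because the items are added with `+`, any respondent missing one of the three items gets `NA` for the composite score.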