ben-domingue / irw

Code related to data for the Item Response Warehouse
https://datapages.github.io/irw/

Psychometric Properties of the Multifaceted Gender-Related Attributes Survey (GERAS) #199

Closed ben-domingue closed 2 weeks ago

ben-domingue commented 2 weeks ago

Data and R scripts are published in the Open Science Framework (see https://osf.io/42jhr/).

https://econtent.hogrefe.com/doi/10.1027/1015-5759/a000528

KingArthur0205 commented 2 weeks ago

This paper includes 2 studies. The 1st study includes 2 datasets, Study1_CFA and Study1_EFA, collected for confirmatory and exploratory factor analysis, respectively. The 2 datasets use the same measure (set of items) and are thus merged together.

Study 2 includes a dataset with identical items. However, it reuses the same ID for different participants. For example, ID 1 is used 5 times. I am changing the IDs to id_age_gender to uniquely identify each participant. @ben-domingue Please let me know if this assumption is correct. ;)
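A composite ID built from the original ID plus demographics is only unique if no two participants share all three values, so it is worth checking. Below is a minimal sketch on invented toy data; the column names `VPN`, `gender`, and `age` follow the script posted in this thread, while the row-number suffix is an assumption added here as a fallback, not something the thread's script does.

```r
library(dplyr)

# Toy stand-in for the Study 2 data; all values are invented for illustration.
toy <- data.frame(
  VPN    = c(1, 1, 1, 2),
  gender = c(1, 2, 1, 1),
  age    = c(23, 23, 23, 31),
  item1  = c(3, 4, 5, 2)
)

# Build a composite ID from the original ID, gender, and age.
toy <- toy %>% mutate(id = paste(VPN, gender, age, sep = "_"))

# Any composite ID that still occurs more than once is ambiguous.
dup_ids <- toy %>% count(id) %>% filter(n > 1)
print(dup_ids)

# Hypothetical fallback: a within-ID row counter guarantees uniqueness
# even when two participants share ID, gender, and age.
toy <- toy %>%
  group_by(id) %>%
  mutate(id_unique = paste(id, row_number(), sep = "_")) %>%
  ungroup()
```

If `dup_ids` comes back empty on the real data, the composite ID alone is sufficient and the fallback is unnecessary.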

Both studies include 3 subscales: personality, cognition, and activities. I have separated them into 3 different dfs for both datasets.

KingArthur0205 commented 2 weeks ago

Data: GERAS_Gruber_2019.csv

Code:

# Paper:https://econtent.hogrefe.com/doi/10.1027/1015-5759/a000528
# Data:https://osf.io/42jhr/
library(dplyr)
library(tidyr)
library(haven)

# ------ Process Study 1 -------
study1_cfa_df <- read_sav("./GERAS_Study1_CFA.sav")
study1_efa_df <- read_sav("./GERAS_Study1_EFA.sav")

study1_df <- rbind(study1_cfa_df, study1_efa_df) # Merge 2 datasets
study1_df <- study1_df |>
  select(-gender) |>
  rename(id=ID)
study1_df <- study1_df %>% # Replace encoded missing values with NA
  mutate_all(~replace(., . %in% c(-66, -77, -99), NA))

# ------ Process Study 2 -------
study2_df <- read_sav("./GERAS_Study2_CFA.sav")

colnames(study2_df) <- gsub("\\s*\\(.*\\)", "", colnames(study2_df))
study2_df <- lapply(study2_df, function(x) { attr(x, "label") <- NULL; x })
study2_df <- as.data.frame(study2_df)

study2_df <- study2_df %>%
  mutate(VPN = paste(VPN, gender, age, sep = "_"))
study2_df <- study2_df |>
  select(-gender) |>
  rename(id=VPN)
study2_df <- study2_df %>% # Replace encoded missing values with NA
  mutate_all(~replace(., . %in% c(-66, -77, -99), NA))

# ------ Process Merged Data ------
study1_df$id <- as.character(study1_df$id)
merged_df <- bind_rows(
  study1_df %>% mutate(group = "Study 1"), 
  study2_df %>% mutate(group = "Study 2")) # Merge datasets from the 2 studies
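# pivot_longer() returns a new data frame, so assign the result back to get long format
merged_df <- pivot_longer(merged_df, cols = -c(id, age, group), names_to = "item", values_to = "resp")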

save(merged_df, file="GERAS_Gruber_2019.Rdata")
write.csv(merged_df, "GERAS_Gruber_2019.csv", row.names=FALSE)

ben-domingue commented 2 weeks ago

a few questions/notes:

KingArthur0205 commented 2 weeks ago

> • all three have 1913 IDs which i'm guessing is more or less the sum of studies 1 and 2 (1466+471 is a little more than 1913 but that's ok). if we're on the same page i think i'm ok with your solution.

I think the total number of participants is correct. However, IDs are repeated across the 2 studies (and there are multiple repeated IDs within Study 2 alone). To avoid this, I encoded the participants' IDs in Study 2 as id_age_gender and added a group column to differentiate repeated IDs between Study 1 and Study 2.

> • the three subscales are all part of the same GERAS measure i think. is that right? if so, i would put them together. when to split and when to lump is more science than art but i think here we want to lump. to give an example, if we had an academic test with math and reading, i'd want to split. if it was a math test with algebra and geometry, i'd want to lump. if all 3 subscales are assessing the same gender attitudes construct, i'd lump (but perhaps have info on the subscales [maybe in item names?]).

Yes, now I understand that we don't split all datasets. This is a good counter-example. :)

The code and datasets are updated above. :)
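One way to lump the subscales into a single table while keeping subscale information in the item names is to prefix each item with its subscale. This is only a sketch: the item codes (`p1`, `c1`, `a1`) and the lookup vector are hypothetical and do not reflect the actual GERAS column names.

```r
library(dplyr)

# Toy long-format data; IDs, item codes, and responses are invented.
long <- data.frame(
  id   = c("a", "a", "a", "b"),
  item = c("p1", "c1", "a1", "p1"),
  resp = c(3, 4, 5, 2),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from item code to subscale name.
subscale_lookup <- c(p1 = "personality", c1 = "cognition", a1 = "activities")

# Prefix every item name with its subscale so one table carries all three.
long <- long %>%
  mutate(item = paste(unname(subscale_lookup[item]), item, sep = "_"))

print(long)
```

The subscale can later be recovered from the item name with something like `sub("_.*", "", long$item)`.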

KingArthur0205 commented 2 weeks ago

PR for this issue: https://github.com/ben-domingue/irw/pull/203

ben-domingue commented 2 weeks ago

i think this CSV got output as 'wide' rather than 'long'. most of the columns must be the items, yeah?
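For reference, `pivot_longer()` returns a new data frame rather than modifying its input, so a call whose result is never assigned leaves the data wide. A minimal sketch with invented toy data (the columns `id`, `age`, and `group` mirror the script in this thread; the item columns are made up):

```r
library(tidyr)

# Toy wide-format data: one row per person, one column per item.
wide <- data.frame(
  id    = c("a", "b"),
  age   = c(20, 30),
  group = c("Study 1", "Study 2"),
  item1 = c(2, 3),
  item2 = c(4, 5)
)

# Result is printed but discarded: `wide` itself is unchanged.
pivot_longer(wide, cols = -c(id, age, group), names_to = "item", values_to = "resp")
ncol(wide)  # still 5 columns

# Assigning the result gives one row per person-item pair.
long <- pivot_longer(wide, cols = -c(id, age, group),
                     names_to = "item", values_to = "resp")
print(long)
```

A quick sanity check before writing the CSV is that the column names are exactly the identifier columns plus `item` and `resp`.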

ben-domingue commented 2 weeks ago

@KingArthur0205 i think this requires a tweak

KingArthur0205 commented 2 weeks ago

This is an oversight on my end. I should have double-checked more carefully. Sorry for the mistake.

I have updated the cell above, the CSV file, and the PR to have long format.

ben-domingue commented 2 weeks ago

oh no worries! honestly, finding the occasional error makes me feel like i'm adding value! ;)

ben-domingue commented 2 weeks ago

the coding is 2-6. double checking that was true in the original data. it's fine if so just wanted to be sure we didn't lose the 1 values. @KingArthur0205

KingArthur0205 commented 2 weeks ago

> the coding is 2-6. double checking that was true in the original data. it's fine if so just wanted to be sure we didn't lose the 1 values. @KingArthur0205

Yes, I just double-checked the original datasets and the paper. The study adopted a 7-point scale, and the original, unprocessed datasets didn't have any 1s or 7s. They didn't seem to apply any techniques to remove 1s and 7s either.
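A check like this can be scripted by tabulating the observed response values and comparing them with the nominal scale. The sketch below uses invented toy responses, and the 1 to 7 range is taken from the scale described above; running the same lines on both the raw and the processed data would show whether any category was lost in processing.

```r
# Toy long-format responses; values are invented for illustration.
long <- data.frame(
  item = c("i1", "i1", "i2", "i2", "i2", "i1"),
  resp = c(2, 6, 3, NA, 5, 4)
)

# Frequency of each observed response category, including missing values.
resp_counts <- table(long$resp, useNA = "ifany")
print(resp_counts)

# Observed range versus the nominal 7-point scale.
obs_range <- range(long$resp, na.rm = TRUE)
unused <- setdiff(1:7, unique(long$resp))
print(obs_range)
print(unused)  # scale points that never occur in the data
```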