ben-domingue / irw

Code related to data for the Item Response Warehouse
https://datapages.github.io/irw/
irtrees #109

Closed ben-domingue closed 3 months ago

ben-domingue commented 5 months ago

Need to triple check that we don't have this data: https://www.jstatsoft.org/article/view/v048c01

KingArthur0205 commented 3 months ago

VerbAgg2.csv VerbAgg3.csv

Working on this. The VerbAgg2 dataset is already processed and included in the repository under the title "verbagg", but VerbAgg3 isn't. We probably need to add VerbAgg3 and rename "verbagg" to accommodate both datasets.

The paper also contains 4 additional simulated datasets (not included or processed, as discussed) :)

# https://www.jstatsoft.org/article/view/v048c01

library(tidyr)  # pivot_longer()

load("./Data/VerbAgg2.rda")
VerbAgg2_id <- 1:nrow(VerbAgg2)
VerbAgg2 <- cbind(VerbAgg2, id=I(VerbAgg2_id)) # Merge id column into the matrix
VerbAgg2 <- VerbAgg2[, !colnames(VerbAgg2) %in% c("Anger", "Gender")] # Drop covariates (capitalized, as in the final script)
VerbAgg2 <- as.data.frame(VerbAgg2)
VerbAgg2_long <- pivot_longer(VerbAgg2, cols=-id, names_to='item', values_to='resp') # Reshape VerbAgg2 data to long format

load("./Data/VerbAgg3.rda")
VerbAgg3_id <- 1:nrow(VerbAgg3)
VerbAgg3 <- cbind(VerbAgg3, id=I(VerbAgg3_id)) # Merge id column into the matrix
VerbAgg3 <- VerbAgg3[, !colnames(VerbAgg3) %in% c("Anger", "Gender")] # Drop covariates
VerbAgg3 <- as.data.frame(VerbAgg3)
VerbAgg3_long <- pivot_longer(VerbAgg3, cols=-id, names_to='item', values_to='resp') # Reshape VerbAgg3 data to long format

save(VerbAgg2_long, file="itrees_VerbAgg2.Rdata")
save(VerbAgg3_long, file="itrees_VerbAgg3.Rdata")
write.csv(VerbAgg2_long, "VerbAgg2.csv", row.names = FALSE)
write.csv(VerbAgg3_long, "VerbAgg3.csv", row.names = FALSE)

KingArthur0205 commented 3 months ago

The paper also contains 2 additional datasets:

  1. Differentiating slow vs. fast responses on intelligence (fsdatT): processed, see fsdatT.csv
library(dplyr)  # select(), rename()
load("./fsdatT.rda")
fsdatT <- fsdatT %>% select(-node, -sub) # Drop the node and sub columns
fsdatT <- fsdatT %>% rename(resp=value, id=person)
fsdatT$id <- as.integer(sub("^p", "", fsdatT$id)) # Strip the leading "p" and convert ids to integers
save(fsdatT, file="itrees_fsdatT.Rdata")
write.csv(fsdatT, "fsdatT.csv", row.names=FALSE)
  2. Patients' responses across psychotherapy sessions (stressT): I’m unsure how to handle the "time" column (see screenshot), as it indicates different psychotherapy sessions. Renaming it to "subtest" or "treatment" (from the warehouse perspective) doesn’t seem quite appropriate. Perhaps it’s best to leave it as "time"? [Screenshot: stressT]

ben-domingue commented 3 months ago

so these are different observations of the same individual? if so, let's perhaps leave this one as-is for the moment. i need to make a guiding decision about this kind of use case in the coming weeks and think it might be best to return to it at that point.

KingArthur0205 commented 3 months ago

> so these are different observations of the same individual? if so, let's perhaps leave this one as-is for the moment. i need to make a guiding decision about this kind of use case in the coming weeks and think it might be best to return to it at that point.

Yes, these are observations of the same participants when they return for therapy sessions 0, 1, and 2. I will leave it as is for now.
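
To illustrate why the "time" column matters here, a minimal sketch on made-up data (the values and item labels are hypothetical, and this is not a decision about how the warehouse should treat repeated sessions): without `time`, each person-item pair appears several times, while `id`, `item`, and `time` together identify every row.

```r
# Toy long-format data with repeated sessions (hypothetical values)
toy <- expand.grid(id = 1:2, item = c("a", "b"), time = 0:2,
                   stringsAsFactors = FALSE)
toy$resp <- rep(c(0, 1), length.out = nrow(toy))

# Without `time`, each person-item pair appears once per session
dup_without_time <- any(duplicated(toy[, c("id", "item")]))
# With `time`, every row is uniquely identified
dup_with_time <- any(duplicated(toy[, c("id", "item", "time")]))

print(c(without_time = dup_without_time, with_time = dup_with_time))
```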

I will also try to merge the scripts into one, including the code above, and post an update to this issue later. Perhaps I'll open a PR afterwards.

Thanks for the clarification :)

KingArthur0205 commented 3 months ago

Here are the complete code file and the processed datasets for the paper. I also have them in Rdata format, but GitHub won't let me upload those. fsdatT.csv stressT.csv VerbAgg2.csv VerbAgg3.csv

# https://www.jstatsoft.org/article/view/v048c01
library(dplyr)  # select(), rename()
library(tidyr)  # pivot_longer()

load("./stressT.rda")
stressT <- stressT |>
  select(-exo1, -exo2, -exo3, -exo4, -exo5) |> # Remove columns for decision-tree model
  rename(id=person,
         resp=value,
         item=crossitem)

load("./fsdatT.rda")
fsdatT <- fsdatT %>% select(-node, -sub)
fsdatT <- fsdatT %>% rename(resp=value, id=person)
fsdatT$id <- sub("^p", "", fsdatT$id) # Convert ids into integers
fsdatT$id <- as.integer(fsdatT$id)

load("./VerbAgg2.rda")
VerbAgg2_id <- 1:nrow(VerbAgg2)
VerbAgg2 <- cbind(VerbAgg2, id=I(VerbAgg2_id)) # Merge id column into the matrix
VerbAgg2 <- VerbAgg2[, !colnames(VerbAgg2) %in% c("Anger", "Gender")]
VerbAgg2 <- as.data.frame(VerbAgg2)
VerbAgg2_long <-  pivot_longer(VerbAgg2, cols=-id, names_to='item', values_to='resp')  # Reshape VerbAgg2 data to long format

load("./VerbAgg3.rda")
VerbAgg3_id <- 1:nrow(VerbAgg3)
VerbAgg3 <- cbind(VerbAgg3, id=I(VerbAgg3_id))
VerbAgg3 <- VerbAgg3[, !colnames(VerbAgg3) %in% c("Anger", "Gender")]
VerbAgg3 <- as.data.frame(VerbAgg3)
VerbAgg3_long <-  pivot_longer(VerbAgg3, cols=-id, names_to='item', values_to='resp')

save(fsdatT, file="fsdatT.Rdata")
save(stressT, file="stressT.Rdata")
save(VerbAgg2_long, file="VerbAgg2.Rdata")
save(VerbAgg3_long, file="VerbAgg3.Rdata")
write.csv(fsdatT, "fsdatT.csv", row.names=FALSE)
write.csv(stressT, "stressT.csv", row.names=FALSE)
write.csv(VerbAgg2_long, "VerbAgg2.csv", row.names = FALSE)
write.csv(VerbAgg3_long, "VerbAgg3.csv", row.names = FALSE)

ben-domingue commented 3 months ago

OK let me go through these separately: VerbAgg2.csv VerbAgg3.csv

fsdatT.csv

ben-domingue commented 3 months ago

stressT.csv

ben-domingue commented 3 months ago

Actually, we are fine here. Great work @KingArthur0205 !! See https://github.com/ben-domingue/irw/blob/main/data/IRTrees.R