irtrees - GitHub issues

ben-domingue commented 5 months ago

Need to triple check that we don't have this data: https://www.jstatsoft.org/article/view/v048c01

KingArthur0205 commented 3 months ago

Working on this. The VerbAgg2 dataset is already processed and included in the repository under the title "verbagg", while VerbAgg3 isn't. We probably need to add VerbAgg3 and rename "verbagg" to accommodate both datasets.

The paper also contains 4 additional simulated datasets (not included or processed, as discussed) :)

# https://www.jstatsoft.org/article/view/v048c01

library(dplyr)
library(tidyverse)
library(tidyr)

load("./Data/VerbAgg2.rda")
VerbAgg2_id <- 1:nrow(VerbAgg2)
VerbAgg2 <- cbind(VerbAgg2, id=I(VerbAgg2_id)) # Merge id column into the matrix
VerbAgg2 <- VerbAgg2[, !colnames(VerbAgg2) %in% c("anger", "gender")]
VerbAgg2 <- as.data.frame(VerbAgg2)
VerbAgg2_long <-  pivot_longer(VerbAgg2, cols=-id, names_to='item', values_to='resp')  # Reshape VerbAgg2 data to long format

load("./Data/VerbAgg3.rda")
VerbAgg3_id <- 1:nrow(VerbAgg3)
VerbAgg3 <- cbind(VerbAgg3, id=I(VerbAgg3_id))
VerbAgg3 <- VerbAgg3[, !colnames(VerbAgg3) %in% c("anger", "gender")]
VerbAgg3 <- as.data.frame(VerbAgg3)
VerbAgg3_long <-  pivot_longer(VerbAgg3, cols=-id, names_to='item', values_to='resp')

save(VerbAgg2_long, file="itrees_VerbAgg2.Rdata")
save(VerbAgg3_long, file="itrees_VerbAgg3.Rdata")
write.csv(VerbAgg2_long, "VerbAgg2.csv", row.names = FALSE)
write.csv(VerbAgg3_long, "VerbAgg3.csv", row.names = FALSE)
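
The VerbAgg2 and VerbAgg3 blocks above repeat the same steps, so they could be folded into one function. This is only a minimal sketch: `wide_to_long` is a hypothetical helper name, the toy matrix stands in for the real data, and the names of the columns to drop are an assumption that should be checked against the loaded objects.

```r
library(tidyr)

# Hypothetical helper: reshape a wide response matrix into long format
# with columns id, item, resp. `drop` lists covariate columns to remove;
# the default names are an assumption, not confirmed against the data.
wide_to_long <- function(x, drop = c("Anger", "Gender")) {
  x <- as.data.frame(x)
  x <- x[, !colnames(x) %in% drop, drop = FALSE]
  x$id <- seq_len(nrow(x))  # one id per row (person)
  pivot_longer(x, cols = -id, names_to = "item", values_to = "resp")
}

# Toy stand-in for VerbAgg2/VerbAgg3 (values are made up).
toy <- matrix(c(0, 1, 1, 0, 1, 1), nrow = 2,
              dimnames = list(NULL, c("item1", "item2", "Gender")))
toy_long <- wide_to_long(toy)
```

With the real data this would be called as `wide_to_long(VerbAgg2)` and `wide_to_long(VerbAgg3)` after `load()`.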

KingArthur0205 commented 3 months ago

The paper also contains 2 additional datasets:

Differentiate slow vs. fast responses on intelligence (fsdatT): Processed fsdatT.csv

load("./fsdatT.rda")
fsdatT <- fsdatT %>% select(-node, -sub)
fsdatT <- fsdatT %>% rename(resp=value, id=person)
fsdatT$id <- sub("^p", "", fsdatT$id) # Convert ids into integers
fsdatT$id <- as.integer(fsdatT$id)
save(fsdatT, file="itrees_fsdatT.Rdata")
write.csv(fsdatT, "fsdatT.csv", row.names=FALSE)

Patients' responses to different sessions of psychotherapy (stressT): I’m unsure how to handle the "time" column, as it indicates different sessions of psychotherapy. Renaming it to "subtest" or "treatment" (from the warehouse perspective) doesn’t seem quite appropriate. Perhaps it’s best to leave it as "time"?

ben-domingue commented 3 months ago

so these are different observations of the same individual? if so, let's perhaps leave this one as-is for the moment. i need to make a guiding decision about this kind of use case in the coming weeks and think it might be best to return to it at that point.

KingArthur0205 commented 3 months ago

> so these are different observations of the same individual? if so, let's perhaps leave this one as-is for the moment. i need to make a guiding decision about this kind of use case in the coming weeks and think it might be best to return to it at that point.

Yes, these are observations of the same participants when they come back for therapies 0, 1, and 2. I will leave it as it is for now.

I will also try to merge the scripts into one, combined with the code above, and post an update to this issue later. Perhaps I'll open a PR afterwards.

Thanks for the clarification :)
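
To confirm that the "time" column really marks repeated observations of the same people, one can count the distinct sessions per participant. This is a minimal sketch on made-up data; the column names `id`, `item`, `time`, and `resp` are assumed to match stressT after renaming.

```r
# Toy stand-in for stressT after renaming (values are made up).
toy <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  item = "i1",
  time = c(0, 1, 2, 0, 1, 2),
  resp = c(1, 0, 1, 0, 0, 1)
)

# Number of distinct sessions observed for each participant.
sessions_per_id <- tapply(toy$time, toy$id, function(t) length(unique(t)))

# TRUE when every participant appears in more than one session.
repeated <- all(sessions_per_id > 1)
```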

KingArthur0205 commented 3 months ago

Here is the complete code file and the processed datasets from the paper. I also have them in Rdata format, but GitHub won't allow me to upload those. Attached: fsdatT.csv stressT.csv VerbAgg2.csv VerbAgg3.csv

# https://www.jstatsoft.org/article/view/v048c01
library(dplyr)
library(tidyverse)
library(tidyr)

load("./stressT.rda")
# The processed table is written to CSV with the other datasets further down
stressT <- stressT |> 
  select(-exo1, -exo2, -exo3, -exo4, -exo5) |> # Remove columns for decision-tree model
  rename(id=person,
         resp=value,
         item=crossitem)

load("./fsdatT.rda")
fsdatT <- fsdatT %>% select(-node, -sub)
fsdatT <- fsdatT %>% rename(resp=value, id=person)
fsdatT$id <- sub("^p", "", fsdatT$id) # Convert ids into integers
fsdatT$id <- as.integer(fsdatT$id)

load("./VerbAgg2.rda")
VerbAgg2_id <- 1:nrow(VerbAgg2)
VerbAgg2 <- cbind(VerbAgg2, id=I(VerbAgg2_id)) # Merge id column into the matrix
VerbAgg2 <- VerbAgg2[, !colnames(VerbAgg2) %in% c("Anger", "Gender")]
VerbAgg2 <- as.data.frame(VerbAgg2)
VerbAgg2_long <-  pivot_longer(VerbAgg2, cols=-id, names_to='item', values_to='resp')  # Reshape VerbAgg2 data to long format

load("./VerbAgg3.rda")
VerbAgg3_id <- 1:nrow(VerbAgg3)
VerbAgg3 <- cbind(VerbAgg3, id=I(VerbAgg3_id))
VerbAgg3 <- VerbAgg3[, !colnames(VerbAgg3) %in% c("Anger", "Gender")]
VerbAgg3 <- as.data.frame(VerbAgg3)
VerbAgg3_long <-  pivot_longer(VerbAgg3, cols=-id, names_to='item', values_to='resp')

save(fsdatT, file="fsdatT.Rdata")
save(stressT, file="stressT.Rdata")
save(VerbAgg2_long, file="VerbAgg2.Rdata")
save(VerbAgg3_long, file="VerbAgg3.Rdata")
write.csv(fsdatT, "fsdatT.csv", row.names=FALSE)
write.csv(stressT, "stressT.csv", row.names=FALSE)
write.csv(VerbAgg2_long, "VerbAgg2.csv", row.names = FALSE)
write.csv(VerbAgg3_long, "VerbAgg3.csv", row.names = FALSE)
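
Before saving, it may be worth checking that every processed table has the columns the script is aiming for. This is a minimal sketch; `check_long` is a hypothetical helper, and the required column names (`id`, `item`, `resp`) are taken from the renaming steps in the script rather than from any documented standard.

```r
# Hypothetical helper: stop with a message if any required column is missing.
check_long <- function(df, required = c("id", "item", "resp")) {
  missing <- setdiff(required, names(df))
  if (length(missing) > 0) {
    stop("missing columns: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy tables (made up): one already renamed, one still using the raw names.
toy_ok  <- data.frame(id = 1:2, item = c("a", "b"), resp = c(0, 1))
toy_bad <- data.frame(person = 1:2, item = c("a", "b"), value = c(0, 1))

check_long(toy_ok)
bad_result <- tryCatch(check_long(toy_bad),
                       error = function(e) conditionMessage(e))
```

In the script this would be called on `fsdatT`, `stressT`, `VerbAgg2_long`, and `VerbAgg3_long` just before the `save()` calls.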

ben-domingue commented 3 months ago

OK let me go through these separately: VerbAgg2.csv VerbAgg3.csv

I think we can actually get rid of both of these. They seem to be duplicates of each other and also of data we already have (https://redivis.com/datasets/as2e-cv7jb41fd/tables/38n6-c5nr31epg). Can you update the code to remove this bit?
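
One way to check whether two long-format tables hold the same responses is to sort both by person and item and compare them. This is a minimal sketch on made-up data; `same_responses` is a hypothetical helper, and it assumes both tables use the `id`, `item`, `resp` columns.

```r
# Hypothetical helper: TRUE when two long tables contain identical
# (id, item, resp) rows, regardless of row order.
same_responses <- function(a, b) {
  cols <- c("id", "item", "resp")
  a <- a[order(a$id, a$item), cols]
  b <- b[order(b$id, b$item), cols]
  rownames(a) <- NULL
  rownames(b) <- NULL
  isTRUE(all.equal(a, b))
}

# Toy tables (made up): toy_b is toy_a shuffled, toy_c has flipped responses.
toy_a <- data.frame(id   = c(1, 1, 2, 2),
                    item = c("i1", "i2", "i1", "i2"),
                    resp = c(0, 1, 1, 0))
toy_b <- toy_a[c(3, 1, 4, 2), ]
toy_c <- transform(toy_a, resp = 1 - resp)

dup_ab <- same_responses(toy_a, toy_b)
dup_ac <- same_responses(toy_a, toy_c)
```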

fsdatT.csv

I think we just have the Raven's? Or, at least, there are just 35 items. This is confusing given the way that they suggest they'd have both on the website (it seems like they would have both) but, if you agree, I'll just document it as data from the Raven's tasks.
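
The item count can be verified directly from the long table. This is a minimal sketch on made-up data, assuming the processed fsdatT keeps an `item` column alongside `id` and `resp`.

```r
# Toy stand-in for the processed fsdatT (values are made up).
toy <- data.frame(id   = c(1, 1, 2, 2),
                  item = c("r1", "r2", "r1", "r2"),
                  resp = c(1, 0, 0, 1))

# Number of distinct items in the table.
n_items <- length(unique(toy$item))

# Number of responses recorded for each person.
responses_per_person <- table(toy$id)
```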

ben-domingue commented 3 months ago

stressT.csv

this one looks good

ben-domingue commented 3 months ago

Actually, we are fine here. Great work @KingArthur0205 !! See https://github.com/ben-domingue/irw/blob/main/data/IRTrees.R

ben-domingue / irw

irtrees #109