Analyses based on congruency
- George and I discussed including only incongruent trials in our analyses, since so few trials have error responses and even fewer are both error and congruent (incongruent trials are harder, so errors are more likely on them). If we included both, that could bias the data, because the congruent and incongruent conditions are not equally represented (in fact, their representation differs drastically).
Question: if the DDM is looping through congruent and incongruent trials, why is it not producing four .csvs (post-error congruent, post-error incongruent, post-correct congruent, post-correct incongruent)?
- Answer: It doesn't produce four .csvs for the same reason it doesn't produce four .csvs based on accuracy (current trial correct/incorrect): the code models those trials separately and then creates a single fit that encompasses each congruency and accuracy condition for post-error and for post-correct.
Question: If we are interested in "only including incongruent trials", then where does that leave us with the way the data is currently being fit?
- We can't separate out the congruency conditions from the DDM outputs, so I think we need to change the modeling script.
- Do we need to make a separate congruency condition to loop over, like post-trial accuracy (post-error and post-correct conditions), thereby producing four different .csv files?
- If that is the case, then are we interested in previous trial congruency (post-congruent and post-incongruent) or congruency of the post-error trial?
After further discussion with George, I understand now that the changes don't need to happen in the DDM modeling/fitting script; they need to happen in the cleaning script.
We have n trials and n+1 (post-error/post-correct) trials. We want to retain all congruencies for n+1, but only examine n trials that are incongruent. In other words, we want to only look at n+1 trials where n is incongruent.
I already have a “priorCongruency” column in the cleaning script (multiSubImport_cleanAccRT_v6-0.R). I just need to exclude any trials where priorCongruency = 1 (congruent).
So now, the data to be processed through the DDM is the subset that excludes trials where the prior trial was congruent (priorCongruency = 1).
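A minimal sketch of that exclusion step (the data-frame and output-file names here are assumptions, not the cleaning script's actual objects):

```r
library(dplyr)

# cleanTrials: trial-level data after the existing cleaning steps (hypothetical name).
# priorCongruency codes the congruency of trial n, the previous trial; 1 = congruent.
# Dropping rows where the prior trial was congruent keeps all congruencies on
# trial n+1 while restricting trial n to incongruent.
ddmInput <- cleanTrials %>%
  filter(!is.na(priorCongruency),  # trials with no prior trial are dropped too; adjust if handled elsewhere
         priorCongruency != 1)

# Hypothetical output name for the subset handed to the DDM fitting script.
write.csv(ddmInput, "ddmInput_priorIncongruentOnly.csv", row.names = FALSE)
```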
In the sheet I'm using to pull past treatment (tx) history, there are the following variables: prediagnosis child, prediagnosis parent, prediagnosis composite, and prediagnosis final. I was not sure which diagnosis variable (and accompanying tx variables) I should use, so I asked Jeremy. Here is his reply:
"I recommend using the final dx. Composite dx means the combination of all diagnoses endorsed by parent and child during the ADIS. Final dx means all the diagnoses we arrived at during our clinical meeting in which we review parent and child responses to the ADIS. There will be a lot of overlap in composite and final, but sometimes our diagnoses do not perfectly match what was endorsed by a child or parent or we assign interference ratings that differ from those provided by child or parent."
Thus, I'll be pulling the "prediagnosis final" variables.
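As a rough sketch of that pull (the object and column names below are placeholders; the sheet's actual variable names differ):

```r
library(dplyr)

# txHistory: the past treatment (tx) history sheet, already imported (hypothetical name).
# Keep the "prediagnosis final" variables plus their accompanying tx variables;
# the starts_with() patterns stand in for whatever the sheet's real column names are.
preFinalDx <- txHistory %>%
  select(participant_id,
         starts_with("prediagnosis_final"),
         starts_with("tx_"))
```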
Warning about analyzing CDI data
For the capp-dataset, the CDI version was updated from version 1 to version 2 midway through data collection. The CDI2 used in the capp-dataset was modified from the published CDI2 by adding the questions that were removed from CDI1 to the end of CDI2. Although some questions overlap between CDI1 and CDI2, they appear in a different order, so the scoring differs between versions. Any future analyses need to be extremely mindful of the version each participant took!
I changed the format of the COLLECTIVE_dataset so that CDI1 and CDI2 have their own columns. This should prevent any confusion regarding scoring. The R script "COLLECTIVE_demographics_PRE_v2-0.R" and the dataset "COLLECTIVE_PRE_dataset_v2-0.csv" reflect this change.
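A minimal sketch of how the separate columns keep the scoring straight (item-column names like cdi1_item01 / cdi2_item01 are hypothetical, and reverse-keying of items is omitted; the actual scoring lives in COLLECTIVE_demographics_PRE_v2-0.R):

```r
library(dplyr)

collective <- read.csv("COLLECTIVE_PRE_dataset_v2-0.csv")

collective <- collective %>%
  mutate(
    # Each participant has responses in only one version's item columns, so the
    # other version's total stays NA instead of being scored with the wrong key.
    cdi1_total = rowSums(across(starts_with("cdi1_item"))),
    cdi2_total = rowSums(across(starts_with("cdi2_item")))
  )
```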
If you open any of the sheets (whether the origin was a .csv or originally a .sav) in Excel, the date formats will appear incorrect. The correct data will show up in R or other text readers, but only if no modifications are made to the .csv in Excel. For example, a participant may indicate that they started a medication "01/08" because the prompt was month/year. In R, "01/08" is retained when you import the .csv. In Excel, that value is stored as "8-Jan", which is incorrect. If I edit the .csv using Excel in any way (e.g., delete a column because it contained identifiable data), then that "8-Jan" format is saved, and that is how it is read into R (or any other text reader).
This can be resolved by making any edits to the .csv files in R (or a plain-text editor) rather than in Excel, so the original date strings are preserved; a sketch of that workflow is below.
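This is a rough sketch with hypothetical file and column names:

```r
# Read the raw .csv with every column as character so "01/08"-style
# month/year strings are never reinterpreted as dates.
meds <- read.csv("capp_medication_history.csv",   # hypothetical file name
                 colClasses = "character",
                 stringsAsFactors = FALSE)

# Any edits (e.g., dropping an identifiable column) happen here in R,
# never in Excel, so the text stays exactly as entered.
meds$participant_name <- NULL                     # hypothetical column
write.csv(meds, "capp_medication_history_deid.csv", row.names = FALSE)
```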