NDCLab / lab-devOps

NDCLab mgmt and operations
GNU Affero General Public License v3.0

Data Monitoring | THRIVE #233

Open jessb0t opened 1 year ago

jessb0t commented 1 year ago

upgrade of data monitoring process to handle:

jessb0t commented 1 year ago

Initial Notes

jessb0t commented 1 year ago

Notes from 5/10 meeting

jessb0t commented 1 year ago

Notes from 5/15 meeting

After meeting, ja/dh discussed that there should be no duplicate column names across the REDCap files. To be safe, building a check that throws an error if it sees a duplicate.
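A minimal sketch of such a duplicate check, not the actual hallMonitor code. It assumes the REDCap exports are CSV files with a header row and that an ID column (hypothetically named `record_id` here) is expected to appear in every file and is therefore exempt. The function names are illustrative only.

```python
import csv
from collections import defaultdict
from pathlib import Path


def find_duplicate_columns(headers_by_file, shared=("record_id",)):
    """Return {column: [files]} for columns seen more than once across the files.

    `shared` lists columns expected in every export (hypothetical default).
    """
    seen = defaultdict(list)
    for filename, columns in headers_by_file.items():
        for column in columns:
            if column not in shared:
                seen[column].append(filename)
    return {col: files for col, files in seen.items() if len(files) > 1}


def check_redcap_headers(paths, shared=("record_id",)):
    """Read only the header row of each CSV and raise if any column repeats."""
    headers = {}
    for path in paths:
        with open(path, newline="") as handle:
            headers[Path(path).name] = next(csv.reader(handle), [])
    duplicates = find_duplicate_columns(headers, shared)
    if duplicates:
        raise ValueError(f"Duplicate REDCap column names: {duplicates}")
```

Raising an error rather than logging a warning matches the "to be safe" intent above: monitoring stops before a duplicated column can silently overwrite another.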

lillylaplace commented 1 year ago

Notes from 6/1 meeting

lillylaplace commented 1 year ago

Note from JA on Slack on 6/5: We need to manipulate variable names when we set up hallMonitor for the first time. We will need to map consent_es_complete, which is what is in REDCap, to consentescomplete. In this way, David’s script can use the same logic to collapse across English and Spanish versions of REDCap consent data (based on whether the last two letters before the are “es”) in the same way he is doing for the survey data.

davhunt commented 1 year ago

For the "arrow-alert-vN" psychopy tasks, including the version number vN in the central tracker datadict when matching up with psychopy task file names and updating central tracker.

davhunt commented 1 year ago

For surveys that are given to both parent and child, specifying these surveys during the setup.sh call so that hallMonitor knows to map the child's survey to "[survey-name]_s1_r1_e1" and the parent's survey to something like "[survey-name]_parental_self_report_s1_r1_e1" when updating central tracker.

davhunt commented 1 year ago

When a survey has a version number (like masies_b_s1_r1_e1) we can assume it will always be a single letter? Or will it ever be a letter + number or something?

jessb0t commented 1 year ago

> When a survey has a version number (like masies_b_s1_r1_e1) we can assume it will always be a single letter? Or will it ever be a letter + number or something?

@davhunt It will always be a set of letters (single letter until we hit "z", then "aa" and so on, never numbers or anything else).
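A rough sketch of how a column name could be parsed under this letters-only rule. The pattern is an assumption based on names such as masies_b_s1_r1_e1: it takes the instrument name to be lowercase letters and digits with no underscores, and leaves the s/r/e fields uninterpreted. `parse_survey_column` is a hypothetical helper, not part of the existing scripts.

```python
import re

# instrument _ version (letters only) _ sN _ rN _ eN
COLUMN_RE = re.compile(
    r"(?P<instrument>[a-z0-9]+)_(?P<version>[a-z]+)"
    r"_s(?P<s>\d+)_r(?P<r>\d+)_e(?P<e>\d+)"
)


def parse_survey_column(name):
    """Split a survey column name into its parts, or raise ValueError."""
    match = COLUMN_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"Unexpected survey column name: {name!r}")
    return match.groupdict()
```

Because the version group is `[a-z]+`, both "b" and "aa" are accepted, while a letter + number version such as "b2" is rejected.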

davhunt commented 1 year ago

For backward (and forward) compatibility, monitoring should be able to run on projects that

  1. may or may not have both child and parent data, and
  2. may or may not have both English and Spanish survey responses

Either these should be specified by a flag during setup, or the monitoring scripts should detect whether they are true and behave accordingly (if for example it only sees IDs starting with 301, 308 and 309).

jessb0t commented 1 year ago

:point_up: comment immediately above discussed at meeting of 6/9:

  1. researcher will specify with a flag at data monitoring setup
  2. no need for the researcher to communicate; script will automatically look for any survey/surveyES and collapse accordingly
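A minimal sketch of what the automatic survey/surveyES collapse in point 2 might look like. It assumes the Spanish version of an instrument is named by adding "es" to the English name (as with masi and masies) and only collapses when the English name is also present, so an instrument whose name merely ends in "es" is left alone. `collapse_spanish` is a hypothetical name.

```python
def collapse_spanish(instruments):
    """Group instrument names so that "<name>es" joins "<name>" when both exist."""
    names = set(instruments)
    groups = {}
    for name in sorted(names):
        is_spanish = name.endswith("es") and name[:-2] in names
        base = name[:-2] if is_spanish else name
        groups.setdefault(base, []).append(name)
    return groups
```

A Spanish instrument with no English counterpart in the list stays in its own group under this rule, which may or may not be the desired behavior.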

lillylaplace commented 1 year ago

Notes from DH/LL script testing on 6/19:

  1. parent and consent are now being logged correctly in central tracker
  2. parent identity (primary or secondary) and language (English or Spanish) are now being logged correctly in central tracker
  3. the 5 self-reports that both parents and kids answer (bfne, fpe, masi, pints, via) are being handled by a flag in the setup script, which creates new columns for the parent self-reports at the end of the tracker with the string "parent" appended to their names
    • problem: tracker is creating "parent"-appended columns for bfnep and fpep, which should only happen for bfne and fpe. DH rewriting script to look for "bfne_" and "fpe_" rather than just "bfne" and "fpe" so that parent reports don't get picked up
    • problem: tracker is maintaining columns for parent self-reports named "bfne_b_s1_r1_e1.1" and so on. DH rewriting script so that those columns are not created but instead are replaced with the "parent"-appended columns
  4. scaared_b_s1_r1_e1 is empty in tracker even though there is data for that survey within raw
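The first problem under item 3 comes down to prefix matching. The sketch below is illustrative only (the column names and helper names are assumptions, not the actual script): it contrasts a loose prefix test with a comparison against the whole instrument name before the first underscore.

```python
# The five instruments answered by both parents and kids (from the notes above).
SHARED_SELF_REPORTS = ("bfne", "fpe", "masi", "pints", "via")


def instrument_of(column):
    """Return the text before the first underscore, e.g. "bfnep" for "bfnep_b_s1_r1_e1"."""
    return column.split("_", 1)[0]


def loose_match(column):
    """Problematic variant: "bfnep_..." also starts with "bfne"."""
    return column.startswith(SHARED_SELF_REPORTS)


def exact_match(column):
    """Safer variant: compare the whole instrument name."""
    return instrument_of(column) in SHARED_SELF_REPORTS
```

Spanish variants would need to be collapsed to their English names before `exact_match` is applied, since it compares the full instrument name.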

lillylaplace commented 1 year ago

Notes from DevOps meeting on 6/20:

  1. DH fixed script so that parent self-report columns with "e1.1" names are no longer being created in central tracker
  2. DH fixed script so that bfnep and fpep are no longer treated as parent self-reports, so "parent"-appended columns are no longer added for them at the end of the tracker
  3. scaared_b_s1_r1_e1 column was empty in tracker because scaared was missing the "_b" in the central tracker data dictionary (occurred because the "_b" was at one point missing in the instrument's name in REDCap, but this was fixed)
    • NOTE FOR FUTURE TROUBLESHOOTING: when column shows up empty in tracker but there is data present for that survey in raw, first place to check is central tracker data dictionary
  4. new method for handling 5 self-reports both parents and kids answer -- rather than using a flag in setup script, the names of the parent self-reports for those 5 instruments will be appended with "parent" within the central tracker data dictionary
    • ideally, this will be handled in the scripts in a way that is modular and can apply to future studies that won't specify "parent" but some other string of text. DH to set up this way if possible, or setup for THRIVE now and make modular later if too time-intensive
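For the troubleshooting note under item 3, a quick name comparison can surface this kind of mismatch. This is a sketch under assumptions: it takes the column names found in raw and the variable names listed in the central tracker data dictionary as plain lists, and `compare_names` is a hypothetical helper.

```python
def compare_names(raw_columns, datadict_variables):
    """Report names present on only one side, each list sorted."""
    raw, expected = set(raw_columns), set(datadict_variables)
    return {
        "raw_only": sorted(raw - expected),
        "datadict_only": sorted(expected - raw),
    }
```

In the scaared case, a data dictionary entry missing the "_b" would show up under `datadict_only`, with the correctly named raw column under `raw_only`.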

Next steps: test scripts rigorously (try to break them), then test with real data. After that, work with Emily Martin on setting up data monitoring for oops-faces

lillylaplace commented 1 year ago

Notes from Slack conversation on 6/14:

  1. audio files will be named "audacity" --> scripts and data dictionary should be updated accordingly
  2. video files will be named "zoom" --> scripts and data dictionary should be updated accordingly
  3. audacity and zoom files will be named according to the following convention: [thriveID]_all-["single" or "double"]_zoom-_s1_r1_e1 and [thriveID]_all-["single" or "double"]_audacity-_s1_r1_e1; the "single" flag will be for files with one in-person child (and one remote child), the "double" flag will be for two in-person participants --> scripts and data dictionary should be updated accordingly
  4. for participants who do not have data for a given file type due to experimental error (as opposed to missing data due to failure to upload), text files will be inserted in those file folders that say "no-digi" or "no-eeg" etc. --> DH confirms this will not cause problems for data monitoring scripts, which don't mind extra files as long as they are correctly nested within the established folder structure
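A sketch of a validator for the naming convention in point 3, following the pattern exactly as written above. Assumptions: the THRIVE ID is all digits, the s/r/e numbers may vary, and the name is checked without its file extension. `parse_media_name` is a hypothetical helper, not part of the existing scripts.

```python
import re

# [thriveID]_all-[single|double]_[zoom|audacity]-_sN_rN_eN, as written in the notes.
FILE_RE = re.compile(
    r"(?P<thrive_id>\d+)_all-(?P<count>single|double)"
    r"_(?P<kind>zoom|audacity)-_s(?P<s>\d+)_r(?P<r>\d+)_e(?P<e>\d+)"
)


def parse_media_name(stem):
    """Return the parts of a zoom/audacity file name, or None if it does not match."""
    match = FILE_RE.fullmatch(stem)
    return match.groupdict() if match else None
```

Returning None instead of raising lets placeholder text files like "no-digi" or "no-eeg" be skipped without treating them as errors.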

lillylaplace commented 1 year ago

Up-to-date checklist of THRIVE data monitoring projects:

lillylaplace commented 12 months ago

Notes from meeting 7/10:

lillylaplace commented 11 months ago

Notes from meeting 7/17:

lillylaplace commented 11 months ago

To-do list from meeting 7/19:

lillylaplace commented 11 months ago

Updates from meeting 7/27:

Other to-do items:

lillylaplace commented 11 months ago

Summary from meeting 8/3:

David's to-do list:

My to-do list:

Other points from discussion:

lillylaplace commented 10 months ago

Notes from 8/15 meeting: