NDCLab / lab-devOps

NDCLab mgmt and operations
GNU Affero General Public License v3.0

Survey Standardization #45

Closed georgebuzzell closed 3 years ago

georgebuzzell commented 3 years ago

Variable naming conventions plan (and scoring plan):

When an instrument is created, each of its variables is named with a semantically meaningful shorthand stem, followed by an underscore and "item-X", where X is the item number from the questionnaire. The name is then appended with "ses-X", "run-X", and "event-X" to record exactly when the questionnaire was collected and to distinguish repeated uses of the same questionnaire. "ses" denotes the session (time point) of a longitudinal study, "run" denotes a logical grouping of data collection within a given time point, and "event" denotes a repeated administration of the questionnaire following a particular experimental manipulation within a given run.

For example, suppose the SCAARED questionnaire is used in a longitudinal study with data collection at 3 different ages (ses). At each age, data is collected across two separate days (runs), and on the second day (run), the SCAARED is collected before and after some experimental manipulation (event). The following names would be used for item 1 of the questionnaire (the same pattern applies to all other items, iterating "item-X"):

SCAARED_item-1_ses-1_run-1_event-1
SCAARED_item-1_ses-1_run-2_event-1
SCAARED_item-1_ses-1_run-2_event-2

SCAARED_item-1_ses-2_run-1_event-1
SCAARED_item-1_ses-2_run-2_event-1
SCAARED_item-1_ses-2_run-2_event-2

The ses-3 names follow the same pattern.
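As a rough illustration of the naming scheme in the example above, here is a minimal Python sketch that builds the item-1 names for all three sessions. The function names `variable_name` and `example_names` are hypothetical and are not part of any lab tooling.

```python
def variable_name(stem, item, ses, run, event):
    """Build one variable name following the stem_item-X_ses-X_run-X_event-X plan."""
    return f"{stem}_item-{item}_ses-{ses}_run-{run}_event-{event}"


def example_names(stem="SCAARED", item=1):
    """List the names for the example study: 3 sessions, 2 runs, 2 events on run 2."""
    # (run, event) pairs collected at each session:
    # day 1 once, day 2 before and after the manipulation.
    run_events = [(1, 1), (2, 1), (2, 2)]
    return [
        variable_name(stem, item, ses, run, event)
        for ses in (1, 2, 3)
        for run, event in run_events
    ]


if __name__ == "__main__":
    for name in example_names():
        print(name)
```

Changing the `item` argument gives the corresponding names for any other item of the questionnaire.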

When a new questionnaire is entered into REDCap (i.e., the first time the lab uses the SCAARED), the questionnaire should be entered as a template. That is, the variable names should be the questionnaire stem name, with item-X defined, and the placeholders "ses-x", "run-x", and "event-x" appended as well. Additionally, a readme should be created for the questionnaire, and an automated scoring script should be added to the automated scoring tool.

Whenever the SCAARED is used for a new study, it should be copied over, and then modified based on the appropriate ses, run, and event for the study. ses, run, and event must always appear in the variable name, even for a "one-off" study. In this case, all values would default to 1.
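The step of turning a template name into a study-specific name could look something like the sketch below. It assumes the template uses a literal "x" placeholder, as in "ses-x"; the helper name `instantiate` is hypothetical. The defaults of 1 mirror the rule for one-off studies.

```python
import re

# Matches a "ses-x", "run-x", or "event-x" placeholder that follows an underscore
# and is followed by an underscore or the end of the name.
PLACEHOLDER_RE = re.compile(r"(?<=_)(ses|run|event)-[xX](?=_|$)")


def instantiate(template_name, ses=1, run=1, event=1):
    """Replace the ses/run/event placeholders in a template variable name."""
    values = {"ses": ses, "run": run, "event": event}

    def fill(match):
        key = match.group(1)
        return f"{key}-{values[key]}"

    return PLACEHOLDER_RE.sub(fill, template_name)


if __name__ == "__main__":
    print(instantiate("SCAARED_item-1_ses-x_run-x_event-x"))
    print(instantiate("SCAARED_item-1_ses-x_run-x_event-x", ses=2, run=2, event=2))
```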

For the automated scoring, the code will look at the stem name of each variable to determine which questionnaire an item comes from. Then, it will read the appended values for item, ses, run, and event. The code will then search the csv for all other variables with the same stem (SCAARED) and the same values for ses, run, and event (but differing in item) to pull all items for that questionnaire, for that specific ses/run/event combination. The code will then score the questionnaire and output the factor score(s) for that specific ses/run/event combination, with the variable named "SCAARED_XXXX_ses-x_run-x_event-x". Here, XXXX is replaced with the factor score name (there may be multiple exported), and the values for ses, run, and event are also written.

The code then proceeds to search for additional ses/run/event combinations for SCAARED and scores these in the same way. To search for these exhaustively, the code must first search for another instance of SCAARED with a matching item (1), ses, and run, but now iterating event to the next highest number (e.g., 2). If this is found, it is scored, then event is iterated to 3, and so on, until no more instances are found for SCAARED_item-1 with the same ses and run. The process is then repeated while iterating run (and, in turn, iterating event within each run), and the same is done for ses (iterating run and event within each ses). Item does not need to be iterated to search for additional instances and should be held at "1" when searching for repeats of the questionnaire.
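The scoring plan above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the lab's scoring tool: it groups columns by (stem, ses, run, event) in a single pass rather than incrementing event, run, and ses counters, which finds the same combinations. The names `group_items`, `score_row`, and `total_score` are hypothetical, and the sum-based "total" factor is a placeholder rather than the real SCAARED scoring rule. Sub-item names such as "item-6-subitem-1" are not handled here.

```python
import re
from collections import defaultdict

# Parses names such as SCAARED_item-1_ses-1_run-2_event-2.
NAME_RE = re.compile(
    r"^(?P<stem>[^_]+)_item-(?P<item>\d+)"
    r"_ses-(?P<ses>\d+)_run-(?P<run>\d+)_event-(?P<event>\d+)$"
)


def group_items(columns):
    """Map (stem, ses, run, event) -> {item number: column name}."""
    groups = defaultdict(dict)
    for col in columns:
        m = NAME_RE.match(col)
        if m is None:
            continue  # not an item variable (e.g. a record id column)
        key = (m["stem"], int(m["ses"]), int(m["run"]), int(m["event"]))
        groups[key][int(m["item"])] = col
    return groups


def score_row(row, scorers):
    """Score every questionnaire instance found in one csv row (a dict)."""
    scores = {}
    for (stem, ses, run, event), items in group_items(row.keys()).items():
        scorer = scorers.get(stem)
        if scorer is None:
            continue  # no scoring script registered for this stem
        responses = {item: float(row[col]) for item, col in sorted(items.items())}
        for factor, value in scorer(responses).items():
            scores[f"{stem}_{factor}_ses-{ses}_run-{run}_event-{event}"] = value
    return scores


def total_score(responses):
    """Placeholder scoring rule (an assumption): sum of all item responses."""
    return {"total": sum(responses.values())}
```

A row read with `csv.DictReader` can be passed directly as `row`, with `scorers` mapping each stem to its scoring function, for example `score_row(row, {"SCAARED": total_score})`.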

jessb0t commented 3 years ago

For questions with sublevels, such as the Everyday Discrimination Scale, use:

eds_item-6-subitem-1_ses-1_run-1_event-1

jessb0t commented 3 years ago

For surveys without scoring, such as demographics, use: demographics_pronouns_ses-1_run-1_event-1

jessb0t commented 3 years ago

Shortening to avoid length limitation problems in some software packages:

overall survey: instrument_s1_r1_e1 (10 char max limit on instrument name)
field: instrument_i1_s1_r1_e1
where REDCap will not automate subnumbering: instrument_i1-sub1_s1_r1_e1
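For reference, converting a long-form field name to the shortened form could be done along these lines. This is a sketch only: the helper name `shorten` is hypothetical, and it assumes the long form follows the "stem_item-X[-subitem-X]_ses-X_run-X_event-X" pattern discussed earlier in this thread.

```python
import re

LONG_RE = re.compile(
    r"^(?P<stem>[^_]+)_item-(?P<item>\d+)(?:-subitem-(?P<sub>\d+))?"
    r"_ses-(?P<ses>\d+)_run-(?P<run>\d+)_event-(?P<event>\d+)$"
)


def shorten(name, max_stem=10):
    """Convert a long-form field name to the shortened i/s/r/e form."""
    m = LONG_RE.match(name)
    if m is None:
        raise ValueError(f"not a long-form item name: {name!r}")
    stem = m["stem"]
    if len(stem) > max_stem:
        # Enforce the 10-character limit on the instrument name.
        raise ValueError(f"instrument name longer than {max_stem} chars: {stem!r}")
    item = f"i{m['item']}"
    if m["sub"] is not None:
        item += f"-sub{m['sub']}"
    return f"{stem}_{item}_s{m['ses']}_r{m['run']}_e{m['event']}"


if __name__ == "__main__":
    print(shorten("SCAARED_item-1_ses-1_run-1_event-1"))
    print(shorten("eds_item-6-subitem-1_ses-1_run-1_event-1"))
```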

jessb0t commented 3 years ago

This has been integrated into the wiki, and all live studies have been updated. The instrument repo is in the process of being updated to include all necessary components of the REDCap exports and the associated scoring script.