@thomasnwilson suggested that we develop a function/package that can lint REDCap dictionaries and return a Markdown/HTML report. These are the initial rules we thought of while waiting in the airport.
REDCapLintR: Tool for REDCap Dictionary Good Practices
Working title: "REDCrapR" or "CrapR" or "Moving from REDCrap to REDCap"
[ ] Rule: no variable should end in "_v\d" (eg, _v2 or _v3)
Opinion: each variable in a sequence should have a meaningful name that clearly communicates its position in the sequence.
Examples of bad behavior: age, age_v2, and age_v2_v2
Suggested fix: Rename variables to age_baseline, age_discharge, age_followup
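A minimal sketch of this rule, written in Python purely for illustration (the package itself would presumably be in R). The function name `find_versioned_names` is hypothetical, and the pattern uses `\d+` instead of a single `\d` so that suffixes like `_v10` are also caught; that widening is an assumption, not something settled above.

```python
import re

# Matches a trailing "_v" followed by one or more digits, e.g. "_v2" or "_v10".
VERSION_SUFFIX = re.compile(r"_v\d+$")

def find_versioned_names(field_names):
    """Return the variable names that end in a version-style suffix."""
    return [name for name in field_names if VERSION_SUFFIX.search(name)]

print(find_versioned_names(["age", "age_v2", "age_v2_v2", "age_baseline"]))
```

With the example names from this rule, only `age_v2` and `age_v2_v2` are flagged.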
[ ] Smell: at least 10% of text variables should have validation
[ ] Smell: at least 20% of variables should be non-text, like dropdowns or sliders
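The two smells could be computed roughly as follows. This is a Python sketch under assumptions: each field is a dict using the column names of REDCap's metadata export (`field_type`, `text_validation_type_or_show_slider_number`), only `field_type == "text"` counts as text (so `notes` fields count as non-text), and the function names are hypothetical.

```python
VALIDATION_COL = "text_validation_type_or_show_slider_number"

def validation_share(fields):
    """Fraction of text fields with a validation type, or None if there are no text fields."""
    text_fields = [f for f in fields if f["field_type"] == "text"]
    if not text_fields:
        return None
    validated = [f for f in text_fields if f.get(VALIDATION_COL)]
    return len(validated) / len(text_fields)

def non_text_share(fields):
    """Fraction of all fields whose type is not "text", or None for an empty dictionary."""
    if not fields:
        return None
    return sum(f["field_type"] != "text" for f in fields) / len(fields)

def dictionary_smells(fields, min_validated=0.10, min_non_text=0.20):
    """Return the names of the smells whose share falls below its threshold."""
    smells = []
    share = validation_share(fields)
    if share is not None and share < min_validated:
        smells.append("text-validation")
    share = non_text_share(fields)
    if share is not None and share < min_non_text:
        smells.append("non-text")
    return smells
```

The thresholds are parameters so a project could tighten or relax them without changing the rule.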
[ ] All piped values should originate from variables, events, or smart variables that are currently in the dictionary.
Check that all these still exist among a combined list of variables, events, & smart variables.
regex: \[[a-z][a-z0-9_-]*\]
[ ] All embedded variables should originate from variables, events, or smart variables that are currently in the dictionary.
Check that all these still exist among a combined list of variables, events, & smart variables.
regex: \{[a-z][a-z0-9_-]*\}
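The piping and embedding checks share the same shape, so one Python sketch can cover both. It reuses the two regexes above with a capture group added; `unknown_references` and the `known_names` set are hypothetical names. Note that these patterns would not match smart variables that take parameters after a colon, so those are simply ignored here.

```python
import re

# Same patterns as in the rules above, with a capture group around the name.
PIPED = re.compile(r"\[([a-z][a-z0-9_-]*)\]")
EMBEDDED = re.compile(r"\{([a-z][a-z0-9_-]*)\}")

def unknown_references(text, known_names):
    """Return the sorted piped/embedded names in `text` that are not in `known_names`."""
    referenced = PIPED.findall(text) + EMBEDDED.findall(text)
    return sorted({name for name in referenced if name not in known_names})

# `known` stands for the combined list of variables, events, and smart variables.
known = {"age", "baseline_arm_1", "record-name"}
label = "You are [age] years old. {weight} [old_var] [record-name]"
print(unknown_references(label, known))
```

In practice the text to scan would be every field label, field note, and branching-logic cell in the dictionary.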
[ ] Rule: all date variables should have the same format within the project. Don't mix & match dmy and mdy.
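One way to check this, sketched in Python: group the date fields by the day/month/year order encoded in their validation type. It assumes validation types shaped like `date_mdy`, `datetime_dmy`, or `datetime_seconds_ymd` in the `text_validation_type_or_show_slider_number` column; `date_orders` is a hypothetical name.

```python
import re

# Captures the order suffix (dmy, mdy, ymd) of a date-like validation type.
DATE_VALIDATION = re.compile(r"^(?:date|datetime|datetime_seconds)_(dmy|mdy|ymd)$")

def date_orders(fields):
    """Map each date order found in the dictionary to the fields that use it."""
    orders = {}
    for field in fields:
        validation = field.get("text_validation_type_or_show_slider_number") or ""
        match = DATE_VALIDATION.match(validation)
        if match:
            orders.setdefault(match.group(1), []).append(field["field_name"])
    return orders
```

The rule is violated when the returned dict has more than one key, and the lists say which fields to change.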
[ ] All forms/instruments should be mapped to at least one event
exception: the instrument name ends with "_retired"
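A Python sketch of the mapping check, including the exception. It assumes the instrument-event mapping is available as (event, instrument) pairs, which only applies to longitudinal projects; `unmapped_instruments` is a hypothetical name.

```python
def unmapped_instruments(instruments, event_mappings):
    """Return instruments mapped to no event, skipping names that end in "_retired"."""
    mapped = {form for _event, form in event_mappings}
    return sorted(
        form
        for form in set(instruments) - mapped
        if not form.endswith("_retired")
    )

instruments = ["demographics", "labs", "intake_retired"]
mappings = [("baseline_arm_1", "demographics")]
print(unmapped_instruments(instruments, mappings))
```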
[ ] Rule: any variable with something like "phone" in the variable name, field label or field note should have a phone validation. Tokens include
phone
mobile
cell
contact number
[ ] Rule: any variable with something like "number" in the variable name, field label or field note should have an integer or numeric validation. Tokens include
number
age
count
[ ] Rule: any variable with something like "zip code" in the variable name, field label or field note should have a zip code validation. Tokens include
zip
zip_code
zipcode
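The three token rules above (phone, number, zip code) could share one implementation. The Python sketch below makes several assumptions: fields are dicts with REDCap metadata column names, the accepted validation types are `phone`, `integer`, `number`, and `zipcode`, and tokens match whole words only, so "age" does not fire on "page". All function names are hypothetical.

```python
import re

# rule name -> (tokens to look for, validation types that satisfy the rule)
RULES = {
    "phone": (("phone", "mobile", "cell", "contact number"), {"phone"}),
    "number": (("number", "age", "count"), {"integer", "number"}),
    "zip code": (("zip", "zip_code", "zipcode"), {"zipcode"}),
}

def normalize(text):
    """Lowercase, turn every non-alphanumeric run into one space, and pad with spaces."""
    return " " + re.sub(r"[^a-z0-9]+", " ", text.lower()).strip() + " "

def has_token(text, tokens):
    """True if any token appears in `text` as a whole word or phrase."""
    padded = normalize(text)
    return any(" " + token.replace("_", " ") + " " in padded for token in tokens)

def lint_validation(fields):
    """Return (field_name, rule) pairs where a token is present but validation is missing."""
    findings = []
    for field in fields:
        text = " ".join(
            [field["field_name"], field.get("field_label", ""), field.get("field_note", "")]
        )
        validation = field.get("text_validation_type_or_show_slider_number", "")
        for rule, (tokens, accepted) in RULES.items():
            if has_token(text, tokens) and validation not in accepted:
                findings.append((field["field_name"], rule))
    return findings
```

One wrinkle worth settling: a label such as "contact number" matches both the phone and the number rule, so a real implementation would probably need a precedence order between rules.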
[ ] Rule: any variable with something like T/F or Y/N in the variable name, field label or field note should code true/yes/on as "1" and false/no/off as "0". Tokens include (case insensitive):
t/f
true/false
y/n
yes/no
on/off
Rule: male & female should be consistently coded as 1/0, 1/2, or 8507/8532 (for OMOP)
m/f
male/female
[ ] ?? can we expand this to tri-state variables like yes/no/maybe or yes/no/null ??
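A possible starting point for the true/false coding rule, sketched in Python. Rather than searching the name, label, or note for the tokens, it inspects the choice labels themselves, which is a different angle from the rule as written. It assumes choices arrive as a `code, label | code, label` string, and the function names are hypothetical. Labels outside the two sets (such as "maybe") are ignored, so the tri-state question stays open.

```python
TRUE_LABELS = {"true", "yes", "on"}
FALSE_LABELS = {"false", "no", "off"}

def parse_choices(choices):
    """Parse "1, Yes | 0, No" into {"yes": "1", "no": "0"} (labels lowercased)."""
    pairs = {}
    for option in choices.split("|"):
        code, sep, label = option.partition(",")
        if sep:
            pairs[label.strip().lower()] = code.strip()
    return pairs

def miscoded_booleans(choices):
    """Return (label, actual_code, expected_code) for true/false labels with the wrong code."""
    problems = []
    for label, code in parse_choices(choices).items():
        if label in TRUE_LABELS:
            expected = "1"
        elif label in FALSE_LABELS:
            expected = "0"
        else:
            continue  # labels such as "maybe" are not checked
        if code != expected:
            problems.append((label, code, expected))
    return problems
```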
[ ] Rule: multiple-choice response options are coded as integers (instead of letters)
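A Python sketch of the integer-code check. It assumes choices are stored as a `code, label | code, label` string, and `non_integer_codes` is a hypothetical name. Negative integers are accepted here, which is a judgment call.

```python
import re

def non_integer_codes(choices):
    """Return the choice codes that are not plain integers."""
    codes = []
    for option in choices.split("|"):
        code, _sep, _label = option.partition(",")
        code = code.strip()
        if code and not re.fullmatch(r"-?\d+", code):
            codes.append(code)
    return codes

print(non_integer_codes("a, Apple | b, Banana | 3, Cherry"))
```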