CHOP-CGTInformatics / REDCapTidieR

Makes it easy to read REDCap Projects into R
https://chop-cgtinformatics.github.io/REDCapTidieR/

Handle Missing Data Codes #182

Closed ezraporter closed 8 months ago

ezraporter commented 8 months ago

Description

This PR updates our handling of REDCap projects that use missing data codes. The new behavior is:

Benchmarks initially appeared to show increased run time for two of our REDCap projects, but that turned out to be a false positive once we increased the number of microbenchmark iterations.
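To illustrate why the iteration count matters, here is a minimal base R sketch. It is not the microbenchmark call used for this PR; `time_repeated()` and `workload()` are hypothetical stand-ins, and the workload is an arbitrary computation rather than a `read_redcap()` call.

```r
# Base R sketch (not the microbenchmark setup used for this PR):
# time a function `times` times and summarise the spread of elapsed times.
time_repeated <- function(f, times) {
  elapsed <- vapply(
    seq_len(times),
    function(i) unname(system.time(f())[["elapsed"]]),
    numeric(1)
  )
  c(median = median(elapsed), min = min(elapsed), max = max(elapsed))
}

# Hypothetical stand-in workload; in practice this would be the read call
workload <- function() sum(sqrt(seq_len(1e5)))

time_repeated(workload, times = 5)    # few iterations: summary is noisy
time_repeated(workload, times = 100)  # more iterations: steadier median
```

With only a handful of runs, a single slow iteration can make one branch look slower than the other, so comparing medians over more iterations is the safer read.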

Proposed Changes

Screenshots

Logical field warning: [image: Screenshot 2024-03-21 at 5 13 22 PM]

Categorical field warning: [image: Screenshot 2024-03-21 at 5 13 35 PM]

Issue Addressed

Relates to #181

PR Checklist

Before submitting this PR, please verify that the submission meets the criteria below:

Code Review

This section is to be used by the reviewer and developers during Code Review after PR submission

Code Review Checklist

rsh52 commented 8 months ago

A few things here. In general I think the actual code and handling are sound, and I'll separate out my thoughts below.

Warning Redundancy

Would it be possible to consolidate the warnings so that one warning encompasses all fields? Or at least one per type (the logical check versus the extra-values check)? Right now the setup raises a separate warning for each field, which would produce a lot of warnings in most cases where this comes up. I added a second yesno field with an UNK MDC to demonstrate below; you can remove it if you like:

> read_redcap(redcap_uri = Sys.getenv("REDCAP_URI"),
+             token = Sys.getenv("REDCAPTIDIER_MDC_API"), raw_or_label = "label")$redcap_data[[1]]
# A tibble: 3 × 9
  record_id yesno yesno2 text  checkbox___1 checkbox___2 checkbox___3 dropdown form_status_complete
      <dbl> <lgl> <lgl>  <chr> <lgl>        <lgl>        <lgl>        <fct>    <fct>               
1         1 TRUE  NA     text  TRUE         TRUE         FALSE        C        Complete            
2         2 NA    NA     UNK   FALSE        FALSE        FALSE        NA       Complete            
3         3 NA    NA     NA    FALSE        FALSE        FALSE        NA       Incomplete          
Warning messages:
1: In read_redcap(redcap_uri = Sys.getenv("REDCAP_URI"), token = Sys.getenv("REDCAPTIDIER_MDC_API"),  :
  ! `yesno` is type 'yesno' but contains non-logical values: UNK
ℹ These were converted to `NA` resulting in possible data loss
ℹ Does your REDCap project utilize missing data codes?
ℹ Silence this warning with `options(redcaptidier.allow.mdc = TRUE)` or set `raw_or_label = 'raw'` to access missing data codes
2: In read_redcap(redcap_uri = Sys.getenv("REDCAP_URI"), token = Sys.getenv("REDCAPTIDIER_MDC_API"),  :
  ! `yesno2` is type 'yesno' but contains non-logical values: UNK
ℹ These were converted to `NA` resulting in possible data loss
ℹ Does your REDCap project utilize missing data codes?
ℹ Silence this warning with `options(redcaptidier.allow.mdc = TRUE)` or set `raw_or_label = 'raw'` to access missing data codes
3: In read_redcap(redcap_uri = Sys.getenv("REDCAP_URI"), token = Sys.getenv("REDCAPTIDIER_MDC_API"),  :
  ! `dropdown` contains values with no labels: UNK
ℹ These were converted to `NA` resulting in possible data loss
ℹ Does your REDCap project utilize missing data codes?
ℹ Silence this warning with `options(redcaptidier.allow.mdc = TRUE)` or set `raw_or_label = 'raw'` to access missing data codes

Aesthetically, I think we should aim for something similar to the message we had for mixed-structure data:

> read_redcap(redcap_uri = Sys.getenv("REDCAP_URI"),
+             token = Sys.getenv("REDCAPTIDIER_MIXED_STRUCTURE_API"))
Error in `clean_redcap_long()` at REDCapTidieR/R/read_redcap.R:278:5:                                                                                                                       
✖ Instruments detected that have both repeating and nonrepeating instances defined in the project: mixed_structure_1 and
  mixed_structure_form_complete
ℹ Set `allow_mixed_structure` to `TRUE` to override. See Mixed Structure Instruments for more information.
Run `rlang::last_trace()` to see where the error occurred.
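A minimal sketch of what consolidation could look like, in base R only. `warn_mdc_once()` and its `bad_values` argument are hypothetical names for illustration, not functions from the package, and the message wording is made up:

```r
# Hypothetical helper: `bad_values` is a named list mapping each field name
# to the values that could not be converted (empty vector if none).
warn_mdc_once <- function(bad_values) {
  # Drop fields that had no problematic values
  bad_values <- Filter(length, bad_values)
  if (length(bad_values) == 0) {
    return(invisible(character(0)))
  }
  # Build one "`field`: values" entry per affected field
  per_field <- vapply(
    names(bad_values),
    function(nm) {
      sprintf("`%s`: %s", nm, paste(unique(bad_values[[nm]]), collapse = ", "))
    },
    character(1)
  )
  # Emit a single warning covering every affected field
  warning(
    "Fields contain values that were converted to NA: ",
    paste(per_field, collapse = "; "),
    call. = FALSE
  )
  invisible(names(bad_values))
}

# Capture the warning text to show that only one message is produced
msg <- tryCatch(
  warn_mdc_once(list(
    yesno = "UNK", yesno2 = "UNK", dropdown = "UNK", text = character(0)
  )),
  warning = function(w) conditionMessage(w)
)
print(msg)
```

The idea is to collect the offending values per field first and raise the warning once at the end, rather than warning inside the per-field loop.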

Raw vs Label Discrepancy

I might be wrong here, but we may have the raw/label order backwards? See below for the output from REDCapR:

> redcap_read_oneshot(redcap_uri = Sys.getenv("REDCAP_URI"),
+                              token = Sys.getenv("REDCAPTIDIER_MDC_API"), raw_or_label = "label")$data
3 records and 10 columns were read from REDCap in 1.7 seconds.  The http status code was 200.                                                                                               
  record_id   yesno  yesno2    text checkbox___1 checkbox___2 checkbox___3 checkbox___unk dropdown form_1_complete
1         1     Yes    <NA>    text      Checked      Checked    Unchecked      Unchecked        C        Complete
2         2 Unknown Unknown Unknown    Unchecked    Unchecked    Unchecked        Checked  Unknown        Complete
3         3    <NA>    <NA>    <NA>    Unchecked    Unchecked    Unchecked      Unchecked     <NA>      Incomplete

Currently the MDC is set as: [image]

Documentation Support

Would you also mind adding a mention of this to one of the vignettes/articles in this PR? At the moment the only way a user would learn about the options() setting is to encounter it in the wild via the warning, since our supporting documentation doesn't cover it. Something short at the end of "Get Started" should be fine. That would also be a good place to note what isn't supported (text fields, etc.).
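For the docs, something along these lines might be enough. The option name `redcaptidier.allow.mdc` comes from the warning text above; `mdc_warnings_enabled()` is a hypothetical helper used only to show how the option is read, not a package function:

```r
# Hypothetical check mirroring the option named in the warning text:
# warnings are enabled unless the user has explicitly opted out.
mdc_warnings_enabled <- function() {
  !isTRUE(getOption("redcaptidier.allow.mdc", default = FALSE))
}

old <- options(redcaptidier.allow.mdc = TRUE)  # opt out of the MDC warning
mdc_warnings_enabled()                         # FALSE while the option is set
options(old)                                   # restore the previous setting
```

The alternative mentioned in the warning, passing `raw_or_label = "raw"` to `read_redcap()`, keeps the codes themselves rather than silencing the message.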

ezraporter commented 8 months ago

@rsh52 give this another look once the CI passes. I consolidated the warnings and updated the vignette. As discussed earlier, there isn't a way of getting "Unknown" instead of UNK without reading the project metadata, so I left that as is.
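For reference, if the code-to-label mapping were available, the substitution itself would be simple. This is a rough sketch only: it assumes the codes arrive as a single string of `code, label` pairs separated by `|`, which is an assumption about the format rather than something confirmed here, and `parse_mdc()` is a hypothetical helper:

```r
# Assumed input format (not confirmed): "code, label" pairs separated by "|"
parse_mdc <- function(x) {
  pairs <- trimws(strsplit(x, "|", fixed = TRUE)[[1]])
  pairs <- pairs[nzchar(pairs)]
  # Code is everything before the first comma, label is everything after it
  codes <- trimws(sub(",.*$", "", pairs))
  labels <- trimws(sub("^[^,]*,", "", pairs))
  stats::setNames(labels, codes)
}

# Illustrative codes and labels, not taken from a real project
mdc <- parse_mdc("UNK, Unknown | NASK, Not asked")

# Replace any value that matches a code with its label; leave the rest alone
values <- c("Yes", "UNK", NA)
ifelse(values %in% names(mdc), unname(mdc[values]), values)
```

The extra metadata read is the real cost here, which is why the labels are not substituted in this PR.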