Closed pbilodeau94 closed 7 months ago
Thanks for the bug report! I wasn't able to reproduce this issue with the test REDCap projects we have which include yesno
variables. Could you try to provide a minimal reproducible example? That could look like:
REDCapR::redcap_metadata_read(redcap_uri = [YOUR_URI], token = [YOUR_TOKEN])
REDCapR::redcap_read_oneshot(
redcap_uri = [YOUR_URI],
token = [YOUR_TOKEN],
export_survey_fields = TRUE,
export_data_access_groups = TRUE,
guess_max = .Machine$integer.max
)
It would also be helpful if you could provide the results of running sessionInfo()
after running the code to produce the bug.
Thanks for your quick response! I am not able to share the token and URL since there is PHI in the database. When I run this, it looks like those variables do accurately get pulled, unlike when using read_redcap.
Are you able to share what those results look like after manually removing PHI? Alternatively, if you have the ability to create REDCap projects at your institution, are you able to create an example that demonstrates the bug without PHI?
Since it sounds like downloading the data from REDCap isn't where the bug is originating, it would be helpful to see your sessionInfo()
so I can check if this is specific to particular package versions.
Here’s what it looks like with read_redcap. mri_relapse and gad_relapse are N/A for all record IDs. [A screenshot of a computer Description automatically generated]
Here’s what it looks like when using REDCapR directly. data_mri_relapse and data_gad_relapse are accurately pulled. [A screenshot of a computer Description automatically generated]
Here’s sessionInfo
R version 4.3.3 (2024-02-29) Platform: aarch64-apple-darwin20 (64-bit) Running under: macOS Sonoma 14.3.1
Matrix products: default BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
time zone: America/New_York tzcode source: internal
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages: [1] Hmisc_5.1-2 redcapAPI_2.8.4 DemographicTable_0.1.6 scales_1.3.0 ggrepel_0.9.5 [6] officer_0.6.5 flextable_0.9.5 broom_1.0.5 kableExtra_1.4.0 skimr_2.1.5 [11] labelled_2.12.0 knitr_1.45 REDCapR_1.1.0 REDCapTidieR_1.0.0 lubridate_1.9.3 [16] forcats_1.0.0 stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2 readr_2.1.5 [21] tidyr_1.3.1 tibble_3.2.1 ggplot2_3.5.0 tidyverse_2.0.0
loaded via a namespace (and not attached): [1] gridExtra_2.3 rlang_1.1.3 magrittr_2.0.3 e1071_1.7-14 compiler_4.3.3 [6] getPass_0.2-4 systemfonts_1.0.6 vctrs_0.6.5 httpcode_0.3.0 pkgconfig_2.0.3 [11] crayon_1.5.2 fastmap_1.1.1 backports_1.4.1 ellipsis_0.3.2 utf8_1.2.4 [16] promises_1.2.1 rmarkdown_2.26 tzdb_0.4.0 haven_2.5.4 ragg_1.3.0 [21] bit_4.0.5 xfun_0.42 jsonlite_1.8.8 later_1.3.2 uuid_1.2-0 [26] parallel_4.3.3 cluster_2.1.6 R6_2.5.1 stringi_1.8.3 RColorBrewer_1.1-3 [31] rpart_4.1.23 Rcpp_1.0.12 assertthat_0.2.1 lobstr_1.1.2 base64enc_0.1-3 [36] nnet_7.3-19 httpuv_1.6.14 timechange_0.3.0 tidyselect_1.2.1 rstudioapi_0.15.0 [41] yaml_2.3.8 curl_5.2.1 shiny_1.8.0 withr_3.0.0 askpass_1.2.0 [46] evaluate_0.23 foreign_0.8-86 proxy_0.4-27 zip_2.3.1 xml2_1.3.6 [51] formattable_0.2.1 pillar_1.9.0 checkmate_2.3.1 generics_0.1.3 vroom_1.6.5 [56] hms_1.1.3 munsell_0.5.0 chron_2.3-61 xtable_1.8-4 class_7.3-22 [61] glue_1.7.0 gdtools_0.3.7 tools_4.3.3 gfonts_0.2.0 data.table_1.15.2 [66] grid_4.3.3 colorspace_2.1-0 repr_1.1.6 htmlTable_2.4.2 Formula_1.2-5 [71] cli_3.6.2 labelVector_0.1.2 textshaping_0.3.7 fontBitstreamVera_0.1.1 fansi_1.0.6 [76] viridisLite_0.4.2 svglite_2.1.3 keyring_1.3.2 gtable_0.3.4 digest_0.6.35 [81] fontquiver_0.2.1 crul_1.4.0 htmlwidgets_1.6.4 htmltools_0.5.7 lifecycle_1.0.4 [86] httr_1.4.7 mime_0.12 fontLiberation_0.1.0 openssl_2.1.1 bit64_4.0.5
From: Ezra Porter @.>
Date: Monday, March 18, 2024 at 4:37 PM
To: CHOP-CGTInformatics/REDCapTidieR @.>
Cc: pbilodeau94 @.>, Author @.>
Subject: Re: [CHOP-CGTInformatics/REDCapTidieR] [BUG] yesno variables are pulled as NA
(Issue #181)
Are you able to share what those results look like after manually removing PHI? Alternatively, if you have the ability to create REDCap projects at your institution, are you able to create an example that demonstrates the bug without PHI?
Since it sounds like downloading the data from REDCap isn't where the bug is originating, it would be helpful to see your sessionInfo() so I can check if this is specific to particular package versions.
— Reply to this email directly, view it on GitHubhttps://github.com/CHOP-CGTInformatics/REDCapTidieR/issues/181#issuecomment-2004941801, or unsubscribehttps://github.com/notifications/unsubscribe-auth/BGXP2ZQ575T4AN5RNZY4H63YY5GANAVCNFSM6AAAAABE4FK7W6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBUHE2DCOBQGE. You are receiving this because you authored the thread.Message ID: @.***>
Thanks! I don't think the screenshots made it through. Can you try to resend them? You may need to use the github.com UI instead of email for it to work
read_redcap :
RedcapR:
Ah okay the issue is with those "UNK" values, which I guess your project is getting through the Missing Data Codes customization. I have to think a little more about how we should handle these but agree the current behavior isn't right so we'll implement some fix.
If you have a second to think about it, what would your expectation as a user be for a yesno
field that has UNK
values? Convert the field to a factor with levels Yes
, No
, UNK
? Convert UNK
to NA
and have the field remain TRUE/FALSE
? Something else?
Ah yes that makes sense. Yes, I think converting UNK to NA and keeping it as TRUE/FALSE makes the most sense.
From: Ezra Porter @.>
Date: Monday, March 18, 2024 at 5:54 PM
To: CHOP-CGTInformatics/REDCapTidieR @.>
Cc: pbilodeau94 @.>, Author @.>
Subject: Re: [CHOP-CGTInformatics/REDCapTidieR] [BUG] yesno variables are pulled as NA
(Issue #181)
Ah okay the issue is with those "UNK" values, which I guess your project is getting through the Missing Data Codeshttps://kb.wisc.edu/smph/informatics/page.php?id=108107 customization. I have to think a little more about how we should handle these but agree the current behavior isn't right so we'll implement some fix.
If you have a second to think about it, what would your expectation as a user be for a yesno field that has UNK values? Convert the field to a factor with levels Yes, No, UNK? Convert UNK to NA and have the field remain TRUE/FALSE? Something else?
— Reply to this email directly, view it on GitHubhttps://github.com/CHOP-CGTInformatics/REDCapTidieR/issues/181#issuecomment-2005112835, or unsubscribehttps://github.com/notifications/unsubscribe-auth/BGXP2ZTCAQIFKTYXCBGLQRDYY5PCTAVCNFSM6AAAAABE4FK7W6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDAMBVGEYTEOBTGU. You are receiving this because you authored the thread.Message ID: @.***>
Since Missing Data Codes (MDCs) can be applied to any field type, I think we need to consider what we want to happen for each of the various field types if a MDC is enabled here. The trickiest would be the boolean-type fields (yes/no, true/false, multiple choice??). The two options I see are 1) we no longer handle these fields as boolean but turn them into factors or 2) keep them boolean and recode MDC values as NA.
I guess the default behavior could #2. However since that is also how an empty value is coded that could cause data loss which shouldn't happen silently. So I think we might want to have read_redcap()
throw a warning when this is detected suggesting that the user define the desired behavior. This could be done with an optional arg to read_redcap()
, e.g. boolean_mdc_fields_as_factor
?
Thoughts?
So I think we might want to have read_redcap() throw a warning when this is detected suggesting that the user define the desired behavior. This could be done with an optional arg to read_redcap(), e.g. boolean_mdc_fields_as_factor?
IMO this sounds better than (1), I don't think the default should be to change the data types. Sort of similar to how we went about #177, this isn't exactly a typical setup so if a user wants to change something it's good to do so consciously.
Handling this is going to be non-trivial, to my knowledge there's no easy way to detect that missing data codes exist (unless I'm totally overlooking something!). I made a test database to verify, below is a sample data and metadata output for a yesno and a text field with an UNK
MDC:
Browse[2]> db_data
record_id yesno text form_1_complete
1 1 1 text 2
2 2 0 text 2
3 3 UNK UNK 0
Browse[2]> db_metadata
# A tibble: 3 × 18
field_name form_name section_header field_type field_label select_choices_or_calculat…¹ field_note
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 record_id NA NA text Record ID NA NA
2 yesno form_1 NA yesno YesNo NA NA
3 text form_1 NA text Text NA NA
# ℹ abbreviated name: ¹select_choices_or_calculations
# ℹ 11 more variables: text_validation_type_or_show_slider_number <chr>, text_validation_min <chr>,
# text_validation_max <chr>, identifier <chr>, branching_logic <chr>, required_field <chr>,
# custom_alignment <chr>, question_number <chr>, matrix_group_name <chr>, matrix_ranking <chr>,
# field_annotation <chr>
With our current setup, I suppose the best bet is to check each variable type in the metadata, then check for values that don't coincide with expected values. I'm not sure how we would do this for plain text fields. This may also add more processing time, which supports not having this as a default behavior.
There’s no way to detect project-level MDCs using the API. The codebook in the metadata has information on which MDCs were enabled, but not at the variable level. Not sure exactly how your API call is set up (e.g whether there’s a missing data function), but the easiest way may be to ask the user to specify which MDCs were used in the specific project as a custom function.
From: Rich Hanna @.>
Date: Tuesday, March 19, 2024 at 9:10 AM
To: CHOP-CGTInformatics/REDCapTidieR @.>
Cc: pbilodeau94 @.>, Author @.>
Subject: Re: [CHOP-CGTInformatics/REDCapTidieR] [BUG] yesno
fields are converted to NA
when the field contains missing data codes (Issue #181)
So I think we might want to have read_redcap() throw a warning when this is detected suggesting that the user define the desired behavior. This could be done with an optional arg to read_redcap(), e.g. boolean_mdc_fields_as_factor?
IMO this sounds better than (1), I don't think the default should be to change the data types. Sort of similar to how we went about #177https://github.com/CHOP-CGTInformatics/REDCapTidieR/pull/177, this isn't exactly a typical setup so if a user wants to change something it's good to do so consciously.
Handling this is going to be non-trivial, to my knowledge there's no easy way to detect that missing data codes exist (unless I'm totally overlooking something!). I made a test database to verify, below is a sample data and metadata output for a yesno and a text field with an UNK MDC:
Browse[2]> db_data
record_id yesno text form_1_complete
1 1 1 text 2
2 2 0 text 2
3 3 UNK UNK 0
Browse[2]> db_metadata
field_name form_name section_header field_type field_label select_choices_or_calculat…¹ field_note
It looks like the missing data codes are returned in the Export Project Info API method so we theoretically could get them. REDCapR::redcap_project_info_read()
returns them but it's locked up in the dev version. I would very much like to avoid getting them but I'm not sure we have a good choice.
There's really only 3 cases we currently have with processing fields of different types. For the first 2 it's pretty trivial to remove missing data codes without knowing what they are. For the third it's basically impossible. That leaves us in a bind if we want a solution that:
NA
everywhere or nowhereI think the least of all evils is to convert to NA
where we can without knowing the codes for now (cases 1 and 2) and finish the job once REDCapR 1.2.0
is on CRAN (case 3). If we hate that transitional plan we query the API directly as a holdover.
Case 1: yesno
, truefalse
, checkbox
Current behavior: Convert to logical
with as.logical()
Possible new behavior: Check for non-1/0/NA
and warn if found. Convert those to NA
then convert to logical
.
Case 2: dropdown
, radio
Current behavior: Convert to factor
using levels read from metadata. (This implicitly converts missing codes to NA
since they aren't in the metadata.)
Possible new behavior: Check for data values not in the levels read from metadata and warn if found. Convert to factor, implicitly converting missing codes to NA
.
Case 3: Everything else
Current behavior: Do nothing
Possible new behavior: For now, do nothing. When REDCapR 1.2.0
is on CRAN, read the MDCs from project info and convert to NA
with a warning.
We can add an option to silence warnings so users can shut them off when they don't have control of the REDCap project.
After further discussion some additional behavioral things to note for MDCs:
checkbox
fields automatically get an additional column added to the output, i.e. checkbox___unk
for an UNK
MDCdropdown
/ radio
would be difficult, and for projects where an MDC value hasn't been used yet is impossible without using redcap_project_info_read()
UNK
could be a project MDC, and a user could still supply "UNK" as a valueConsidering that I suspect only a small subset of REDCap databases use MDCs, possibly breaking changes to the supertibble data model, and likely performance hit with robustly supporting MDCs, I think what Ezra outlines above is a reasonable solution for now (i.e. implement cases 1 and 2). We can consider implementing case (3) after someone demonstrates the need for this by opening a feature request. We can write a warning message that states that we are not currently supporting MDCs but if this is needed they should file an issue on our GitHub page.
I noticed when working on this that we currently don't convert fields to logical
when raw_or_label = "raw"
. Do we like that behavior? I don't have strong opinions either way.
I noticed when working on this that we currently don't convert fields to
logical
whenraw_or_label = "raw"
. Do we like that behavior? I don't have strong opinions either way.
Like how yesno
fields are 1/0 for "raw" but TRUE/FALSE for "label"? I also don't feel strongly, 1/0 is what REDCap itself uses for raw values for these fields so it seems ok to keep as is.
Like how
yesno
fields are 1/0 for "raw" but TRUE/FALSE for "label"? I also don't feel strongly, 1/0 is what REDCap itself uses for raw values for these fields so it seems ok to keep as is.
Yeah okay I agree and it actually will keep our code simpler to keep as is. PR coming soon but I'm leaning towards also keeping the missing data codes when raw_or_label = "raw"
. That has the advantage of keeping our code simpler and giving users a way to get the MDCs if they really want.
@pbilodeau94 when you have a chance let us know if #182 solved your issue. You can try it out by installing the dev version
Yes, looks like it's all set now! Thank you!!
Current Behavior
When using read_redcap, all yes/no variables are all listed as N/A. Dummy factor variables for multiple checkboxes get pulled correctly as TRUE/FALSE.
How to Reproduce the Bug:
nmosd <- read_redcap(url, token) %>% make_labelled(format_labels = fmt_strip_trailing_colon) %>% bind_tibbles()