CHOP-CGTInformatics / REDCapTidieR

Makes it easy to read REDCap Projects into R
https://chop-cgtinformatics.github.io/REDCapTidieR/

[BUG] Project exported by REDCapR::redcap_read but REDCapTidieR::read_redcap fails #212

Open clelandcm opened 3 days ago

clelandcm commented 3 days ago

Expected Behavior

Creation of a supertibble

Current Behavior

read_redcap fails with an error message while REDCapR::redcap_read succeeds on the same project

Failure log

options(redcaptidier.allow.mixed.structure = TRUE)

N4_stbl <- read_redcap(redcap_uri=uri, raw_or_label = "haven", token=token)
Error in read_redcap():
! The REDCap API did not return any data. This can happen when there are no data entered or when the access isn't configured to allow data export through the API.
Run rlang::last_trace() to see where the error occurred.

N4_DT <- REDCapR::redcap_read(redcap_uri=uri, raw_or_label = "raw", token=token, verbose = FALSE)$data

dim(N4_DT)
[1] 35368 3104

rlang::last_trace(drop = FALSE)
<error/redcap_unpopulated>
Error in read_redcap():
! The REDCap API did not return any data. This can happen when there are no data entered or when the access isn't configured to allow data export through the API.

Backtrace: ▆

  1. └─REDCapTidieR::read_redcap(redcap_uri = uri, raw_or_label = "haven", token = token)
  2. └─REDCapTidieR:::check_redcap_populated(db_data)
  3. └─cli::cli_abort(...)
  4. └─rlang::abort(...)

skadauke commented 3 days ago

Try again without the $data.

clelandcm commented 3 days ago

That part succeeds, but here is a modification where I have REDCapR::redcap_read() return the full list:

N4_list <- REDCapR::redcap_read(redcap_uri=uri,  
                        raw_or_label = "label", token=token, verbose = FALSE)

N4_list[["data"]] |> dim()
[1] 35368 3104

When trying to read the same project with REDCapTidieR::read_redcap(), I get the error:

N4_stbl <- read_redcap(redcap_uri=uri,  
                        raw_or_label = "label", token=token,
                        allow_mixed_structure = TRUE)

Error in read_redcap():
! The REDCap API did not return any data. This can happen when there are no data entered or when the access isn't configured to allow data export through the API.
Run rlang::last_trace() to see where the error occurred.

rlang::last_trace(drop = FALSE)
<error/redcap_unpopulated>
Error in read_redcap():
! The REDCap API did not return any data. This can happen when there are no data entered or when the access isn't configured to allow data export through the API.

Backtrace: ▆

  1. └─REDCapTidieR::read_redcap(redcap_uri = uri, raw_or_label = "label", token = token, allow_mixed_structure = TRUE)
  2. └─REDCapTidieR:::check_redcap_populated(db_data)
  3. └─cli::cli_abort(...)
  4. └─rlang::abort(...)

rsh52 commented 3 days ago

Hi @clelandcm, thank you for bringing this to our attention. Since you're using a mixed structure database, it can be tricky to diagnose, but based on the alert you're getting there are a few things we can start with.

First, can you confirm your data is exportable using REDCapR::redcap_read_oneshot() instead of REDCapR::redcap_read()? read_redcap() wraps this function and passes the output to the check function that returns this error message: https://github.com/CHOP-CGTInformatics/REDCapTidieR/blob/74f986ddad2a9936b1aac950b91f022f7233a784/R/read_redcap.R#L179-L193

Second, can you confirm that you have export access to all of the instruments in your project? This can be found in the "User Rights" section of the REDCap UI. While I don't think this is the issue, any mismatch in export rights can cause mismatches with the metadata. It looks something like this:

[image: screenshot of the "User Rights" section]

clelandcm commented 2 days ago

I have View & Edit Rights as well as Data Export Rights for the Full Data Set.

REDCapR::redcap_read_oneshot() returns neither an error nor any data.

N4_oneshot <- REDCapR::redcap_read_oneshot(redcap_uri=uri,
                                            raw_or_label = "label", token=token)

0 records and 0 columns were read from REDCap in 4.3 seconds. The http status code was 200.
N4_oneshot[["data"]] |> dim()
[1] 0 0
N4_oneshot[["success"]]
[1] TRUE
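The output above shows that a successful HTTP response and a populated export are separate conditions. Here is a minimal sketch of that distinction, using hand-built lists shaped like the result printed above. The `success` and `data` elements come from the thread; `is_unpopulated()` is a hypothetical helper for illustration, not a function in REDCapR or REDCapTidieR.

```r
# Hypothetical helper: TRUE when a result reports success but carries no rows.
is_unpopulated <- function(result) {
  isTRUE(result[["success"]]) && nrow(result[["data"]]) == 0
}

# Hand-built stand-ins for an empty and a populated export (not real API output).
empty_result  <- list(success = TRUE, data = data.frame())
filled_result <- list(success = TRUE, data = data.frame(record_id = 1:3))

is_unpopulated(empty_result)
is_unpopulated(filled_result)
```

A check like this only reports that nothing came back; it cannot say whether the cause is missing data, export rights, or the size of the request.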

Suspecting the server does not like a request for all records and fields at once, I tried limiting the request to a couple of forms. This succeeds, returning data but also a warning:

N4_2forms_oneshot <- REDCapR::redcap_read_oneshot(redcap_uri=uri, 
                                                   forms = c("screener_1","blood_draw_form"),
                                            raw_or_label = "label", token=token)

35,244 records and 96 columns were read from REDCap in 0.8 seconds. The http status code was 200.
Warning message:
One or more parsing issues, call problems() on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat)
N4_2forms_oneshot[["data"]] |> dim()
[1] 35244 96
N4_2forms_oneshot[["success"]]
[1] TRUE

Limiting the request to just two forms, REDCapTidieR::read_redcap() fails with a different error:

N4_2forms_stbl <- read_redcap(redcap_uri=uri,  
                        raw_or_label = "label", token=token,
                        forms = c("screener_1","blood_draw_form"),
                        allow_mixed_structure = TRUE)

Error in mutate():
ℹ In argument: field_label = strip_html_field_embedding(.data$field_label_updated).
Caused by error in .data$field_label_updated:
! Column field_label_updated not found in .data.
Run rlang::last_trace() to see where the error occurred.

rlang::last_trace(drop=FALSE)
<error/dplyr:::mutate_error>
Error in mutate():
ℹ In argument: field_label = strip_html_field_embedding(.data$field_label_updated).
Caused by error in .data$field_label_updated:
! Column field_label_updated not found in .data.

Backtrace: ▆

  1. ├─REDCapTidieR::read_redcap(...)
  2. │ └─REDCapTidieR:::get_fields_to_drop(db_metadata, id_form)
  3. │ └─... %>% pull(.data$field_name_updated)
  4. ├─dplyr::pull(., .data$field_name_updated)
  5. ├─REDCapTidieR:::update_field_names(.)
  6. │ └─... %>% select(-"field_label_updated")
  7. ├─dplyr::select(., -"field_label_updated")
  8. ├─dplyr::mutate(., field_label = strip_html_field_embedding(.data$field_label_updated))
  9. ├─dplyr:::mutate.data.frame(., field_label = strip_html_field_embedding(.data$field_label_updated))
 10. │ └─dplyr:::mutate_cols(.data, dplyr_quosures(...), by)
 11. │ ├─base::withCallingHandlers(...)
 12. │ └─dplyr:::mutate_col(dots[[i]], data, mask, new_columns)
 13. │ └─mask$eval_all_mutate(quo)
 14. │ └─dplyr (local) eval()
 15. ├─REDCapTidieR:::strip_html_field_embedding(.data$field_label_updated)
 16. │ └─... %>% str_squish()
 17. ├─stringr::str_squish(.)
 18. │ ├─stringi::stri_trim_both(str_replace_all(string, "\\s+", " "))
 19. │ └─stringr::str_replace_all(string, "\\s+", " ")
 20. │ └─stringr:::check_lengths(string, pattern, replacement)
 21. │ └─vctrs::vec_size_common(...)
 22. ├─stringr::str_trim(.)
 23. │ └─stringi::stri_trim_both(string)
 24. ├─stringr::str_replace_all(., "<.+?\\>", "")
 25. │ └─stringr:::check_lengths(string, pattern, replacement)
 26. │ └─vctrs::vec_size_common(...)
 27. ├─stringr::str_replace_all(., "\\{.+?\\}", "")
 28. │ └─stringr:::check_lengths(string, pattern, replacement)
 29. │ └─vctrs::vec_size_common(...)
 30. ├─field_label_updated
 31. ├─rlang:::$.rlang_data_pronoun(.data, field_label_updated)
 32. │ └─rlang:::data_pronoun_get(...)
 33. └─rlang:::abort_data_pronoun(x, call = y)
 34. └─rlang::abort(msg, "rlang_error_data_pronoun_not_found", call = call)

rsh52 commented 2 days ago

@clelandcm Interesting, thank you for sending this over. It seems like two problems are occurring here.

The first is that redcap_read_oneshot() is failing on a complex/large database. We can certainly look into pivoting the wrapper over to redcap_read() and expanding some of the params/defaults we offer.
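To illustrate the difference between a one-shot request and a batched one, here is a rough sketch in base R. It only shows the general idea of splitting record IDs into chunks and stacking the results; `make_batches()` and `fetch_batch()` are hypothetical names, and `fetch_batch()` is a local stand-in that makes no API call. This is not how REDCapR implements its batching.

```r
# Split record IDs into consecutive batches of at most `batch_size`.
make_batches <- function(ids, batch_size) {
  split(ids, ceiling(seq_along(ids) / batch_size))
}

# Hypothetical stand-in for one request; a real version would query the server.
fetch_batch <- function(ids) {
  data.frame(record_id = ids, value = ids * 2)
}

ids <- 1:10
batches <- make_batches(ids, batch_size = 4)

# Fetch each batch separately, then stack the pieces into one data frame.
combined <- do.call(rbind, lapply(batches, fetch_batch))
nrow(combined)
```

Smaller requests trade more round trips for a lower chance that any single response is too large for the server to return.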

The second is harder to diagnose. We'd like to try two things:

  1. Can you run REDCapTidieR::read_redcap() using the limited/specific forms, but also include the first form in your database?
  2. Would you be open to running REDCapR::redcap_metadata_read() and emailing us the output? It is possible that something is happening when cross-referencing database variable names against metadata field names and labels. If open to it, please send to hannar1@chop.edu and porterej@chop.edu.
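As a rough picture of the cross-referencing mentioned in the second item, the sketch below compares column names from an export against field names in the metadata. All field and form names here are invented for illustration, and this is not REDCapTidieR's internal logic.

```r
# Hypothetical metadata with one row per field (names are invented).
metadata <- data.frame(
  field_name = c("unique_id", "age", "draw_date"),
  form_name  = c("unique_id", "screener_1", "blood_draw_form"),
  stringsAsFactors = FALSE
)

# Hypothetical column names from a data export.
data_cols <- c("unique_id", "age", "draw_time")

# Columns present in the data but absent from the metadata, and the reverse.
missing_from_metadata <- setdiff(data_cols, metadata$field_name)
missing_from_data     <- setdiff(metadata$field_name, data_cols)

missing_from_metadata
missing_from_data
```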
clelandcm commented 1 day ago

It may be important that Auto-numbering for records is not enabled for the project, and the first form is a unique identifier which also appears on all other forms.

REDCapTidieR::read_redcap() does not recognize the first form:

N4_3forms_stbl <- read_redcap(redcap_uri=uri,  
                       raw_or_label = "label", token=token,
                       forms = c("unique_id","screener_1","blood_draw_form"),
                       allow_mixed_structure = TRUE)

Error in read_redcap():
✖ Instrument unique_id does not exist in REDCap project
Run rlang::last_trace() to see where the error occurred.

REDCapR::redcap_read_oneshot() does recognize the first form:

N4_3forms_oneshot <- REDCapR::redcap_read_oneshot(redcap_uri=uri,
                                                  raw_or_label = "label", token=token,
                                                  forms = c("unique_id","screener_1","blood_draw_form"))

35,248 records and 101 columns were read from REDCap in 3.7 seconds. The http status code was 200.
Warning message:
One or more parsing issues, call problems() on your data frame for details, e.g.:
  dat <- vroom(...)
  problems(dat)

I sent the list returned by REDCapR::redcap_metadata_read() as N4-Meta.RData via email.

rsh52 commented 1 day ago

@clelandcm Thanks a bunch for sending this over, I think I'm understanding the issue more now. Auto-numbering doesn't impact anything so all good there.

The issue occurs here, and is related to the unique_id instrument not having any fields other than the record ID field:

https://github.com/CHOP-CGTInformatics/REDCapTidieR/blob/74f986ddad2a9936b1aac950b91f022f7233a784/R/read_redcap.R#L120-L122

This is likely something we need to rework, but the gist is that we chose to remove the form_name exclusively for the first record of the metadata (which is always the record ID field) so we could have the record ID field persist throughout each tidy tibble. We incorrectly assumed all projects would have additional information and not just leave this field by itself.
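A toy reproduction of the mechanism described above, assuming a metadata table with `field_name` and `form_name` columns. The field and form names are invented apart from those mentioned in the thread, and this is a sketch of the idea rather than the code at the linked lines.

```r
# Hypothetical metadata: the first instrument holds only the record ID field.
metadata <- data.frame(
  field_name = c("unique_id", "age", "draw_date"),
  form_name  = c("unique_id", "screener_1", "blood_draw_form"),
  stringsAsFactors = FALSE
)

# Blank out form_name on the first row (the record ID field).
metadata$form_name[1] <- NA_character_

# Instruments that still own at least one field after that step.
remaining_forms <- unique(metadata$form_name[!is.na(metadata$form_name)])

remaining_forms
"unique_id" %in% remaining_forms
```

With no other field tied to it, the unique_id instrument drops out of the list of known instruments, which is consistent with the "does not exist" error reported earlier in the thread.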

Funnily enough, I assume that if you added any field at all to the unique_id instrument, this would end up working again, but it is a definite bug we need to resolve, so thank you for bringing it to our attention.