OuhscBbmc / REDCapR

R utilities for interacting with REDCap
https://ouhscbbmc.github.io/REDCapR

Unexpected Behavior: `redcap_read()` fails for older versions of REDCap (~v7.3.5) #465

Closed the-mad-statter closed 1 year ago

the-mad-statter commented 1 year ago

I am not sure whether you want to support this, given that it affects only pretty old versions of REDCap, and I am not sure when Vanderbilt fixed the root cause, but redcap_read() currently fails when reading records from older versions of REDCap (circa v7.3.5).

Specifically, I am trying to read data from a v7.3.5 instance of REDCap, and redcap_read() fails with the following:

Warning: The following named parsers don't match the column names: has_repeating_instruments_or_events, missing_data_codes, external_modules, bypass_branching_erase_field_prompt
1 rows were read from REDCap in 0.3 seconds.  The http status code was 200.
Warning: Unknown or uninitialised column: `has_repeating_instruments_or_events`.
Error in if (d_proj$has_repeating_instruments_or_events[1]) { : 
 argument is of length zero

I have traced the issue to a pairing of what the REDCap API returns and this commit on 2022-10-08. That is, redcap_read() broke for v7.3.5 with this commit.

The underlying issue is that redcap_metadata_internal() expects a long list of columns to be returned, but the API does not return all of them for older versions of the API.

Here are the expected columns:

col_types <- readr::cols(
  project_id                              = readr::col_integer(),
  project_title                           = readr::col_character(),
  creation_time                           = readr::col_datetime(format = ""),
  production_time                         = readr::col_datetime(format = ""),
  in_production                           = readr::col_logical(),
  project_language                        = readr::col_character(),
  purpose                                 = readr::col_integer(),
  purpose_other                           = readr::col_character(),
  project_notes                           = readr::col_character(),
  custom_record_label                     = readr::col_character(),
  secondary_unique_field                  = readr::col_character(),
  is_longitudinal                         = readr::col_logical(),
  has_repeating_instruments_or_events     = readr::col_logical(),
  surveys_enabled                         = readr::col_logical(),
  scheduling_enabled                      = readr::col_logical(),
  record_autonumbering_enabled            = readr::col_logical(),
  randomization_enabled                   = readr::col_logical(),
  ddp_enabled                             = readr::col_logical(),
  project_irb_number                      = readr::col_character(),
  project_grant_number                    = readr::col_character(),
  project_pi_firstname                    = readr::col_character(),
  project_pi_lastname                     = readr::col_character(),
  display_today_now_button                = readr::col_logical(),
  missing_data_codes                      = readr::col_character(),
  external_modules                        = readr::col_character(),
  bypass_branching_erase_field_prompt     = readr::col_character(),
  .default                                = readr::col_character()
)

names(col_types$cols)
#>  [1] "project_id"                          "project_title"                      
#>  [3] "creation_time"                       "production_time"                    
#>  [5] "in_production"                       "project_language"                   
#>  [7] "purpose"                             "purpose_other"                      
#>  [9] "project_notes"                       "custom_record_label"                
#> [11] "secondary_unique_field"              "is_longitudinal"                    
#> [13] "has_repeating_instruments_or_events" "surveys_enabled"                    
#> [15] "scheduling_enabled"                  "record_autonumbering_enabled"       
#> [17] "randomization_enabled"               "ddp_enabled"                        
#> [19] "project_irb_number"                  "project_grant_number"               
#> [21] "project_pi_firstname"                "project_pi_lastname"                
#> [23] "display_today_now_button"            "missing_data_codes"                 
#> [25] "external_modules"                    "bypass_branching_erase_field_prompt"

Created on 2023-02-03 with reprex v2.0.2

And here are the actual columns returned:

library(RCurl)

result <- postForm(
  uri='https://redcap.wustl.edu/redcap/.../api/',
  token='XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX',
  content='project',
  format='csv',
  returnFormat='json'
)

names(
  readr::read_csv(
    I(result),
    show_col_types = FALSE
  )
)
#>  [1] "project_id"                   "project_title"               
#>  [3] "creation_time"                "production_time"             
#>  [5] "in_production"                "project_language"            
#>  [7] "purpose"                      "purpose_other"               
#>  [9] "project_notes"                "custom_record_label"         
#> [11] "secondary_unique_field"       "is_longitudinal"             
#> [13] "surveys_enabled"              "scheduling_enabled"          
#> [15] "record_autonumbering_enabled" "randomization_enabled"       
#> [17] "ddp_enabled"                  "project_irb_number"          
#> [19] "project_grant_number"         "project_pi_firstname"        
#> [21] "project_pi_lastname"          "display_today_now_button"

Created on 2023-02-03 with reprex v2.0.2

A setdiff() would show that the missing columns are the ones named in the warning produced by read_csv():

  1. has_repeating_instruments_or_events
  2. missing_data_codes
  3. external_modules
  4. bypass_branching_erase_field_prompt
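
For reference, the comparison can be sketched as follows. The two vectors are copied by hand from the outputs above; the names `expected`, `actual`, and `missing_columns` are only illustrative.

```r
# Column names declared in the col_types specification (copied from above).
expected <- c(
  "project_id", "project_title", "creation_time", "production_time",
  "in_production", "project_language", "purpose", "purpose_other",
  "project_notes", "custom_record_label", "secondary_unique_field",
  "is_longitudinal", "has_repeating_instruments_or_events",
  "surveys_enabled", "scheduling_enabled", "record_autonumbering_enabled",
  "randomization_enabled", "ddp_enabled", "project_irb_number",
  "project_grant_number", "project_pi_firstname", "project_pi_lastname",
  "display_today_now_button", "missing_data_codes", "external_modules",
  "bypass_branching_erase_field_prompt"
)

# Column names actually returned by the v7.3.5 instance (copied from above).
actual <- c(
  "project_id", "project_title", "creation_time", "production_time",
  "in_production", "project_language", "purpose", "purpose_other",
  "project_notes", "custom_record_label", "secondary_unique_field",
  "is_longitudinal", "surveys_enabled", "scheduling_enabled",
  "record_autonumbering_enabled", "randomization_enabled", "ddp_enabled",
  "project_irb_number", "project_grant_number", "project_pi_firstname",
  "project_pi_lastname", "display_today_now_button"
)

# setdiff() keeps the elements of `expected` that are absent from `actual`,
# in the order they appear in `expected`.
missing_columns <- setdiff(expected, actual)
missing_columns
#> [1] "has_repeating_instruments_or_events" "missing_data_codes"
#> [3] "external_modules"                    "bypass_branching_erase_field_prompt"
```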

The actual stopping error happens later in redcap_metadata_internal() when there is an attempt to check d_proj$has_repeating_instruments_or_events[1] which is NULL on account of not having been returned by the API.
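
The failure mode can be reproduced in isolation. This is a small sketch in which `d_proj` is a stand-in base data frame, not the object REDCapR actually builds:

```r
# Stand-in for the parsed project info from a v7.3.5 instance:
# the repeating-instruments column is absent.
d_proj <- data.frame(project_id = 1L, is_longitudinal = FALSE)

# `$` on a missing column returns NULL, and NULL[1] is still NULL.
value <- d_proj$has_repeating_instruments_or_events[1]
is.null(value)
#> [1] TRUE

# if() needs a length-one condition, so a NULL condition raises an error.
# tryCatch() captures the message instead of halting the script.
result <- tryCatch(
  if (value) "repeating" else "not repeating",
  error = function(e) conditionMessage(e)
)
result
#> [1] "argument is of length zero"
```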

I have a solution that produces the expected d_proj object with NA for the missing columns, but before I initiated a pull request, I wanted to see if this was something you wanted to support.
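
A minimal sketch of one way to pad the missing columns is shown here; it is not necessarily what the eventual PR does. The helper `add_missing_columns()` and the `defaults` list are hypothetical names introduced for illustration, and the NA types follow the col_types specification above.

```r
# Hypothetical helper: add every column named in `defaults` that is absent
# from `d`, filled with the corresponding typed NA.
add_missing_columns <- function(d, defaults) {
  for (name in setdiff(names(defaults), names(d))) {
    d[[name]] <- rep(defaults[[name]], nrow(d))
  }
  d
}

# Typed NA values for the four columns that v7.3.5 does not return.
defaults <- list(
  has_repeating_instruments_or_events = NA,            # logical
  missing_data_codes                  = NA_character_,
  external_modules                    = NA_character_,
  bypass_branching_erase_field_prompt = NA_character_
)

# Stand-in for the project info parsed from an older instance.
d_proj <- data.frame(project_id = 1L, is_longitudinal = FALSE)
d_proj <- add_missing_columns(d_proj, defaults)
names(d_proj)
```

Columns that the API did return are left untouched, so the same code path would work for newer REDCap versions.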

wibeasley commented 1 year ago

@the-mad-statter, I like that idea and would love that PR. Even if I wasn't interested in supporting v7 (and honestly, I'm only lukewarm about it), I really like the idea of gracefully growing.

Please make sure that before has_repeating_instruments_or_events is referenced, the code checks to see if it exists. If it doesn't exist, throw an error (with stop()) that their version of REDCap apparently doesn't support repeated instruments/events.
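
Something along these lines is what I have in mind. It is only a rough sketch: `repeating_flag()` is a hypothetical name, and it also treats an NA value as "not reported", which is an assumption on my part.

```r
# Hypothetical guard: return the flag if the server reported it,
# otherwise stop with an explanatory message.
repeating_flag <- function(d_proj) {
  value <- d_proj[["has_repeating_instruments_or_events"]]

  # `[[` returns NULL for an absent column, so check before indexing.
  if (is.null(value) || length(value) == 0L || is.na(value[1])) {
    stop(
      "The project info does not include ",
      "`has_repeating_instruments_or_events`. ",
      "This version of REDCap apparently does not support ",
      "repeated instruments/events.",
      call. = FALSE
    )
  }

  value[1]
}

# A newer server reports the column, so the flag is returned.
flag_new <- repeating_flag(data.frame(has_repeating_instruments_or_events = TRUE))

# An older server omits it, so the call stops; capture the message here.
message_old <- tryCatch(
  repeating_flag(data.frame(project_id = 1L)),
  error = function(e) conditionMessage(e)
)
```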

the-mad-statter commented 1 year ago

I can add the stop(), but it's not that v7 REDCap doesn't support repeated instruments/events but that the API neglects to report on it via the project info endpoint.

As a workaround, consider reading a single record and checking the returned field names for either "redcap_repeat_instrument" or "redcap_repeat_instance". That would determine what the has_repeating_instruments_or_events value should have been had the API reported it, and the value could then be set accordingly.
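
Roughly, the check would look like this. It is a sketch that operates on a plain character vector of field names; `has_repeat_columns()` is a hypothetical helper, and actually fetching the single record is left out.

```r
# Hypothetical helper: infer the flag from the field names of an exported record.
has_repeat_columns <- function(field_names) {
  repeat_columns <- c("redcap_repeat_instrument", "redcap_repeat_instance")
  any(repeat_columns %in% field_names)
}

# Field names here are made up for illustration.
has_repeat_columns(c("record_id", "redcap_repeat_instrument", "redcap_repeat_instance", "weight"))
#> [1] TRUE
has_repeat_columns(c("record_id", "weight"))
#> [1] FALSE
```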

wibeasley commented 1 year ago

> it's not that v7 REDCap doesn't support repeated instruments/events but that the API neglects to report on it via the project info endpoint.

I understand your distinction now. But I think I'm okay lumping those two cases together for a version that was released 4+ years ago.

> As a work around consider the idea to read a single record and check the returned field names...

I'll try to be flexible, but in my experience, that strategy makes things less stable. I prefer to use explicitly declared values to detect the server's capabilities. If I go the indirect/infer approach, I'm worried about not accounting for all the possible corner cases.

the-mad-statter commented 1 year ago

Sure thing, I don't particularly like the workaround either. I will add the stop() in a PR and call it a day. Should I target main or dev?

wibeasley commented 1 year ago

Slightly prefer pulling into dev, but I can work with either.

Hope things are good in St. Louis. Tell me if you're ever back in Oklahoma.