OuhscBbmc / REDCapR

R utilities for interacting with REDCap
https://ouhscbbmc.github.io/REDCapR

`redcap_read()` returns error when 847+ records requested #512

Closed ocelhay closed 8 months ago

ocelhay commented 8 months ago

Our targets pipeline suddenly stopped running last week, although we hadn't made any change to the R code or the REDCap instance.

It appears that, both in our local session and in a Docker container, `redcap_read()` now fails whenever 847 or more records are requested.

The full `sessionInfo()` output is at the end.

This is not related to specific ids: we draw a different random sample each time, and 846 records always work while 847 always throw an error.

REDCapR::redcap_read(
  redcap_uri = Sys.getenv("ABCDREDCAPURL"),
  token = Sys.getenv("ABCDREDCAPTOKEN"),
  fields = c(
    "id_redcap"
  ),
  records = sample(ids, 846)
)

Returns an expected result:

$data
# A tibble: 17,630 × 4
   id_redcap        redcap_event_name        redcap_repeat_instrum…¹ redcap_repeat_instance
   <chr>            <chr>                    <lgl>                   <lgl>                 

**REDACTED (PII)**

# ℹ 17,620 more rows
# ℹ abbreviated name: ¹​redcap_repeat_instrument
# ℹ Use `print(n = ...)` to see more rows

$success
[1] TRUE

$status_codes
[1] "200; 200; 200; 200; 200; 200; 200; 200; 200"

$outcome_messages
[1] "2,045 records and 4 columns were read from REDCap in 1.0 seconds.  The http status code was 200.; 2,106 records and 4 columns were read from REDCap in 1.0 seconds.  The http status code was 200.; 2,068 records and 4 columns were read from REDCap in 1.1 seconds.  The http status code was 200.; 2,096 records and 4 columns were read from REDCap in 1.0 seconds.  The http status code was 200.; 2,092 records and 4 columns were read from REDCap in 1.1 seconds.  The http status code was 200.; 2,087 records and 4 columns were read from REDCap in 1.1 seconds.  The http status code was 200.; 2,079 records and 4 columns were read from REDCap in 1.0 seconds.  The http status code was 200.; 2,071 records and 4 columns were read from REDCap in 1.0 seconds.  The http status code was 200.; 986 records and 4 columns were read from REDCap in 1.1 seconds.  The http status code was 200."

$records_collapsed
**REDACTED (PII)**

$fields_collapsed
[1] "id_redcap"

$forms_collapsed
[1] ""

$events_collapsed
[1] ""

$filter_logic
[1] ""

$datetime_range_begin
[1] NA

$datetime_range_end
[1] NA

$elapsed_seconds
[1] 24.73854

REDCapR::redcap_read(
  redcap_uri = Sys.getenv("ABCDREDCAPURL"),
  token = Sys.getenv("ABCDREDCAPTOKEN"),
  fields = c(
    "id_redcap"
  ),
  records = sample(ids, 847)
)

Returns the error "You do not have permissions to use the API":

46,515 variable metadata records were read from REDCap in 1.3 seconds.  The http status code was 200.
The data dictionary describing 40,272 fields was read from REDCap in 4.4 seconds.  The http status code was 200.
402 instrument metadata records were read from REDCap in 0.7 seconds.  The http status code was 200.
1 rows were read from REDCap in 0.7 seconds.  The http status code was 200.                
24 data access groups were read from REDCap in 0.7 seconds.  The http status code was 200. 
The REDCapR read/export operation was not successful.  The error message was:
<?xml version="1.0" encoding="UTF-8" ?><hash><error>You do not have permissions to use the API</error></hash>

$data
# A tibble: 0 × 0

$success
[1] FALSE

$status_codes
[1] 403

$outcome_messages
[1] "The initial call failed with the code: 403."

$records_collapsed
[1] "failed in initial batch call"

$fields_collapsed
[1] "failed in initial batch call"

$forms_collapsed
[1] "failed in initial batch call"

$events_collapsed
[1] "failed in initial batch call"

$filter_logic
[1] "failed in initial batch call"

$datetime_range_begin
[1] "failed in initial batch call"

$datetime_range_end
[1] "failed in initial batch call"

$elapsed_seconds
[1] 9.083597

Session Info

> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.0

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] tarchetypes_0.7.9 dplyr_1.1.3       targets_1.3.2    

loaded via a namespace (and not attached):
 [1] utf8_1.2.4          generics_0.1.3      tidyr_1.3.0         renv_1.0.3         
 [5] xml2_1.3.5          paws.common_0.6.2   hms_1.1.3           digest_0.6.33      
 [9] magrittr_2.0.3      aws.s3_0.3.21       aws.signature_0.6.0 jsonlite_1.8.7     
[13] processx_3.8.2      backports_1.4.1     ps_1.7.5            httr_1.4.7         
[17] purrr_1.0.2         fansi_1.0.5         codetools_0.2-19    RApiSerialize_0.1.2
[21] REDCapR_1.1.9005    cli_3.6.1           rlang_1.1.1         crayon_1.5.2       
[25] bit64_4.0.5         base64enc_0.1-3     withr_2.5.1         yaml_2.3.7         
[29] qs_0.25.5           tools_4.3.1         parallel_4.3.1      tzdb_0.4.0         
[33] checkmate_2.3.0     base64url_1.4       curl_5.1.0          vctrs_0.6.4        
[37] R6_2.5.1            lifecycle_1.0.3     bit_4.0.5           fs_1.6.3           
[41] stringfish_0.15.8   vroom_1.6.4         pkgconfig_2.0.3     callr_3.7.3        
[45] RcppParallel_5.1.7  pillar_1.9.0        data.table_1.14.8   glue_1.6.2         
[49] Rcpp_1.0.11         xfun_0.39           tibble_3.2.1        tidyselect_1.2.0   
[53] rstudioapi_0.15.0   paws.storage_0.4.0  knitr_1.44          igraph_1.5.1       
[57] readr_2.1.4         compiler_4.3.1     

wibeasley commented 8 months ago

  1. That is weird. ~800 patient records (with 2,045 total events & repeated instruments) shouldn't create a problem.

  2. But I see that the dictionary is huge: 40K fields? Is that right?

  3. Were new patients added in the meantime?

  4. I'm skeptical this is related to the R version, since the server is throwing a 403 error.

  5. What happens when the batch_size parameter is reduced from the default of 200?
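
For readers unfamiliar with batching, here is a minimal sketch of the idea behind `batch_size`: the requested record ids are split into chunks and each chunk is read separately. The function name `split_into_batches()` and the ids are made up for illustration; this is not REDCapR's internal code. Note that the failing output above reports "The initial call failed with the code: 403", which suggests the error occurred before any batch was requested; if so, changing `batch_size` would not be expected to help.

```r
# Split a vector of record ids into consecutive batches of at most `batch_size`.
# Illustrative only: a hypothetical helper, not taken from REDCapR.
split_into_batches <- function(ids, batch_size) {
  stopifnot(batch_size >= 1L)
  if (length(ids) == 0L) return(list())
  # Element i goes to batch ceiling(i / batch_size).
  batch_index <- ceiling(seq_along(ids) / batch_size)
  unname(split(ids, batch_index))
}

ids     <- sprintf("id-%04d", 1:847)  # made-up ids, same count as the failing call
batches <- split_into_batches(ids, batch_size = 200L)
lengths(batches)                      # number of ids in each batch
```

With 847 ids and a batch size of 200, this gives four full batches and a final batch of 47.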

ocelhay commented 8 months ago

Thanks for the quick reply @wibeasley

> 1. That is weird. ~800 patient records (with 2,045 total events & repeated instruments) shouldn't create a problem.

Yes, we previously routinely queried 11,000+ records without an issue.

> 2. But I see that the dictionary is huge: 40K fields? Is that right?

Yes, that is right, unfortunately.

> 3. Were new patients added in the meantime?

No new patients.

> 5. What happens when the batch_size parameter is reduced from the default of 200?

The behavior is the same with a batch_size of 10 or 2,000.

> 4. I'm skeptical this is related to the R version, since the server is throwing a 403 error.

Yes, we are investigating the REDCap instance. Maybe we should have started with that.

wibeasley commented 8 months ago

> Yes, we are investigating the REDCap instance. Maybe we should have started with that.

Maybe, but I'm not sure I have advice on where to even start looking on the server. Scarcity of disk space and/or RAM?

Tell me how that goes. I have some tricks to go upstream of REDCapR and even R.

I'm surprised/bummed batch_size didn't work.
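
One way to go "upstream of REDCapR" while staying in R is to send the export request directly with httr. The sketch below is only an illustration under assumptions: `build_export_body()` is a hypothetical helper, the token and ids are made up, and passing `fields` and `records` as comma-separated strings is an assumption about what the server accepts. The actual request is left commented out because it needs a real server and token.

```r
# Build the form fields for a REDCap "export records" request.
# Hypothetical helper for illustration; not part of REDCapR.
build_export_body <- function(token, fields = character(0), records = character(0)) {
  body <- list(token = token, content = "record", format = "csv", type = "flat")
  # Assumption: multiple values are sent as one comma-separated string.
  if (length(fields) > 0L)  body$fields  <- paste(fields, collapse = ",")
  if (length(records) > 0L) body$records <- paste(records, collapse = ",")
  body
}

body <- build_export_body(
  token   = "FAKE-TOKEN",                # placeholder, not a real token
  fields  = "id_redcap",
  records = sprintf("id-%04d", 1:847)    # made-up ids
)
nchar(body$records)  # characters in the record list sent in a single request

# Not run: requires a reachable REDCap server and a valid token.
# response <- httr::POST(Sys.getenv("ABCDREDCAPURL"), body = body, encode = "form")
# httr::status_code(response)
```

If the direct request also returns a 403, that points away from REDCapR and toward the server.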

januz commented 8 months ago

@wibeasley

> Scarcity of disk space and/or RAM?

That's exactly what we found out today: the server was running out of disk space. Thanks for your quick response, and sorry for the "false alarm".

wibeasley commented 8 months ago

Ha. Good catch. It's weird that 846 was repeatedly the magic number. BTW, I liked the random sampling approach.

oh hey, @januz.
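
For anyone who lands here with a similar symptom, the random-sampling check used in this thread can be turned into a bisection over the sample size. The sketch below is hypothetical: `find_largest_working_size()` and `try_read()` are illustrative names, and it assumes that every size up to some threshold succeeds and every larger size fails. The wrapper around `REDCapR::redcap_read()` is commented out because it needs a live server; a stand-in function mimics the behaviour reported above.

```r
# Find the largest sample size for which `try_read(n)` returns TRUE, by bisection.
# Assumes success up to some threshold and failure beyond it.
find_largest_working_size <- function(try_read, lower = 1L, upper) {
  stopifnot(lower <= upper)
  if (!try_read(lower)) return(NA_integer_)       # even the smallest request fails
  if (try_read(upper))  return(as.integer(upper)) # nothing fails in this range
  # Invariant: try_read(lower) is TRUE and try_read(upper) is FALSE.
  while (upper - lower > 1L) {
    middle <- (lower + upper) %/% 2L
    if (try_read(middle)) lower <- middle else upper <- middle
  }
  as.integer(lower)
}

# Hypothetical wrapper around the call from this issue (not run here):
# try_read <- function(n) {
#   result <- tryCatch(
#     REDCapR::redcap_read(
#       redcap_uri = Sys.getenv("ABCDREDCAPURL"),
#       token      = Sys.getenv("ABCDREDCAPTOKEN"),
#       fields     = "id_redcap",
#       records    = sample(ids, n)
#     ),
#     error = function(e) list(success = FALSE)
#   )
#   isTRUE(result$success)
# }

# Stand-in that mimics the reported behaviour: 846 works, 847 fails.
fake_read <- function(n) n <= 846L
largest   <- find_largest_working_size(fake_read, upper = 11000L)
largest
```

Bisection needs only about log2(upper) reads, which matters when each read takes several seconds.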