ctsit / redcapcustodian

Simplified, automated data management on REDCap systems

Add get_project_life_cycle() #106

Closed pbchase closed 1 year ago

pbchase commented 1 year ago

Add get_project_life_cycle(). Add log_event_tables.

The script has a flaw, though. It is well behaved in testing, but against prod this code fails:

library(tidyverse)
library(lubridate)
library(dotenv)
library(redcapcustodian)
library(rcc.ctsit)
library(DBI)
library(RMariaDB)

init_etl("backfill_redcap_summary_metrics_project_history")
# Get project creation, deletion, and move-to-prod history from the redcap event log

rc_conn <- connect_to_redcap_db()

system.time({
  project_life_cycle <- get_project_life_cycle(rc_conn, read_cache = F)
})

project_life_cycle %>%
  distinct(project_id) %>%
  nrow()

tbl(rc_conn, "redcap_projects") %>%
  summarise(maximum_project_id = max(project_id))

> system.time({
+   project_life_cycle <- get_project_life_cycle(rc_conn, read_cache = F)
+ })
   user  system elapsed 
  0.934   0.100  56.620 
Error in context[[1L]] : subscript out of bounds
> project_life_cycle %>%
+   distinct(project_id) %>%
+   nrow()
[1] 12742
> tbl(rc_conn, "redcap_projects") %>%
+   summarise(maximum_project_id = max(project_id))
# Source:   SQL [1 x 1]
# Database: mysql  [redcapuser@ahcmysqldf1.ahc.ufl.edu:NA/ctsi_redcap]
  maximum_project_id
               <int>
1              13355

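The result is also missing about 600 projects: it holds 12742 distinct project IDs, while the maximum project_id in redcap_projects is 13355.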
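One way to see exactly which projects are missing, rather than comparing a count against the maximum ID, is an anti-join on project_id. The following is only a minimal sketch: the two small tibbles are made-up stand-ins for the real tables, and the only column assumed is project_id, which appears in both queries above. Against the real database, the left-hand side would presumably come from something like tbl(rc_conn, "redcap_projects") %>% select(project_id) %>% collect(). Note that the maximum project_id is only an upper bound on the project count if some IDs are unused, so the true gap may be smaller than the simple difference suggests.

```r
library(dplyr)

# Made-up stand-in for the project_id column of redcap_projects
redcap_projects <- tibble(project_id = 1:10)

# Made-up stand-in for the output of get_project_life_cycle();
# a project can appear in several life-cycle rows
project_life_cycle <- tibble(
  project_id = c(1L, 2L, 2L, 4L, 5L, 7L, 8L, 10L)
)

# Projects that exist in redcap_projects but have no life-cycle rows
missing_projects <- redcap_projects %>%
  anti_join(project_life_cycle, by = "project_id") %>%
  arrange(project_id)

# Number of distinct projects the life-cycle data does cover
covered_projects <- n_distinct(project_life_cycle$project_id)

print(missing_projects)
print(covered_projects)
```

Listing the missing IDs makes it possible to check whether they cluster in a range of project_id values or share some other property, which a bare count cannot show.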

pbchase commented 1 year ago

The automated test is failing for want of the here package. I should probably move that code out of the function and require a cache file be specified when called.
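A rough sketch of that idea, under stated assumptions: read_or_build_cache() and build_fn are hypothetical names, not part of redcapcustodian, and the RDS format is a guess at how the cache might be stored. The point is only that the caller passes the cache path explicitly, so the function never has to locate the project root and has no need for the here package.

```r
# Hypothetical helper: the caller must supply cache_file explicitly.
# build_fn is a zero-argument function that produces the data when
# the cache is absent or read_cache is FALSE.
read_or_build_cache <- function(cache_file, build_fn, read_cache = TRUE) {
  if (missing(cache_file) || !is.character(cache_file) || length(cache_file) != 1) {
    stop("cache_file must be a single file path")
  }
  if (read_cache && file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  result <- build_fn()
  saveRDS(result, cache_file)
  result
}

# Usage with a temporary file and a stand-in for the slow query
cache_file <- tempfile(fileext = ".rds")
build_calls <- 0
build_fn <- function() {
  build_calls <<- build_calls + 1
  data.frame(project_id = 1:3)
}

first <- read_or_build_cache(cache_file, build_fn)   # builds and writes the cache
second <- read_or_build_cache(cache_file, build_fn)  # reads the cache back
```

With the path supplied by the caller, an automated test can point the function at a temporary file, as in the usage above, without depending on the directory layout.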

pbchase commented 1 year ago

This seems to be fixed now. I am adding test data and will probably be done Monday evening.

pbchase commented 1 year ago

@ChemiKyle, this is now ready to review