ctsit / redcapcustodian

Simplified, automated data management on REDCap systems

Support Credential Management to Enable Multi-instance Operations #41

Open ChemiKyle opened 2 years ago

ChemiKyle commented 2 years ago

Note: this issue describes two features: credential management, which is a prerequisite for multi-instance operations. I have already written most of the former; for the latter I have a handful of scripts, but it is much more open ended in general.

The ability to shuttle data between different projects (possibly on different servers) is cumbersome under the .env model, which requires multiple variables for different project tokens and separate .env files for multiple REDCap instances.

We would benefit from the ability to pass arbitrary credentials through to REDCapR's API actions. REDCapR ships its own credential-handling functions, but they are not an out-of-the-box solution for this. The idea of a single object that represents credentials for a project on a specific host is useful, however.

Credential storage model

I propose we store credentials in a database table whose columns are a superset of those in REDCapR's implementation:

Functions which use these credential objects should only need redcap_uri and token, and should be compatible with REDCapR's credential objects.
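To sketch that compatibility requirement (the helper name here is hypothetical, not part of the package), a credential row only needs to expose those two fields to drive any REDCapR call:

```r
# Hypothetical helper: reduce a one-row credential data frame to the
# two arguments every REDCapR API function requires.
credential_args <- function(credentials) {
  stopifnot(all(c("redcap_uri", "token") %in% names(credentials)))
  list(
    redcap_uri = credentials$redcap_uri,
    token = credentials$token
  )
}
```

A call like `REDCapR::redcap_read_oneshot(redcap_uri = args$redcap_uri, token = args$token)` then works the same whether the row came from this table or from REDCapR's own credential reader.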

Proposed schema, column types copied from REDCap tables where appropriate:

CREATE TABLE IF NOT EXISTS `credentials` (
  `redcap_uri` TEXT NOT NULL,
  `server_short_name` varchar(128) COLLATE utf8mb4_unicode_ci NOT NULL,
  `username` varchar(191) NOT NULL,
  `project_id` int(10) NOT NULL,
  `project_display_name` TEXT NOT NULL,
  `project_short_name` varchar(128) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `token` varchar(64) COLLATE utf8mb4_unicode_ci NOT NULL,
  `comment` varchar(256) COLLATE utf8mb4_unicode_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Separation of credentials and production consideration

Production scripts would read from a central db with tokens for a special API user.

Individual machines would use a local (likely SQLite) db for their own credentials. Regarding the latter, I have experimental functions that scrape a REDCap instance for all of a specified user's existing tokens, and another that creates tokens on a server for a specified list of project IDs.
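A minimal sketch of such a local store, assuming DBI and RSQLite (the file name and seed row are illustrative; SQLite does not use the MySQL-specific clauses, so the proposed schema is trimmed accordingly):

```r
library(DBI)

# Open (or create) a local SQLite credential store.
conn <- dbConnect(RSQLite::SQLite(), "credentials.sqlite")

# SQLite equivalent of the proposed schema.
dbExecute(conn, "
  CREATE TABLE IF NOT EXISTS credentials (
    redcap_uri TEXT NOT NULL,
    server_short_name TEXT NOT NULL,
    username TEXT NOT NULL,
    project_id INTEGER NOT NULL,
    project_display_name TEXT NOT NULL,
    project_short_name TEXT,
    token TEXT NOT NULL,
    comment TEXT
  )
")

# Register one credential (illustrative values).
dbExecute(conn, "
  INSERT INTO credentials
    (redcap_uri, server_short_name, username, project_id,
     project_display_name, project_short_name, token, comment)
  VALUES
    ('http://localhost/api/', 'local', 'admin', 15,
     'Main Study', 'main study', 'MELLON', '')
")

# Fetch it back the same way production scripts would.
credentials <- dbGetQuery(conn, "SELECT * FROM credentials WHERE project_id = 15")
dbDisconnect(conn)
```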

Easier testing

Combined with functions that create projects and API tokens, this should also let us recreate production servers easily on freshly built local containers and run (automated!) tests.

Caveat

Currently I'm not privy to any API methods for gathering External Module settings. I can do this with separate database conn objects, but that requires specialized or multiple .env files in my current implementation. I may also need access to the server's filesystem to read module config files (e.g., to detect whether a field is an event and requires migration to fit the target project).
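Lacking an API method, a direct database query is one workaround. The sketch below assumes an existing DBI connection to the REDCap database and that module settings live in REDCap's `redcap_external_module_settings` table joined to `redcap_external_modules` (verify both names against your REDCap version):

```r
# Read External Module settings for one project straight from the
# REDCap database; `conn` is an existing DBI connection to that database.
get_em_settings <- function(conn, project_id) {
  DBI::dbGetQuery(
    conn,
    "SELECT m.directory_prefix, s.`key`, s.`value`, s.`type`
       FROM redcap_external_module_settings s
       JOIN redcap_external_modules m
         ON s.external_module_id = m.external_module_id
      WHERE s.project_id = ?",
    params = list(project_id)
  )
}
```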


Examples

Syncing data dictionary from a prod project to a training one

Here is an example of keeping project metadata (the data dictionary) synced between two projects:

# simplified, minimal sync that uses credential objects
sync_metadata_with_credentials <- function(source_credentials, target_credentials) {
  source_metadata <- REDCapR::redcap_metadata_read(
    redcap_uri = source_credentials$redcap_uri,
    token = source_credentials$token
  )
  target_metadata <- REDCapR::redcap_metadata_read(
    redcap_uri = target_credentials$redcap_uri,
    token = target_credentials$token
  )
  if (!identical(target_metadata$data, source_metadata$data)) {
    REDCapR::redcap_metadata_write(
      ds = source_metadata$data,
      redcap_uri = target_credentials$redcap_uri,
      token = target_credentials$token
    )
  } else {
    print("metadata already up to date")
  }
}

# use dplyr to locate proper credential sets
source_credentials <- tbl(credential_db_conn, "credentials") %>%
  filter(username == "etl_user") %>%
  filter(project_short_name == "main study") %>%
  filter(server_short_name == "prod") %>%
  collect() %>%
  unnest()

# examine credential object
t(source_credentials)
#                      [,1]                              
# project_id           "15"                              
# username             "etl_user"                           
# token                "MELLON"
# project_display_name "Main Study Long Name in REDCap UI"                    
# project_short_name   "main study"                    
# redcap_uri           "http://redcap-production-server.university.edu/api/"     
# server_short_name    "prod"         
# comment              ""

target_credentials <- tbl(credential_db_conn, "credentials") %>%
  filter(username == "admin") %>%
  filter(project_short_name == "main study") %>%
  filter(server_short_name == "training") %>%
  collect() %>%
  unnest()

sync_metadata_with_credentials(source_credentials, target_credentials)

For reference, here is an implementation that requires a special .env file with SOURCE_ and TARGET_ prefixes on its variables:

# relies on a specially prefixed .env file
sync_metadata_with_env <- function(project) {
  PROJECT <- toupper(project)
  # Data dictionary
  source_metadata <- REDCapR::redcap_metadata_read(
    redcap_uri = Sys.getenv("SOURCE_URI"),
    token = Sys.getenv(paste0("SOURCE_", PROJECT, "_TOKEN"))
  )
  target_metadata <- REDCapR::redcap_metadata_read(
    redcap_uri = Sys.getenv("TARGET_URI"),
    token = Sys.getenv(paste0("TARGET_", PROJECT, "_TOKEN"))
  )
  if (!identical(target_metadata$data, source_metadata$data)) {
    REDCapR::redcap_metadata_write(
      ds = source_metadata$data,
      redcap_uri = Sys.getenv("TARGET_URI"),
      token = Sys.getenv(paste0("TARGET_", PROJECT, "_TOKEN"))
    )
  } else {
    print(paste(project, "is already up to date"))
  }
}
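For comparison, the .env file this version expects would look something like the following (all values illustrative, with the project name passed as e.g. `"main_study"`):

```
SOURCE_URI=http://redcap-production-server.university.edu/api/
TARGET_URI=http://redcap-training-server.university.edu/api/
SOURCE_MAIN_STUDY_TOKEN=...
TARGET_MAIN_STUDY_TOKEN=...
```

Each additional project or instance pair means more prefixed variables or a separate .env file, which is exactly the overhead the credential table avoids.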