Epiconcept-Paris / STRAP-epiuf

Utility function package for STRAP
0 stars 0 forks source link

REDCap codebook download & dictionary creator #111

Open EPI-JHS opened 6 months ago

EPI-JHS commented 6 months ago

LMC & JHS discussed a potential epiuf function to compliment the existing REDCap API function family:

extractREDCapDict()

Unclear whether such a function would be useful to sufficient proportion of projects/team members, so not an actively planned enhancement.

Example code for achieving the basic functionality:

# Code for extracting data dictionaries from REDCap and formatting into dicts and dicos per STRAP
#
#
# Intended purpose:
# Downloads the raw dictionary from REDCap for a project via API
# Formats the JSON list response to create two data frames (one dict and one dico)
# Creates an Excel in line with the expected formatting found in STRAP data dictionaries
#
# Remaining work:
# Requires STRAP-ification, in particular: 
#   - Refining data dictionary formatting
#   - Using epiuf functions rather than external packages
#   - STRAP/epiuf file path designations
# Common dicos/fields could be identified and automatically ID'd (e.g., rather than 'dico_1', 'yndk')

# Libraries used, for reference
library(openxlsx)
library(httr)
library(dplyr)
library(epiuf)

# Grab API secret for the REDCap project
token <- grabKeyring("GE")

# API call
url <- "https://extranet.who.int/edcrc/api/"
formData <- list("token"=token,
                 content='metadata',
                 format='json',
                 returnFormat='json'
)

response <- httr::POST(url, body = formData, encode = "form")

# Long list created with each element being a field in the dictionary
REDCapDict <- httr::content(response)

# If you have no API key pointing to a REDCap project with a dict, GE's dict can be loaded here to test
#exampledict_path <- "G:/Shared drives/1_EPI_WHO-EURO hospital analysis/New Stata (Angie & Sandra) version in progress/documents/GE/example_dict"
#REDCapDict <- readRDS(exampledict_path)

# Some fields are data type 'yesno'. They are empty for select_choices, though always take the value
# of either yes or no. Some projects will not have any fields in yesno format, though its necessary to check
# as later code produces an incomplete dico without this step where these types do exist.
for(item in 1:length(REDCapDict)) {
  if(REDCapDict[[item]]$field_type=="yesno") {
    REDCapDict[[item]]$select_choices_or_calculations <- "0, No | 1, Yes"
  }
}

# Convert the list of dictionaries to a dataframe
dict_list <- lapply(REDCapDict, function(x) as.data.frame(t(x), stringsAsFactors = FALSE))
dict_df <- do.call(rbind, dict_list)

# View the resulting dataframe
rm(dict_list,response, formData, token, url)

# Initialize an empty dataframe
dico_list <- data.frame()

# Loop over factors in the dictionary to extract the relevant dicos (held in select_choices)
for(item in 1:length(REDCapDict)) {
  if((REDCapDict[[item]]$field_type=="radio"|REDCapDict[[item]]$field_type=="yesno")){
    field <- REDCapDict[[item]]$field_name
    dico <- REDCapDict[[item]]$select_choices_or_calculations
    # Create a dataframe for the current item
    dico_df <- data.frame(field = field, dico = dico, stringsAsFactors = FALSE)
    # Append the dataframe to the existing dico_list
    dico_list <- rbind(dico_list, dico_df)
  }
}

# View the resulting dico_list
rm(REDCapDict,dico_df)

# Make a list of all unique dicos
unique_dicos <- unique(dico_list$dico)

# Convert to df
unique_dicos <- lapply(unique_dicos, function(x) as.data.frame(t(x), stringsAsFactors = FALSE))
unique_dicos <- do.call(rbind, unique_dicos)

# Create dummy ID for each unique dico in the dict
unique_dicos$dico_id <- paste0("dico_",seq.int(nrow(unique_dicos)))
unique_dicos$dico <- unique_dicos$V1

# Initialize an empty dataframe for the long format
dico_df <- data.frame()

# Extract all dico values from the , and | separated list
for (i in 1:nrow(unique_dicos)) {
  # Extracting field name and dict values
  dico_id <- unique_dicos$dico_id[i]
  dict_values <- strsplit(unique_dicos$dico[i], "\\|")[[1]]

  # Iterate over each dict value
  for (dict_value in dict_values) {
    # Use a regular expression to split at the comma following a numeric value
    orig_value <- unique_dicos$dico[i]
    split_values <- unlist(strsplit(dict_value, "(?<=\\d),", perl=TRUE))
    code <- trimws(split_values[1])
    value <- trimws(split_values[2])

    # Create a dataframe for the current split
    temp_df <- data.frame(orig_value=orig_value,dico_id = dico_id, code = code, value = value, stringsAsFactors = FALSE)

    # Append the dataframe to the existing dico_df
    dico_df <- rbind(dico_df, temp_df)
  }
}

# Saving space
rm(dico_list,temp_df,code,dico,dict_value,dict_values,field,dico_id,i,item,split_values,value,orig_value,unique_dicos)

# Take just keys (id and orig value to match back to dict_df)
dico_df_unique <- dico_df %>%
  select(dico_id, orig_value) %>%
  distinct()

# Join keys with dict_df

# Convert list to character if necessary
dict_df$select_choices_or_calculations <- sapply(dict_df$select_choices_or_calculations, as.character)
dico_df_unique$orig_value <- sapply(dico_df_unique$orig_value, as.character)

# Perform a left join with dplyr
dict_df <- dict_df %>% 
  left_join(dico_df_unique[, c("orig_value", "dico_id")], 
            by = c("select_choices_or_calculations" = "orig_value"), relationship = "many-to-one")

# Take just the columns of interest
dict_df <- dict_df %>%
  select(field_name, field_type, dico_id, field_label)

# Take just the columns of interest
dico_df <- dico_df %>%
  select(dico_id, code, value)

# Rename field type to STRAP naming convention for REDCap data types
dict_df$field_type <- with(dict_df, case_when((!is.na(field_type)&field_type=="radio") ~ "factor"
                                 ,(!is.na(field_type)&field_type=="yesno") ~ "factor"
                                 ,(!is.na(field_type)&field_type=="dropdown") ~ "factor"
                                 ,(!is.na(field_type)&field_type=="calc") ~ "numeric"
                                 ,(!is.na(field_type)&field_type=="date") ~ "date"
                                 ,(!is.na(field_type)&field_type=="text") ~ "character"
                                 ))

# Saving space
rm(dico_df_unique)

# Create a new workbook
wb <- createWorkbook()

# Add sheets to the workbook
addWorksheet(wb, "dicos")
addWorksheet(wb, "dictionary")
addWorksheet(wb, "actions")

# Write data to sheets
writeData(wb, sheet = "dicos", dico_df)
writeData(wb, sheet = "dictionary", dict_df)

# Save the workbook
saveWorkbook(wb, "example_dict.xlsx", overwrite = TRUE)
loremerdrignac commented 6 months ago

This is a great idea, thank you James! :)

We also mentioned that it could be nice to do something similar for Voozanoo extractions.

If feasible, we could also think of a function that could reformat an existing dictionnary Excel spreadsheet provided by countries into standardised STRAP dictionnaries.

EPI-JHS commented 6 months ago

Definitely, I think including voozanoo and kobo toolbox would be fantastic. It would certainly ease downloading and reporting for my projects, one of which uses data uploaded to across all 3 platforms!

I like that idea! I think this would have to start with designing a kind of raw template to partially standardise the format. I also looked around for a package that automatically creates a dictionary based on the values found within a raw data set, though couldn't find anything. This could be an interesting method, as we could derive the dictionary from the received values (e.g., this column only ever takes the value 0, 1 etc), create a basic dict, and then combine it with that received from the country to complete/correct it. In any case I like your idea because this currently takes a while and is very manual :)

loremerdrignac commented 6 months ago

Thank you James! Just to add that this is @epi-jhd 's idea ;-) and that if this could save time to the team already now, maybe we would need to prioritise this then!!

epi-gde commented 6 months ago

Hum that would be great to automaticaly import voozanno data ! And dictionnary could be get from voozanno export Knowing that several projects use it, it may be a priority 😉

Sent from my iPad

Le lun. 8 janv. 2024 à 12:08, Lore Merdrignac @.***> a écrit :

Thank you James! Just to add that this is @epi-jhd https://github.com/epi-jhd 's idea ;-) and that if this could save time to the team already now, maybe we would need to prioritise this then!!

— Reply to this email directly, view it on GitHub https://github.com/Epiconcept-Paris/STRAP-epiuf/issues/111#issuecomment-1880796979, or unsubscribe https://github.com/notifications/unsubscribe-auth/AXARLUV2DPJQIQKGKQON5P3YNPHUHAVCNFSM6AAAAABBONLTFOVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQOBQG44TMOJXHE . You are receiving this because you are subscribed to this thread.Message ID: @.***>