Openscapes / 2021-fdd

Openscapes Champions Cohort for Fisheries Dependent Data Users (FDD)
https://openscapes.github.io/2021-fdd/

Encrypting confidential data for use in a Shiny app #32

Open BLowman1 opened 3 years ago

BLowman1 commented 3 years ago

I built a Shiny app to visualize data for a stock assessment working group. Its purpose is to show the spatial distribution of landings per unit effort (LPUE) from multiple data sets (dealer/VTR, Observer Program, Study Fleet) at various time scales. It’s a pretty basic app, but it facilitates a crucial part of the data exploration process that 1) is sometimes overlooked and 2) is important for helping us understand what might be driving differences in catch rates.

The major impediment to getting the app up and running for use by the Working Group and a small group of collaborators was how to handle data confidentiality. Because the app is hosted on a public server (shinyapps.io), I have to be careful about how the data are stored and accessed. At first I simply required a password to access the visualizations, and that is enough to keep away a casual observer who may have stumbled across the URL. However, storing the raw data file on a public server is questionable at best, so I encrypted the data and changed the role of the password from simply turning the plots on and off to decrypting the data. In theory, this is straightforward and simple. Of course, we’ve all (I think) had a lovely bit of code that we tried to update and killed in the process. After several days and consulting some other experts, I arrived at a working solution. The following is an outline of the major components of the security aspects of the code. I’ll assume that the reader has a general understanding of writing Shiny apps.

Data wrangling and encryption

I prepared and formatted the data in a separate script from the app itself. There are two reasons for doing this: 1) reading in a flat pre-processed data file is faster in the app, and 2) this allows me to encrypt the data and removes the need to allow users direct access to databases.

After setting up and saving the data, I used the sodium package to encrypt the file. There are 5 main components to the encryption process: specify a password (this is a human-readable character string), translate the password into a machine-readable private key, generate a public key based on the private key, use the public key to encrypt the data, and finally, save the encrypted data as an R data object.

library(sodium)

#Specify password
mysupersecretpassword <- "Riscool"

#Read data
data_unencrypted <- read.csv("mydata.csv")

#Private key
key_private <- sha256(charToRaw(mysupersecretpassword))
paste("Private Key:", paste(key_private, collapse = " "))

#Public key
key_public  <- pubkey(key_private)
paste("Public Key:", paste(key_public, collapse = " "))

#Encrypt data
data_encrypted <- simple_encrypt(serialize(data_unencrypted, NULL), key_public)

#Save data
saveRDS(data_encrypted, "mydata_encrypted.rds")
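Before deploying, it can be worth confirming that the encrypted object really decrypts back to the original data. The following is a minimal sketch of such a round-trip check, assuming a small made-up data frame in place of the real (confidential) file; the column names and values are hypothetical.

```r
library(sodium)

# Hypothetical stand-in for the real data read from "mydata.csv"
data_unencrypted <- data.frame(area = c("A", "B"), lpue = c(1.5, 2.25))

# Same key derivation as in the encryption script
key_private <- sha256(charToRaw("Riscool"))
key_public  <- pubkey(key_private)

# Encrypt with the public key
data_encrypted <- simple_encrypt(serialize(data_unencrypted, NULL), key_public)

# Decrypt with the private key and compare against the original
data_roundtrip <- unserialize(simple_decrypt(data_encrypted, key_private))
roundtrip_ok <- identical(data_roundtrip, data_unencrypted)
```

If roundtrip_ok is not TRUE, something went wrong before the data ever reached the app, which narrows down where to look.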

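In the app.R script (outside of the server function), we now need to read in the encrypted data object, hard code the public key, and set up an empty placeholder data frame for the unencrypted data. The hard-coded public key must be a character string in the same space-separated format that the server code produces with paste(pubkey(...), collapse = " "), because the two are compared as strings.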
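A minimal sketch of that global section might look like the following. To keep the sketch runnable on its own, it first writes a stand-in encrypted file to a temporary directory and computes the public key string from the example password; both of those are assumptions for illustration. In a real app the .rds file would sit next to app.R, and key_public would be pasted in as a literal string so that neither the password nor the private key appears in the app code.

```r
library(sodium)

# --- Stand-in for the output of the encryption script (illustration only) ---
demo_private <- sha256(charToRaw("Riscool"))
rds_path <- file.path(tempdir(), "mydata_encrypted.rds")
saveRDS(
  simple_encrypt(serialize(data.frame(x = 1:3), NULL), pubkey(demo_private)),
  rds_path
)

# --- Global section of app.R (outside the server function) ---

# Encrypted data object; a raw vector that is useless without the private key
data_encrypted <- readRDS(rds_path)

# Public key as a space-separated string of hex bytes.
# Computed here for the sketch; in a real app, hard code the printed string.
key_public <- paste(pubkey(demo_private), collapse = " ")

# Empty placeholder for the decrypted data
data_unencrypted <- data.frame()
```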

Password Input

The password entry on the UI end is done in the usual way via passwordInput(). On the server side we need to create a private key, which is a reactive object based on the input password. This is where things can get a little bit hairy because this private key does two things: it is translated into a public key and checked against the hard-coded public key to determine whether decryption should occur, and then it is used within the simple_decrypt() function to decrypt the data. The cryptographic details are beyond the scope of this post, but suffice it to say that IF the public key derived from the entered password matches the hard-coded public key, THEN we would like to proceed to decrypt the data using the “raw” private key.

server <- function(input, output, session) {
  ## translate the human and shiny readable password into machine junk,
  ## then translate from the private to public version
  ## (despite its name, this reactive returns the derived public key as a string)
  key_private <- reactive({
    kp <- sha256(charToRaw(input$txt_password))
    paste(pubkey(kp), collapse = " ")
  })

  ## save the machine and sodium readable password
  kpRaw <- reactive({
    sha256(charToRaw(input$txt_password))
  })

  ## Observe submit button (runs once each time the submit button is clicked)
  observeEvent(input$button, {
    ## check if private key provided is correct
    if (key_private() == key_public) {
      output$pswd_note <- renderText("Password accepted. Do not share the plots with unauthorized users.")
      ## unencrypt the data and make it usable
      data_unencrypted <- as.data.frame(unserialize(simple_decrypt(data_encrypted, kpRaw())))
    } else {
      output$pswd_note <- renderText("Incorrect password. Please try again.")
    }

    ## insert code for filtering data and making some outputs here

  }) # close the Observe Event function
} # close the server function
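The check-then-decrypt logic does not depend on Shiny, so it can be pulled into a plain function and exercised from the console. The sketch below is one way to do that; try_decrypt() is a hypothetical helper name, and the small data frame is made up for illustration.

```r
library(sodium)

# Hypothetical helper: returns the decrypted data frame if the password
# matches the hard-coded public key string, otherwise NULL.
try_decrypt <- function(password, data_encrypted, key_public) {
  kp <- sha256(charToRaw(password))
  derived_public <- paste(pubkey(kp), collapse = " ")
  if (!identical(derived_public, key_public)) {
    return(NULL)
  }
  as.data.frame(unserialize(simple_decrypt(data_encrypted, kp)))
}

# Made-up data and keys, standing in for the real encrypted file
demo_data    <- data.frame(area = c("A", "B"), lpue = c(1.5, 2.25))
demo_private <- sha256(charToRaw("Riscool"))
key_public   <- paste(pubkey(demo_private), collapse = " ")
data_encrypted <- simple_encrypt(serialize(demo_data, NULL), pubkey(demo_private))

result_wrong <- try_decrypt("not the password", data_encrypted, key_public)
result_right <- try_decrypt("Riscool", data_encrypted, key_public)
```

Comparing the derived public key first means a wrong password produces a friendly message in the app rather than an error from simple_decrypt().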

General Shiny Troubleshooting

Shiny apps can have a lot of moving parts, so isolating things is key. Three things I find really helpful:

1) Comment out the server code and run the app to make sure the UI looks like what you expect to see.
2) Create an input list object with static values with the same names as all the UI inputs. Use this to check the code on the server side.
3) Create a “dummy” app that does as little as possible to work out the problematic section. For this example, my dummy app had two inputs (password and action button) and two outputs (a text string to indicate whether the password was correct, and a table displaying the first several lines of the unencrypted dataframe).

Finally, check carefully where all those reactive brackets open and close! Happy coding!
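For reference, a dummy app along those lines could be sketched as follows. This is not the original dummy app: the data, the output ID "preview", and the in-memory encryption at the top are assumptions made so the example is self-contained. The input IDs txt_password and button and the output ID pswd_note match the server code shown earlier.

```r
library(shiny)
library(sodium)

# Made-up encrypted data and public key string (illustration only)
demo_private <- sha256(charToRaw("Riscool"))
key_public   <- paste(pubkey(demo_private), collapse = " ")
data_encrypted <- simple_encrypt(
  serialize(data.frame(area = c("A", "B", "C"), lpue = c(1.5, 2.25, 0.75)), NULL),
  pubkey(demo_private)
)

# Two inputs (password, button) and two outputs (note, table preview)
ui <- fluidPage(
  passwordInput("txt_password", "Password"),
  actionButton("button", "Submit"),
  textOutput("pswd_note"),
  tableOutput("preview")
)

server <- function(input, output, session) {
  observeEvent(input$button, {
    kp <- sha256(charToRaw(input$txt_password))
    if (identical(paste(pubkey(kp), collapse = " "), key_public)) {
      data_unencrypted <- as.data.frame(unserialize(simple_decrypt(data_encrypted, kp)))
      output$pswd_note <- renderText("Password accepted.")
      output$preview <- renderTable(head(data_unencrypted))
    } else {
      output$pswd_note <- renderText("Incorrect password. Please try again.")
      output$preview <- renderTable(NULL)
    }
  })
}

app <- shinyApp(ui, server)
# To launch interactively: runApp(app)
```

Once the password handling works here, the same pieces can be moved back into the full app one at a time.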