Sage-Bionetworks / cleanAD

Tools for cleaning and organizing study data for the AD Knowledge Portal.

Installation

You can install the cleanAD package from GitHub using the remotes package.

remotes::install_github("Sage-Bionetworks/cleanAD")

Scripts

Generate Specimen ID Table

The R script inst/scripts/generate-table-ad.R generates and uploads a specimen ID table built from metadata files and annotations. The table is in the format the dccvalidator needs to check metadata files against existing specimen and individual IDs. The script requires a configuration file located at inst/config.yml; it currently cannot use configuration files that are not installed with the package.
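
Files under inst/ are copied to the top level of an R package when it is installed, so the installed copy of the configuration file can be located with R's system.file(). The following is a minimal sketch, assuming cleanAD is installed and Rscript is on your PATH; the function name installed_config_path is illustrative and not part of cleanAD.

```shell
#!/usr/bin/env bash
# Sketch only: installed_config_path is a hypothetical helper name.

# Prints the path of the config.yml installed with cleanAD.
# system.file() returns an empty string if the package or file is not found,
# so no output means the installed config could not be located.
installed_config_path() {
  Rscript -e 'cat(system.file("config.yml", package = "cleanAD"))'
}

# Example: installed_config_path
```

This is useful for inspecting which configurations are available before choosing a value for the config argument.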

Config Options

Using the Script

There are three ways to run the script: directly with Rscript, with the included bash script, or with Docker.

Note: Running the script locally is not recommended unless you are testing on dummy data. Always run it in a secure computing environment so that no PHI is downloaded to your local system.

Rscript

Ensure cleanAD is installed on your local system and that you can run scripts via Rscript. You should then be able to run the script with the following command:

Rscript ./cleanAD/inst/scripts/generate-table-ad.R --config <config to use (e.g. default)> --auth_token <Synapse personal access token or have local .synapseConfig>
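
The prerequisites above (Rscript available, and either a token or a local .synapseConfig) can be checked before the script is launched. The following is a minimal sketch, not part of cleanAD: the function names and the SYNAPSE_AUTH_TOKEN variable are illustrative, and it assumes that --auth_token can be omitted when ~/.synapseConfig exists.

```shell
#!/usr/bin/env bash
# Sketch only: function names and SYNAPSE_AUTH_TOKEN are illustrative,
# not something the cleanAD package defines.

# Succeeds if a token was given or a Synapse config file exists.
# $1: token (may be empty); $2: config file path (default: ~/.synapseConfig)
have_synapse_credentials() {
  token="${1:-}"
  config_file="${2:-$HOME/.synapseConfig}"
  [ -n "$token" ] || [ -f "$config_file" ]
}

# Runs generate-table-ad.R for the named config (default: "default").
run_generate_table() {
  config="${1:-default}"
  token="${SYNAPSE_AUTH_TOKEN:-}"
  if ! command -v Rscript >/dev/null 2>&1; then
    echo "Rscript not found on PATH" >&2
    return 1
  fi
  if ! have_synapse_credentials "$token"; then
    echo "No token in SYNAPSE_AUTH_TOKEN and no ~/.synapseConfig found" >&2
    return 1
  fi
  if [ -n "$token" ]; then
    Rscript ./cleanAD/inst/scripts/generate-table-ad.R \
      --config "$config" --auth_token "$token"
  else
    # Assumption: the script falls back to ~/.synapseConfig without the flag.
    Rscript ./cleanAD/inst/scripts/generate-table-ad.R --config "$config"
  fi
}

# Example: run_generate_table default
```
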

Bash

If you are on a Linux or Mac computer, you can use the included bash script, update_table.sh, to launch the R script. Ensure cleanAD is installed on your local system and that you can run scripts via Rscript. You should then be able to run the script with the following command:

./cleanAD/update_table.sh <config to use (e.g. default)> <Synapse personal access token or have local .synapseConfig>

Docker

A Docker image is available for running this script. You can use it either by building the image yourself with the included Dockerfile or by pulling the sagebionetworks/cleanad image from DockerHub. The DockerHub image is built automatically after pushes to the main branch of this repository, although there may be a lag of up to 6 hours before the image is updated.

docker pull sagebionetworks/cleanad:latest
docker run --rm --entrypoint "./cleanAD/update_table.sh" sagebionetworks/cleanad:latest <config to use (e.g. default)> <Synapse personal access token or have local .synapseConfig>
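
To avoid pasting the token into each docker run command, the invocation above can be wrapped in a small function that reads the token from an environment variable. This is a minimal sketch under stated assumptions: run_update_table and SYNAPSE_AUTH_TOKEN are illustrative names, and the only docker options used are the ones already shown above.

```shell
#!/usr/bin/env bash
# Sketch only: run_update_table and SYNAPSE_AUTH_TOKEN are illustrative names.

# $1: config to use (default: "default")
# $2: image to run (default: the public DockerHub image)
run_update_table() {
  config="${1:-default}"
  image="${2:-sagebionetworks/cleanad:latest}"
  if [ -z "${SYNAPSE_AUTH_TOKEN:-}" ]; then
    echo "Set SYNAPSE_AUTH_TOKEN to a Synapse personal access token first" >&2
    return 1
  fi
  # Same command as in the README, with the placeholders filled from variables.
  docker run --rm --entrypoint "./cleanAD/update_table.sh" \
    "$image" "$config" "$SYNAPSE_AUTH_TOKEN"
}

# Example: run_update_table default
```

Passing a different image name as the second argument (for example, a locally built tag) runs that image instead of the DockerHub one.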

To build the image locally, follow the steps below.

git clone https://github.com/Sage-Bionetworks/cleanAD.git
cd cleanAD
docker build -t cleanad .

Running as a Scheduled Job in the Sage AWS Service Catalog

Provide a Synapse personal access token (PAT) in the Scheduled Job secrets field, in the form "SYNAPSE_AUTH_TOKEN":"<your-PAT-here>".

Pull the public Docker image and provide the following command to execute in the container. Here, ad is the argument for config and TRUE is the argument for as_scheduled_job.

./cleanAD/scheduled_job_update_table.sh ad TRUE
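
For testing against dummy data, the scheduled-job command can be approximated outside the Service Catalog. The sketch below rests on an unconfirmed assumption: that the scheduled-job script reads the token from a SYNAPSE_AUTH_TOKEN environment variable inside the container, mirroring the secrets field above. The function name run_scheduled_update is illustrative and not part of cleanAD.

```shell
#!/usr/bin/env bash
# Sketch only: run_scheduled_update is a hypothetical helper.
# Assumption: the container reads the token from SYNAPSE_AUTH_TOKEN.

# $1: config (default: "ad"); $2: as_scheduled_job (default: "TRUE")
run_scheduled_update() {
  config="${1:-ad}"
  as_scheduled_job="${2:-TRUE}"
  if [ -z "${SYNAPSE_AUTH_TOKEN:-}" ]; then
    echo "Set SYNAPSE_AUTH_TOKEN to a Synapse personal access token first" >&2
    return 1
  fi
  # "-e NAME" without a value forwards NAME from docker's own environment,
  # so the variable is exported for this one command.
  SYNAPSE_AUTH_TOKEN="$SYNAPSE_AUTH_TOKEN" docker run --rm \
    -e SYNAPSE_AUTH_TOKEN \
    --entrypoint "./cleanAD/scheduled_job_update_table.sh" \
    sagebionetworks/cleanad:latest "$config" "$as_scheduled_job"
}

# Example: run_scheduled_update ad TRUE
```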