PediatricOpenTargets / OpenPedCan-api

OpenPedCan-api

OpenPedCan-api implements the public application programming interface (API) of the OpenPedCan (Open Pediatric Cancers) project, which transfers OpenPedCan-analysis results and plots via HTTP. The API is publicly available at https://openpedcan-api.d3b.io/__docs__/.

1. API endpoint specifications

The API documentation at https://openpedcan-api-qa.d3b.io/__docs__/ specifies the following API endpoint attributes.

1.1. includeTumorDesc parameter in /tpm/* endpoints

The includeTumorDesc parameter determines how independent primary and relapse tumor samples are handled. It takes one of the following four values.
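Assuming includeTumorDesc is sent as an ordinary HTTP query parameter, a /tpm/* request URL could be assembled as in the bash sketch below. This is not code from the repository: the helper name build_tpm_url, the endpoint path, and any extra key=value parameters are hypothetical placeholders, and the four accepted includeTumorDesc values and the real /tpm/* paths should be taken from the API documentation.

```bash
# Sketch: build a GET URL for a /tpm/* endpoint with an includeTumorDesc value.
# build_tpm_url, the endpoint path, and extra parameters are hypothetical.
build_tpm_url() {
  local base_url="$1"            # e.g. https://openpedcan-api.d3b.io
  local endpoint="$2"            # a /tpm/* path (placeholder)
  local include_tumor_desc="$3"  # one of the four documented values
  shift 3
  # includeTumorDesc belongs to the /tpm/* endpoints, so reject other paths.
  case "$endpoint" in
    /tpm/*) ;;
    *) echo "endpoint must start with /tpm/" >&2; return 1 ;;
  esac
  local query="includeTumorDesc=${include_tumor_desc}" pair
  # Remaining arguments are extra key=value pairs, appended verbatim.
  for pair in "$@"; do
    query="${query}&${pair}"
  done
  printf '%s%s?%s\n' "${base_url%/}" "$endpoint" "$query"
}
```

For example, curl -s "$(build_tpm_url https://openpedcan-api.d3b.io /tpm/ENDPOINT_PATH VALUE)" would send such a request. Values are appended without percent-encoding, so they must already be URL-safe.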

1.2. rankGenesBy parameter in /dge/top-gene-disease-gtex-diff-exp/* endpoints

The rankGenesBy parameter determines how differentially expressed genes are ranked for each disease.
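Assuming rankGenesBy is also sent as a query parameter, the bash sketch below builds a /dge/top-gene-disease-gtex-diff-exp/* URL and percent-encodes the rankGenesBy value so that any character is transmitted safely. The helper names urlencode and build_dge_url, the endpoint suffix, and the example values are hypothetical; the accepted rankGenesBy values should be taken from the API documentation.

```bash
# Sketch: percent-encode a string byte by byte (RFC 3986 unreserved chars kept).
urlencode() {
  local LC_ALL=C   # treat the string as bytes
  local s="$1" out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [a-zA-Z0-9._~-]) out+="$c" ;;
      # "'$c" makes printf print the character's numeric code.
      *) printf -v c '%%%02X' "'$c"; out+="$c" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Sketch: URL for a /dge/top-gene-disease-gtex-diff-exp/* endpoint (suffix is a placeholder).
build_dge_url() {
  local base_url="$1" suffix="$2" rank_by="$3"
  printf '%s/dge/top-gene-disease-gtex-diff-exp/%s?rankGenesBy=%s\n' \
    "${base_url%/}" "$suffix" "$(urlencode "$rank_by")"
}
```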

2. OpenPedCan-api server deployment

The OpenPedCan-api server is deployed using Amazon Web Services (AWS). The OpenPedCan-api HTTP server is deployed using Amazon Elastic Container Registry (ECR), Elastic Container Service (ECS), and Fargate. The HTTP server queries the OpenPedCan-api database server, which is deployed using Amazon Relational Database Service (RDS).

https://openpedcan-api.d3b.io/__docs__/ is the URL of the OpenPedCan-api PRD (production) server. The PRD server deploys only the latest release of the repository.

https://openpedcan-api-qa.d3b.io/__docs__/ is the URL of the OpenPedCan-api QA (quality assurance) server. The QA server deploys only the latest commit to the main branch of the repository.

https://openpedcan-api-dev.d3b.io/__docs__/ is the URL of the OpenPedCan-api DEV (development) server. The DEV server deploys the latest commit to any branch of the repository.
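The deployment policy of the three servers can be summarized as a small lookup. The bash function below is a hypothetical illustration of the rules stated above, not part of the deployment pipeline; it assumes a release is handled separately from branch commits and that a commit to main counts for both QA and DEV.

```bash
# Sketch: which servers would pick up a change, per the policy above.
# Usage: deploy_targets release | deploy_targets branch BRANCH_NAME
deploy_targets() {
  local kind="$1" branch="${2:-}"
  case "$kind" in
    release) echo "prd" ;;            # PRD: latest release only
    branch)
      if [[ "$branch" == "main" ]]; then
        echo "qa dev"                 # QA: latest commit to main; DEV: any branch
      else
        echo "dev"                    # other branches: DEV only
      fi
      ;;
    *) echo "usage: deploy_targets release | deploy_targets branch NAME" >&2; return 1 ;;
  esac
}
```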

According to comments and messages by @blackdenc, the OpenPedCan-api HTTP server is deployed with the following steps.

The OpenPedCan-api database server is deployed with the following steps:

3. Test run OpenPedCan-api server locally

Test-run the OpenPedCan-api server with the following steps:

Note that this test-run procedure has only been tested on the Linux operating system, with the following environment.

The working directory is the root directory of the git repository, i.e., the directory that contains the repository's .git directory.

ubuntu 20.04
docker 20.10
docker-compose 1.29.2
curl 7.79
git 2.25
ImageMagick 6.9
shellcheck 0.7
sha256sum 8.30
md5sum 8.30
R 4.1
R package readr 2.0.2
R package jsonlite 1.7.2
R package lintr 2.0.1
R package httr 1.4.2
R package testthat 3.0.4
R package glue 1.4.2
R package stringr 1.4.0
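Before starting, it can help to confirm that the command-line tools above are installed. The bash sketch below checks only that each command exists, not its version; the helper name missing_tools is hypothetical, and the mapping of ImageMagick to the convert command is an assumption.

```bash
# Sketch: print and fail on any commands that are not on PATH.
missing_tools() {
  local tool missing=()
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      missing+=("$tool")
    fi
  done
  if (( ${#missing[@]} > 0 )); then
    printf 'missing: %s\n' "${missing[*]}"
    return 1
  fi
  return 0
}

# Assumed command names for the environment listed above:
# missing_tools docker docker-compose curl git convert shellcheck sha256sum md5sum R Rscript
```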

@brianghig found that the OpenPedCan-api server cannot be run on macOS 12.5. For more details, see https://github.com/PediatricOpenTargets/OpenPedCan-api/pull/77.

3.1. git clone OpenPedCan-api repository

# Change the URL if a fork of the repository needs to be used
git clone https://github.com/PediatricOpenTargets/OpenPedCan-api.git

cd OpenPedCan-api

# Check out and track the branch that needs to be tested
git checkout -t origin/the-branch-that-needs-to-be-tested
# Optionally, check out a specific commit with the following command
#
# git checkout COMMIT_HASH_ID

3.2. Prepare Docker environment files

Prepare the following OpenPedCan-api Docker environment files, which are used to build the database and to run the database and HTTP servers locally.

The following ../OpenPedCan-api-secrets paths are relative to the root directory of this git repository. For example, the parent directory of the OpenPedCan-api-secrets directory may be structured as follows:

.
├── OpenPedCan-api
└── OpenPedCan-api-secrets

These environment files pass secret information to the Docker container environment. Local development could use plain-text secrets in code and configuration without security concerns, but these environment files emulate the production environment, so that locally developed systems can be deployed to production more straightforwardly. Handling secrets the same way locally and in production is also simpler for OpenPedCan-api developers. In addition, the environment files set the same environment variable values for the different components of OpenPedCan-api, which simplifies development and testing.
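Because the secrets directory sits next to the repository, a quick check before building can catch a misplaced or incomplete OpenPedCan-api-secrets directory. In the bash sketch below, the helper names are hypothetical and the environment file names are deliberately left as arguments, since they must match the files listed above.

```bash
# Sketch: path of OpenPedCan-api-secrets, a sibling of the repository root.
secrets_dir_for() {
  local repo_root="$1"
  printf '%s/OpenPedCan-api-secrets\n' "$(dirname "$repo_root")"
}

# Sketch: report each expected environment file that is missing from DIR.
check_env_files() {
  local dir="$1" f status=0
  shift
  for f in "$@"; do
    if [[ ! -f "$dir/$f" ]]; then
      echo "not found: $dir/$f" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage from the repository root (file names are placeholders):
# check_env_files "$(secrets_dir_for "$(git rev-parse --show-toplevel)")" FILE1.env FILE2.env
```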

3.3. Run static code analysis

./tests/run_linters.sh

If the linters report any syntax errors, post the full error messages as a comment in the GitHub pull request.

./tests/run_linters.sh analyzes R, Docker, and shell files using the R package lintr, hadolint, and shellcheck, respectively.
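To make it easy to paste the complete linter output into a pull request comment, the output can be saved to a log file while keeping the linter's exit status. The bash sketch below is a hypothetical wrapper, not part of the repository; the log file name is arbitrary.

```bash
# Sketch: run a command, show and save its combined stdout/stderr,
# and return the command's own exit status.
run_and_log() {
  local log_file="$1"
  shift
  "$@" 2>&1 | tee "$log_file"
  # PIPESTATUS[0] is the status of the command, not of tee.
  return "${PIPESTATUS[0]}"
}

# Usage from the repository root:
# run_and_log lint.log ./tests/run_linters.sh || echo "Lint errors; see lint.log"
```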

3.4. (Optional) Build OpenPedCan-api database locally

Use the following bash command to build the OpenPedCan-api database locally. This step requires about 40 GB of memory and 500 GB of disk space.

./db/build_db.sh

./db/build_db.sh runs the following steps:

Note for developers: for efficient development and testing, build a small database that contains only a few arbitrarily selected genes but all samples by running DOWN_SAMPLE_DB_GENES=1 ./db/build_db.sh.
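Since a full build needs roughly 500 GB of disk and 40 GB of memory, a pre-flight check can avoid a long run that fails partway. The bash sketch below uses the figures from this section as defaults; the helper names are hypothetical, and the memory check reads /proc/meminfo, so it is skipped on systems without it.

```bash
# Sketch: free disk space (GB) on the filesystem holding DIR.
free_disk_gb() {
  # df -Pk: POSIX format, 1024-byte blocks; field 4 is "Available".
  df -Pk "$1" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Sketch: total memory (GB); prints nothing if /proc/meminfo is unavailable.
total_mem_gb() {
  [[ -r /proc/meminfo ]] || return 0
  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
}

has_resources_for_db_build() {
  local dir="$1" need_disk_gb="${2:-500}" need_mem_gb="${3:-40}" disk mem
  disk="$(free_disk_gb "$dir")"
  mem="$(total_mem_gb)"
  if (( disk < need_disk_gb )); then
    echo "only ${disk} GB free in ${dir}; about ${need_disk_gb} GB needed" >&2
    return 1
  fi
  if [[ -n "$mem" ]] && (( mem < need_mem_gb )); then
    echo "only ${mem} GB memory; about ${need_mem_gb} GB needed" >&2
    return 1
  fi
}

# Usage from the repository root:
# has_resources_for_db_build . && ./db/build_db.sh
```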

3.5. Build and run OpenPedCan-api HTTP server and database server docker images

Use the following bash commands to build and run the OpenPedCan-api HTTP server and database server Docker images.
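Once the containers are started, the HTTP server may need some time before it accepts requests. The bash polling helper below is a hypothetical addition, not one of the repository's commands. Port 8082 is the default used by ./tests/run_tests.sh; polling the /__docs__/ path locally is an assumption, and any endpoint served by the local server would work.

```bash
# Sketch: poll URL until it answers successfully, or give up.
wait_for_http() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i
  for (( i = 1; i <= attempts; i++ )); do
    # -f: fail on HTTP errors; -s: silent; cap each try at 5 seconds.
    if curl -fs --max-time 5 -o /dev/null "$url"; then
      return 0
    fi
    sleep "$delay"
  done
  echo "no response from ${url} after ${attempts} attempts" >&2
  return 1
}

# Usage:
# wait_for_http "http://localhost:8082/__docs__/" && ./tests/run_tests.sh
```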

3.6. Test OpenPedCan-api server

Test the running server with the following command.

./tests/run_tests.sh

./tests/run_tests.sh sends multiple HTTP requests to localhost:8082 by default, with the following steps.

To send requests to a different localhost port, set the bash environment variable LOCAL_API_HOST_PORT to that port number; an OpenPedCan-api server must be listening on it. To test https://openpedcan-api-qa.d3b.io/__docs__/ or https://openpedcan-api-dev.d3b.io/__docs__/ instead, set the environment variable API_HOST=qa or API_HOST=dev, respectively.
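The host selection described above can be sketched as a small bash function. This mirrors the documented behavior only and is not the code of ./tests/run_tests.sh; in particular, treating an unset API_HOST as "local" and the exact base URLs without the /__docs__/ suffix are assumptions.

```bash
# Sketch: pick the API base URL from API_HOST and LOCAL_API_HOST_PORT.
resolve_api_base() {
  case "${API_HOST:-}" in
    qa)  echo "https://openpedcan-api-qa.d3b.io" ;;
    dev) echo "https://openpedcan-api-dev.d3b.io" ;;
    "")  echo "http://localhost:${LOCAL_API_HOST_PORT:-8082}" ;;
    *)   echo "unsupported API_HOST: ${API_HOST}" >&2; return 1 ;;
  esac
}
```

With the real script, the same variables are passed on the command line, for example LOCAL_API_HOST_PORT=8083 ./tests/run_tests.sh or API_HOST=qa ./tests/run_tests.sh.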

4. API system design

The OpenPedCan-api server system has the following layers:

For more implementation details, see the Test run OpenPedCan-api server locally section.

The root directory of this repository should only contain starting points of different layers and configuration files.

4.1. Data model layer

The db directory contains the files that implement the data model layer.

db/build_db.sh builds the data model files that are used by the analysis logic layer.

db/load_db.sh loads local or remote pre-built data model files into the HTTP server layer.

The db/r_interfaces directory contains files that define the data model interfaces for the R runtime.

4.2. Analysis logic layer

The src directory contains the files that implement the analysis logic layer.

4.3. API layer

The API layer is specified through discussions in PedOT meetings, the Slack workspace, GitHub issues, etc.

4.4. HTTP server layer

main.R runs the OpenPedCan-api HTTP server. The HTTP server is implemented with libuv and http-parser, and it is called through the R package plumber.

The API HTTP server handles every HTTP request sequentially with the following steps:

4.5. Testing layer

The tests directory contains all tools and code for testing the API server. tests/http_response_output_files contains the API server response plots and tables. tests/results and tests/plots contain the results and plots generated during test runs.
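When checking whether a re-run reproduces a stored output file, a checksum comparison with sha256sum (listed in the test environment) is a simple option. The bash sketch below is hypothetical: which files are expected to match byte for byte is an assumption, and plots in particular may differ in embedded metadata even when they look identical.

```bash
# Sketch: compare two files by SHA-256 checksum.
# Returns 0 if equal, 1 if different, 2 if either file is missing.
same_checksum() {
  [[ -f "$1" && -f "$2" ]] || return 2
  local a b
  # sha256sum on stdin prints "HASH  -"; keep only the hash.
  a="$(sha256sum < "$1" | cut -d ' ' -f 1)"
  b="$(sha256sum < "$2" | cut -d ' ' -f 1)"
  [[ "$a" == "$b" ]]
}

# Usage (file names are placeholders):
# same_checksum tests/results/NEW_FILE tests/results/REFERENCE_FILE
```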

4.6. Deployment layer

The Jenkinsfile and Dockerfile specify the procedures to deploy the OpenPedCan-api server.
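The layer entry points named in this section can be checked in a fresh checkout with a short script. The bash sketch below is hypothetical; its path list mirrors only the files and directories mentioned in sections 4.1 to 4.6 and may not cover everything the repository needs.

```bash
# Sketch: verify that the layer entry points named above exist under ROOT.
check_layer_paths() {
  local root="${1:-.}" path status=0
  # Data model, analysis logic, HTTP server, testing, and deployment layers.
  local paths=(
    db/build_db.sh
    db/load_db.sh
    db/r_interfaces
    src
    main.R
    tests
    Jenkinsfile
    Dockerfile
  )
  for path in "${paths[@]}"; do
    if [[ ! -e "$root/$path" ]]; then
      echo "missing: $path" >&2
      status=1
    fi
  done
  return "$status"
}
```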

5. API Development roadmap