esteinig / cerebro

Metagenomic diagnostics stack for low abundance sample types and clinical reporting
GNU General Public License v3.0

Production execution mode #4

Open esteinig opened 9 months ago

esteinig commented 9 months ago

Summary

Semi-automated production runtime for accreditation; this issue tracks the aims and progress needed to complete the production feature on feat/production.

Aims

Stack deployment

File input and storage

Production pipeline

User improvements

Testing modules

Documentation

Steps

Full-stack setup of the Cerebro production environment for continuous operations. For setting up parallel test or development environments, see details in the documentation.

Requirements

Stack setup and verification

Production setup and verification

The production directory and sub-directories are set up on the system - you can read more about the types of production environments currently supported in the documentation.

Here we set up the RUNTIME directory where all workflows are executed, and the INPUT directory where wet-lab staff or laboratory data transfer can deposit the reads and sample sheet to trigger a workflow execution.

# Local paths for runtime and data input
export CEREBRO_BASE_PROD=/data/cerebro/prod
export CEREBRO_INPUT_PROD=/samba/project/cerebro/prod

# Setup the runtime directory where workflows are executed
cerebro production setup-base --directory $CEREBRO_BASE_PROD

# Setup the input directory with a specific team and database upload configuration
cerebro production setup-input --directory $CEREBRO_INPUT_PROD --configuration production --team-name VIDRL --database-name "META-GP Production"

Multiple runtime and input folders can be set up for testing, development or validation configurations. Workflow execution and outputs are configured with specific production variables that ensure results are deposited into the correct team, database and collection.
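As an illustration, a parallel validation environment could be provisioned with the same subcommands shown above, assuming a validation configuration is supported; the paths and names below are placeholders, not prescribed values.

# Hypothetical parallel validation environment (placeholder paths and names)
export CEREBRO_BASE_VAL=/data/cerebro/validation
export CEREBRO_INPUT_VAL=/samba/project/cerebro/validation

# Same setup-base and setup-input subcommands as the production example above
cerebro production setup-base --directory $CEREBRO_BASE_VAL
cerebro production setup-input --directory $CEREBRO_INPUT_VAL --configuration validation --team-name VIDRL --database-name "META-GP Validation"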

Workflow setup and testing

The workflow is set up for production and the production integration tests are run.

# Check workflow help menu as sanity check
nextflow run esteinig/cerebro -r 1.0.0-nata.1 --help

# Provision the accreditation database with Cipher
nextflow run esteinig/cerebro -r 1.0.0-nata.1 -profile mamba -entry cipher --revision 1.0.0-nata.1 --outdir cipher/

# Obtain the access token for the API
export CEREBRO_API_URL="http://api.cerebro.localhost"
export CEREBRO_API_TOKEN=$(cerebro api login -u $CEREBRO_USERNAME -p $CEREBRO_PASSWORD)

# Run workflow integration tests for setup and central nervous system infections
nextflow run esteinig/cerebro -r 1.0.0-nata.1 -profile mamba,ciqa-setup@v1,ciqa-cns@v1

Sample sheet for wet-lab

The current sample sheet is focused on dry-lab operation. We need a user-safe sample sheet template that registers the library identifiers, minimal sample metadata, wet-lab comments, aneuploidy consent, and links to the files in the same input directory.

Initial template: https://github.com/esteinig/cerebro/blob/feat/production/templates/production/SampleSheet.xlsx
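For illustration, the columns might look like the sketch below; the header names and values are hypothetical, not the actual headers of the Excel template.

# Hypothetical sample sheet columns (CSV rendering of the Excel template)
library_id,sample_id,sample_type,wetlab_comment,aneuploidy_consent,fastq_r1,fastq_r2
LIB-001,SAMPLE-A,CSF,low input,yes,LIB-001_R1.fastq.gz,LIB-001_R2.fastq.gz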

Automated watcher and input checks

The sample sheet and FASTQ files (demultiplexed, UMI-removed) are watched and validated in the input folder. Depending on the input configuration file, the watcher runs the production stream and uploads to the specified team-database-collection at the conclusion of the run; different input configuration files (folders) can be watched by different production, test or validation watchers, with outputs deposited into the appropriate database section. The watcher triggers a run of the Nextflow pipeline and sends notifications to Slack.
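A minimal sketch of the watch-and-trigger loop is shown below, assuming inotify-tools is available and a Slack incoming webhook is configured; the pipeline parameters (--production, --sample_sheet) and the SLACK_WEBHOOK_URL variable are assumptions for illustration, since the actual watcher is part of the Cerebro production tooling.

# Watcher sketch (assumptions: inotify-tools installed, hypothetical
# --production/--sample_sheet pipeline parameters, SLACK_WEBHOOK_URL set)
inotifywait -m -e close_write --format '%w%f' "$CEREBRO_INPUT_PROD" | while read -r path; do
    case "$path" in
        *SampleSheet.xlsx)
            # Trigger the production run once the sample sheet is deposited
            nextflow run esteinig/cerebro -r 1.0.0-nata.1 -profile mamba \
                --production --sample_sheet "$path" --outdir "$CEREBRO_BASE_PROD"
            # Notify Slack via an incoming webhook
            curl -s -X POST -H 'Content-type: application/json' \
                --data '{"text":"Cerebro production run triggered"}' "$SLACK_WEBHOOK_URL"
            ;;
    esac
done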

When the pipeline starts, sample identifiers are checked against the team-database-collection to ensure they are unique; the run is registered with the database and samples await confirmation of completion. If a sample identifier already exists in the database, the run fails.
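As a sketch, the uniqueness check could be expressed as an API query like the one below, reusing the token obtained earlier; the endpoint path and query parameters are assumptions, not the documented Cerebro API.

# Hypothetical API query for a duplicate sample identifier (endpoint and
# parameters are illustrative only)
curl -s -H "Authorization: Bearer $CEREBRO_API_TOKEN" \
    "$CEREBRO_API_URL/samples?team=VIDRL&db=META-GP%20Production&id=SAMPLE-A"
# A non-empty result indicates the identifier already exists and the run fails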

Post-workflow sample checks

When the pipeline completes, sample identifiers are collected and validated against the sample identifiers registered for this run. Each module (quality control, classification) is checked for completion in each sample. If a sample did not complete a module for any reason, it is marked accordingly in the database.
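A sketch of this check is shown below; only cerebro api login appears above, so the status subcommand, its flags and the run identifier here are placeholders for illustration.

# Hypothetical per-module completion check (subcommand, flags and run
# identifier are placeholders)
for module in quality-control classification; do
    cerebro api status --run RUN-001 --module $module \
        || echo "Module $module incomplete for one or more samples in RUN-001"
done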

Post-workflow data compilation and upload

After completion, outputs are aggregated into the database models and uploaded into the specified collection via the API.
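A sketch of the upload step, assuming a JSON-over-HTTP API; the endpoint, query parameters and payload file are placeholders, not the documented Cerebro API.

# Hypothetical upload of aggregated database models (endpoint and payload
# are placeholders)
curl -s -X POST \
    -H "Authorization: Bearer $CEREBRO_API_TOKEN" \
    -H "Content-Type: application/json" \
    --data @aggregated_models.json \
    "$CEREBRO_API_URL/cerebro?team=VIDRL&db=META-GP%20Production&collection=run-001"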

Progress

esteinig commented 9 months ago

Some tricky development notes for the multi-stack subdomain deployment: