Ultimagen / healthomics-workflows

UltimaGenomics repository for workflows compatible with AWS HealthOmics

healthomics-workflows

Table of Contents

  1. Introduction

  2. Deploying Private Workflow

  3. Running Private Workflow

  4. Support Tools

Introduction

Ultima Genomics offers pipelines as Ready2Run workflows on AWS HealthOmics. Ready2Run workflows let you run these pipelines on AWS HealthOmics by simply bringing your data. For more flexibility, such as using larger files or changing the reference genome, you can convert a Ready2Run workflow to a private workflow by following the steps in this repository. Once a Ready2Run workflow is converted to a private workflow, the cost of running it is based on the compute and run storage used by the private workflow.

Ultima Genomics also shares pipelines in this repository that have been modified to run as private workflows on AWS HealthOmics. You can follow the directions in this repository to create and run a private workflow on AWS HealthOmics.

Each workflow folder contains the following:

    • required WDL file(s)
    • HowTo documentation that details the workflow's flow and how to run it outside of WDL
    • documentation of the WDL inputs and outputs
    • JSON file that lists the parameters for creating the workflow
    • folder with optional input templates with default parameters for the WDL
    • folder with the different tasks the WDL runs

The instructions below cover localizing resources, deploying the workflow and creating a run.

For more questions about these workflows, please contact healtomics.support@ultimagen.com.

Deploying Private Workflow

To localize workflow resources and create a private workflow in AWS HealthOmics you can:

  1. Pull and push the required public containers to your private ECR by following the steps:

    a. Pull the image from Docker Hub or the Broad GCR and push it to your private ECR:

     docker pull <hub_username>/<image_name>:<tag>  # the image name as it appears in globals.wdl
     docker tag <hub_username>/<image_name>:<tag> <your_aws_account_id>.dkr.ecr.<region>.amazonaws.com/<repository_name>:<tag>
     aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <your_aws_account_id>.dkr.ecr.<region>.amazonaws.com
     docker push <your_aws_account_id>.dkr.ecr.<region>.amazonaws.com/<repository_name>:<tag>  # if the repository doesn't exist, create it first

    b. Grant AWS HealthOmics permission to access your private ECR by following the instructions here.

  2. Import your input files into an S3 bucket.

  3. Create an OmicsService role to access your resources by following the instructions here.

  4. Download the workflow folder as a zipped file. It should include the main WDL file at the top level, the tasks folders and _params.json. You can save this zipped file locally or in an S3 bucket.

  5. Download the parameter template for your desired use case from the input_templates folder.

  6. Modify and save the workflow scripts and parameter templates to meet your needs.

Once the workflow resources have been localized (see the instructions for each workflow), you can create a private workflow on AWS HealthOmics.
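The container localization in step 1.a can be wrapped in a few shell functions so that it is repeatable for every image listed in globals.wdl. This is a minimal sketch, not part of the repository: the function names `ecr_image_uri`, `ensure_ecr_repository` and `mirror_image` are hypothetical, and it assumes the AWS CLI and Docker are installed and that your credentials allow ECR access.

```shell
# Hypothetical helpers (not part of this repository) for mirroring one image.

# Build the private ECR image URI.
# $1 = AWS account id, $2 = region, $3 = repository name, $4 = tag
ecr_image_uri() {
    printf '%s.dkr.ecr.%s.amazonaws.com/%s:%s\n' "$1" "$2" "$3" "$4"
}

# Create the ECR repository only if it does not exist yet.
# $1 = repository name, $2 = region
ensure_ecr_repository() {
    aws ecr describe-repositories --repository-names "$1" --region "$2" >/dev/null 2>&1 \
        || aws ecr create-repository --repository-name "$1" --region "$2" >/dev/null
}

# Pull a public image, retag it for the private ECR and push it.
# $1 = source image as listed in globals.wdl, $2 = account id,
# $3 = region, $4 = repository name, $5 = tag
mirror_image() {
    target="$(ecr_image_uri "$2" "$3" "$4" "$5")"
    ensure_ecr_repository "$4" "$3" || return 1
    aws ecr get-login-password --region "$3" \
        | docker login --username AWS --password-stdin "$2.dkr.ecr.$3.amazonaws.com" || return 1
    docker pull "$1" || return 1
    docker tag "$1" "$target" || return 1
    docker push "$target"
}
```

Usage would look like `mirror_image <hub_username>/<image_name>:<tag> <your_aws_account_id> <region> <repository_name> <tag>`, repeated for each image the workflow needs.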

Create a private workflow in HealthOmics by following one of the two options below:

i. From the CLI:

# If there is more than one WDL file, the main one is the one named after the directory.
$ aws omics create-workflow \
    --name <workflow_name> \
    --main <main_wdl_file> \
    --definition-zip fileb://<path_to_local_zip> \
    --parameter-template file://<path_to_parameters_definition_json> \
    --accelerators GPU

ii. From the console:

a. Click on **Private Workflows** from the left pane.

b. Click on **Create Workflow** on the Workflows list.

c. Follow the instructions on the console to create your workflow.
   - Set "Main workflow definition file path" to the <workflow_name>.wdl file.
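For the CLI option, a new workflow is not usable immediately; it has to reach the ACTIVE status first. The sketch below creates the workflow, captures its ID and polls `aws omics get-workflow` until the workflow is ready. The function names and the 10-second poll interval are assumptions made for illustration. The `--accelerators GPU` flag from the command above is left out here and would need to be added for workflows that require it.

```shell
# Hypothetical helpers (not part of this repository).

# Map a HealthOmics workflow status to ready / failed / pending.
classify_workflow_status() {
    case "$1" in
        ACTIVE) echo ready ;;
        FAILED|DELETED|INACTIVE) echo failed ;;
        *) echo pending ;;   # e.g. CREATING, UPDATING
    esac
}

# Create a private workflow and wait until it can be run.
# $1 = workflow name, $2 = main WDL file, $3 = path to local zip,
# $4 = path to parameters definition JSON
create_workflow_and_wait() {
    workflow_id="$(aws omics create-workflow \
        --name "$1" \
        --main "$2" \
        --definition-zip "fileb://$3" \
        --parameter-template "file://$4" \
        --query 'id' --output text)" || return 1

    while :; do
        status="$(aws omics get-workflow --id "$workflow_id" \
            --query 'status' --output text)" || return 1
        case "$(classify_workflow_status "$status")" in
            ready)  echo "$workflow_id"; return 0 ;;
            failed) echo "workflow $workflow_id ended in status $status" >&2; return 1 ;;
        esac
        sleep 10   # assumed poll interval
    done
}
```

On success the function prints the workflow ID, which is the value to pass as `--workflow-id` when starting a run.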

Running Private Workflow

Run your workflow by following one of the two options below:

i. From the CLI:

$ aws omics start-run \
    --workflow-id <workflow_id> \
    --role-arn <service_role_arn> \
    --output-uri <s3_uri_for_output_folder> \
    --parameters file://<path_to_local_parameters_file> \
    --name <run_name> \
    --retention-mode REMOVE

ii. From the console (the current HealthOmics console version doesn't work well with WDL scoped parameters, so the CLI is preferred):

a. Click Private Workflows from the left pane.

b. Click the Workflow ID from the Workflows list.

c. Click Create Run and enter the run information.
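Whichever option you use to start the run, it executes asynchronously, so it can help to poll its status from the CLI. The following is a minimal sketch that assumes you have the run ID, for example from the `id` field returned by `aws omics start-run`. The function names and the default 60-second poll interval are hypothetical choices.

```shell
# Hypothetical helpers (not part of this repository).

# Succeed when a run status is terminal, i.e. the run will not change any more.
run_is_finished() {
    case "$1" in
        COMPLETED|FAILED|CANCELLED|DELETED) return 0 ;;
        *) return 1 ;;   # e.g. PENDING, STARTING, RUNNING, STOPPING
    esac
}

# Poll a run until it finishes; succeed only if it COMPLETED.
# $1 = run id, $2 = seconds between polls (optional, defaults to 60)
wait_for_run() {
    while :; do
        status="$(aws omics get-run --id "$1" --query 'status' --output text)" || return 1
        echo "run $1: $status" >&2
        if run_is_finished "$status"; then
            if [ "$status" = "COMPLETED" ]; then
                return 0
            else
                return 1
            fi
        fi
        sleep "${2:-60}"
    done
}
```

A typical use would be `wait_for_run <run_id> && aws s3 ls <s3_uri_for_output_folder>/` to list the outputs once the run has completed.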

Support Tools

If your private workflow's run fails, you can use this script to extract information and logs from the AWS HealthOmics run to make debugging easier. Please attach the tar file generated by the script to any support request.
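For a quick first look before collecting the full tar file, the failed tasks of a run and their logs can also be inspected directly with the AWS CLI. The sketch below is not the support script referenced above. It assumes task logs are delivered to the default CloudWatch log group `/aws/omics/WorkflowLog` under streams named `run/<run_id>/task/<task_id>`, and the function names are hypothetical.

```shell
# Hypothetical helpers (not the repository's support script).

# Build the CloudWatch log stream name for one task of a run.
# $1 = run id, $2 = task id
task_log_stream() {
    printf 'run/%s/task/%s\n' "$1" "$2"
}

# Print the log messages of every FAILED task in a run.
# $1 = run id
show_failed_task_logs() {
    for task_id in $(aws omics list-run-tasks --id "$1" --status FAILED \
            --query 'items[].taskId' --output text); do
        echo "== task $task_id =="
        aws logs get-log-events \
            --log-group-name /aws/omics/WorkflowLog \
            --log-stream-name "$(task_log_stream "$1" "$task_id")" \
            --query 'events[].message' --output text
    done
}
```

Calling `show_failed_task_logs <run_id>` prints one section per failed task, which is often enough to spot a missing input or a permissions problem.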