eastgenomics / eggd_dias_batch

eggd_dias_batch (DNAnexus Platform App)

What are typical use cases for this app?

DNAnexus app for launching CNV calling, one or more of the SNV, CNV and mosaic reports workflows, and eggd_artemis, from a given directory of Dias single output and one or more manifest files.


What inputs are required for this app to run?

Required

Useful ones

Strings

Files

Booleans

Running modes

n.b. all running modes default to false, so if none are specified the app will raise an error and exit
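The check described in the note above could look roughly like the sketch below. The mode names are taken from the -i flags used in the example commands; the function name `check_modes` and the choice of `ValueError` are illustrative assumptions, not the app's actual code.

```python
def check_modes(cnv_call=False, cnv_reports=False, snv_reports=False,
                mosaic_reports=False, artemis=False):
    """Return the names of the selected running modes.

    Hypothetical helper: every mode defaults to False, so calling it
    with no mode enabled raises an error.
    """
    modes = {
        "cnv_call": cnv_call,
        "cnv_reports": cnv_reports,
        "snv_reports": snv_reports,
        "mosaic_reports": mosaic_reports,
        "artemis": artemis,
    }
    selected = [name for name, enabled in modes.items() if enabled]
    if not selected:
        # Exception type is an assumption made for this sketch
        raise ValueError("No running mode specified, at least one must be set")
    return selected
```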

Testing


How does this app work?

The app takes as minimum input a path to Dias single output, an assay config, and at least one of the running modes listed above. By default, an assay string is passed with -iassay; the app then searches DNAnexus for the highest version config file for that assay in -iassay_config_dir (default: 001_Reference:/dynamic_files/dias_batch_configs/) and uses it for analysis. Alternatively, a specific assay config file may be provided with -iassay_config_file. When running a reports workflow, a manifest file must also be specified.
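As a rough illustration of picking the highest version config for an assay, the sketch below works on config contents that have already been read into dictionaries. The helper names `parse_version` and `select_config` are hypothetical, and comparing versions as dot-separated integers is an assumption; the app may compare versions differently.

```python
def parse_version(version):
    """Turn a version string such as '2.2.0' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def select_config(configs, assay):
    """Return the config for `assay` with the highest version.

    `configs` is a list of dicts, each with at least the 'assay' and
    'version' keys shown in the top level config example.
    """
    matching = [config for config in configs if config.get("assay") == assay]
    if not matching:
        raise ValueError(f"No config found for assay {assay}")
    return max(matching, key=lambda config: parse_version(config["version"]))
```

Comparing parsed tuples rather than raw strings matters here: as strings, "2.10.0" would sort below "2.2.0".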

Before any jobs are launched, the archival state of all required files is checked. The check uses the file pattern mappings defined either in utils.defaults or in the assay config file (if specified) to search for the required per sample and per run files, and raises an error if any are archived unless unarchive=True is set.
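A minimal sketch of this check is shown below, assuming each file is represented by a dictionary carrying an `archivalState` value where `"live"` means the file is usable. The function name, the dictionary layout and the exception type are assumptions made for illustration, not the app's implementation.

```python
def check_archival_state(files, unarchive=False):
    """Return the files that are not live, raising unless unarchive is set.

    `files` is a list of dicts such as
    {"id": "file-xxx", "archivalState": "archived"} (assumed layout).
    """
    not_live = [f for f in files if f.get("archivalState") != "live"]
    if not_live and not unarchive:
        ids = ", ".join(f["id"] for f in not_live)
        # Exception type is an assumption made for this sketch
        raise RuntimeError(f"Required files are not live: {ids}")
    # With unarchive=True the caller would go on to request unarchiving
    return not_live
```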

The general behaviour of each mode is as follows:

CNV calling

Minimum inputs:

Behaviour:

Reports workflows

Minimum inputs:

Behaviour:

n.b.

Artemis

Minimum inputs:

Behaviour:


Example commands

Running CNV calling and CNV reports for CEN assay:

dx run app-eggd_dias_batch \
    -iassay=CEN \
    -imanifest_files=file-xxx \
    -isingle_output_dir=project-xxx:/path_to_output/ \
    -icnv_call=true \
    -icnv_reports=true

Running reports for CNV and SNV (using previous CNV calling output) and launching eggd_artemis:

dx run app-eggd_dias_batch \
    -iassay=CEN \
    -imanifest_files=file-xxx \
    -isingle_output_dir=project-xxx:/path_to_output/ \
    -icnv_call_job_id=job-xxx \
    -icnv_reports=true \
    -isnv_reports=true \
    -iartemis=true \
    -iqc_file=file-xxx

Running SNV reports with specified config file:

dx run app-eggd_dias_batch \
    -iassay_config_file=file-xxx \
    -imanifest_files=file-xxx \
    -isingle_output_dir=project-xxx:/path_to_output/ \
    -isnv_reports=true

Running all modes in testing:

dx run app-eggd_dias_batch \
    -iassay=CEN \
    -imanifest_files=file-xxx \
    -isingle_output_dir=project-xxx:/path_to_output/ \
    -icnv_call=true \
    -icnv_reports=true \
    -isnv_reports=true \
    -imosaic_reports=true

Running CNV calling, CNV reports, SNV reports and Artemis with 2 manifest files:

dx run app-eggd_dias_batch \
    -iassay=CEN \
    -isingle_output_dir=project-xxx:/path_to_output/ \
    -imanifest_files=file-xxx \
    -imanifest_files=file-yyy \
    -iqc_file=file-zzz \
    -icnv_call=true \
    -icnv_reports=true \
    -isnv_reports=true \
    -iartemis=true
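When launching from a script, commands like those above can be assembled programmatically. The sketch below only uses input names that appear in the example commands; the helper `build_dx_run` itself is hypothetical and is not part of the app.

```python
import shlex


def build_dx_run(assay, single_output_dir, manifest_files, modes, qc_file=None):
    """Build the argument list for a `dx run app-eggd_dias_batch` call.

    Each manifest file gets its own -imanifest_files flag, mirroring
    the two-manifest example above.
    """
    args = [
        "dx", "run", "app-eggd_dias_batch",
        f"-iassay={assay}",
        f"-isingle_output_dir={single_output_dir}",
    ]
    for manifest in manifest_files:
        args.append(f"-imanifest_files={manifest}")
    if qc_file:
        args.append(f"-iqc_file={qc_file}")
    for mode in modes:
        args.append(f"-i{mode}=true")
    return args


command = build_dx_run(
    assay="CEN",
    single_output_dir="project-xxx:/path_to_output/",
    manifest_files=["file-xxx", "file-yyy"],
    modes=["cnv_call", "cnv_reports", "snv_reports", "artemis"],
    qc_file="file-zzz",
)
# Print a shell-safe version of the command for review before running it
print(shlex.join(command))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting problems.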

Config file design

The config file for an assay is written in JSON format and specifies the majority of inputs for running each type of analysis. A populated example config file may be found here.

The top level section should be structured as follows:

{
    "assay": "CEN",
    "version": "2.2.0",
    "cnv_call_app_id": "app-GJZVB2840KK0kxX998QjgXF0",
    "snv_report_workflow_id": "workflow-GXzkfYj4QPQp9z4Jz4BF09y6",
    "cnv_report_workflow_id": "workflow-GXzvJq84XZB1fJk9fBfG88XJ",
    "reference_files": {
        "genepanels": "project-Fkb6Gkj433GVVvj73J7x8KbV:file-GVx0vkQ433Gvq63k1Kj4Y562",
        "exons_nirvana": "project-Fkb6Gkj433GVVvj73J7x8KbV:file-GF611Z8433Gk7gZ47gypK7ZZ",
        "genes2transcripts": "project-Fkb6Gkj433GVVvj73J7x8KbV:file-GV4P970433Gj6812zGVBZvB4",
        "exonsfile": "project-Fkb6Gkj433GVVvj73J7x8KbV:file-GF611Z8433Gf99pBPbJkV7bq"
    },
    "name_patterns": {
        "Epic": "^[\\d\\w]+-[\\d\\w]+",
        "Gemini": "^X[\\d]+"
    },
    ...

The inputs for CNV calling and each reports workflow should be defined under the key modes, which contains a mapping of all app or workflow inputs along with other settings that control how each analysis is run.

Example format of CNV call app structure:

"modes": {
    "cnv_call": {
        "instance_type": "mem2_ssd1_v2_x8",
        "inputs": {
            "bambais": {
                "folder": "/sentieon-dnaseq-4.2.1/",
                "name": ".bam$|.bam.bai$"
            },
            "GATK_docker": {
                "$dnanexus_link": {
                    "project": "project-Fkb6Gkj433GVVvj73J7x8KbV",
                    "id": "file-GBBP9JQ433GxV97xBpQkzYZx"
                }
            },
            "annotation_tsv": {
                ...
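Inputs such as bambais above are given as a folder and a name pattern rather than a fixed file ID. The sketch below shows one way such a mapping might be applied to a list of file paths. Treating `folder` as a substring of the path and `name` as a regular expression searched against the file name are both assumptions; `find_files` is a hypothetical helper.

```python
import re


def find_files(paths, folder, name):
    """Return the paths that sit under `folder` and whose file name matches `name`."""
    pattern = re.compile(name)
    matches = []
    for path in paths:
        file_name = path.rsplit("/", 1)[-1]
        # Assumed semantics: folder is a path substring, name is a regex
        if folder in path and pattern.search(file_name):
            matches.append(path)
    return matches


paths = [
    "/output/sentieon-dnaseq-4.2.1/sample1.bam",
    "/output/sentieon-dnaseq-4.2.1/sample1.bam.bai",
    "/output/sentieon-dnaseq-4.2.1/sample1.vcf",
    "/output/other/sample2.bam",
]
bam_files = find_files(paths, "/sentieon-dnaseq-4.2.1/", ".bam$|.bam.bai$")
```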

Example format of a reports workflow structure:

"cnv_reports": {
        "stage_instance_types": {
            "stage-cnv_vep.vcf": "mem2_ssd2_v2_x72"
        },
        "inputs": {
            "stage-cnv_generate_bed_vep.exons_nirvana": "INPUT-exons_nirvana",
            "stage-cnv_generate_bed_vep.nirvana_genes2transcripts": "INPUT-genes2transcripts",
            "stage-cnv_generate_bed_vep.gene_panels": "INPUT-genepanels",
            "stage-cnv_generate_bed_vep.flank": 495,
            "stage-cnv_generate_bed_vep.additional_regions": {
                "$dnanexus_link": {
                    "project": "project-Fkb6Gkj433GVVvj73J7x8KbV",
                    "id": "file-GJZQvg0433GkyFZg13K6VV6p"
                }
            },
            "stage-cnv_vep.config_file": {
                "$dnanexus_link": {
                    "project": "project-Fkb6Gkj433GVVvj73J7x8KbV",
                    "id": "file-GQGJ3Z84xyx0jp1q65K1Q1jY"
                }
            },
            "stage-cnv_vep.vcf": {
                "folder": "CNV_vcfs",
                "name": "_segments.vcf$"
            },
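Several inputs above use values such as INPUT-exons_nirvana, whose suffix matches a key under reference_files in the top level section. Assuming these placeholders are swapped for the corresponding reference file at run time (an inference from the examples, not something stated here), the substitution could be sketched as follows. The function name and the error handling are hypothetical.

```python
def fill_reference_inputs(inputs, reference_files):
    """Replace 'INPUT-<key>' placeholder strings with reference file values.

    Values that are not placeholder strings (numbers, dicts) pass
    through unchanged.
    """
    prefix = "INPUT-"
    resolved = {}
    for field, value in inputs.items():
        if isinstance(value, str) and value.startswith(prefix):
            key = value[len(prefix):]
            if key not in reference_files:
                raise KeyError(f"No reference file defined for {key}")
            resolved[field] = reference_files[key]
        else:
            resolved[field] = value
    return resolved


# Placeholder IDs only, not real DNAnexus files
reference_files = {"exons_nirvana": "project-xxx:file-xxx"}
inputs = {
    "stage-cnv_generate_bed_vep.exons_nirvana": "INPUT-exons_nirvana",
    "stage-cnv_generate_bed_vep.flank": 495,
}
resolved = fill_reference_inputs(inputs, reference_files)
```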

What does this app output?