diskin-lab-chop / AutoGVP

AutoGVP: Automated Germline Variant Pathogenicity

This work has now been published: AutoGVP: a dockerized workflow integrating ClinVar and InterVar germline sequence variant classification.

Kim J^, Naqvi AS^, Corbett RJ, Kaufman RS, Vaksman Z, Brown MA, Miller DP, Phul S, Geng Z, Storm PB, Resnick AC, Stewart DR, Rokita JL+, Diskin SJ+. AutoGVP: a dockerized workflow integrating ClinVar and InterVar germline sequence variant classification. Bioinformatics. 2024 Mar 4;40(3):btae114. doi: 10.1093/bioinformatics/btae114. PMID: 38426335; PMCID: PMC10955249.

^Equal first authorship +Equal senior authorship

AutoGVP Workflow

For more detailed instructions, please visit the user guide on our wiki.

Clone the AutoGVP repository

git clone git@github.com:diskin-lab-chop/AutoGVP.git

Docker set-up

  1. Pull the Docker image:
    docker pull pgc-images.sbgenomics.com/diskin-lab/autogvp:v1.0.1
  2. Navigate to the AutoGVP root directory:
    cd AutoGVP
  3. Start a Docker container from the image. Replace <CONTAINER_NAME> with a name of your choice and run the commands below:
    docker run --platform linux/amd64 --name <CONTAINER_NAME> -d -v "$PWD":/home/rstudio/AutoGVP pgc-images.sbgenomics.com/diskin-lab/autogvp:v1.0.1
    docker exec -ti <CONTAINER_NAME> bash
  4. Navigate to the AutoGVP directory within the container:
    cd /home/rstudio/AutoGVP
  5. Run AutoGVP (see example commands below).
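Running step 3 a second time with the same name fails, because `docker run --name` refuses to create a container whose name is already in use. A minimal sketch of a helper that reuses an existing container instead; the function name `start_autogvp_container` and the variable `AUTOGVP_IMAGE` are hypothetical and not part of AutoGVP:

```shell
# Hypothetical helper (not shipped with AutoGVP): start the AutoGVP container,
# reusing an existing container of the same name instead of failing on a
# "docker run --name" conflict.
AUTOGVP_IMAGE="pgc-images.sbgenomics.com/diskin-lab/autogvp:v1.0.1"

start_autogvp_container() {
  name="$1"
  # List all containers (running or stopped) by name; look for an exact match.
  if docker ps -a --format '{{.Names}}' | grep -qxF -- "$name"; then
    # Container exists: start it (a no-op if it is already running).
    docker start "$name" > /dev/null
  else
    # Same flags as step 3; call this from the AutoGVP root so $PWD is mounted.
    docker run --platform linux/amd64 --name "$name" -d \
      -v "$PWD":/home/rstudio/AutoGVP "$AUTOGVP_IMAGE" > /dev/null
  fi
}
```

Usage, from the AutoGVP root directory: `start_autogvp_container <CONTAINER_NAME> && docker exec -ti <CONTAINER_NAME> bash`.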

Dependencies

VEP (v104)
InterVar
ANNOVAR
AutoPVS1 (v2.0)
bcftools (v1.17)

How to Run AutoGVP

AutoGVP Requirements (recommended to place all in the data/ folder):

Custom workflow example run

  1. Prepare input files by running VEP, ANNOVAR, InterVar, and AutoPVS1.
  2. Download database files:
    bash scripts/download_db_files.sh
  3. Run select-clinVar-submissions.R. To customize how conflicting interpretations are resolved, users can provide a list of ClinGen concept IDs to filter submissions against (--conceptID_list). When a list is provided, users can also choose how unresolved conflicts are settled with the --conflict_res argument ("latest" or "most_severe"). For more details, see the FAQ. Example command:
    Rscript scripts/select-clinVar-submissions.R --variant_summary data/variant_summary.txt.gz --submission_summary data/submission_summary.txt.gz --outdir results --conceptID_list data/clinvar_cpg_concept_ids.txt --conflict_res "latest"
  4. Run AutoGVP. If the output of scripts/select-clinVar-submissions.R is not provided (--selected_clinvar_submissions), AutoGVP runs that script before starting the pathogenicity assessment:
    bash run_autogvp.sh --workflow="custom" \
    --vcf=data/test_VEP.vcf \
    --filter_criteria=<filter criteria> \
    --clinvar=data/clinvar.vcf.gz \
    --intervar=data/test_VEP.hg38_multianno.txt.intervar \
    --multianno=data/test_VEP.vcf.hg38_multianno.txt \
    --autopvs1=data/test_autopvs1.txt \
    --outdir=results \
    --out="test_custom" \
    --selected_clinvar_submissions=results/ClinVar-selected-submissions.tsv \
    --variant_summary=data/variant_summary.txt.gz \
    --submission_summary=data/submission_summary.txt.gz \
    --conceptIDs=data/clinvar_cpg_concept_ids.txt \
    --conflict_res="latest"
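The difference between the two --conflict_res options in step 3 can be illustrated with a toy sketch. This is not AutoGVP's implementation: the real logic lives in scripts/select-clinVar-submissions.R and also applies the concept ID filter. The sketch assumes that "latest" keeps the most recent submission and "most_severe" keeps the most severe classification. It reads a hypothetical tab-separated table with columns `variant_id`, `date` (YYYY-MM-DD) and `clinsig`, and uses an assumed ordering Benign < Likely_benign < Uncertain_significance < Likely_pathogenic < Pathogenic. `resolve_conflicts` is a hypothetical name.

```shell
# Toy illustration only (NOT AutoGVP code): pick one classification per
# variant from conflicting submissions.
# Input (hypothetical): TSV with header variant_id<TAB>date<TAB>clinsig.
resolve_conflicts() {
  mode="$1"   # "latest" or "most_severe"
  file="$2"
  awk -F'\t' -v OFS='\t' -v mode="$mode" '
    BEGIN {
      # Assumed severity ranking; a higher number means more severe.
      rank["Benign"] = 1; rank["Likely_benign"] = 2
      rank["Uncertain_significance"] = 3
      rank["Likely_pathogenic"] = 4; rank["Pathogenic"] = 5
    }
    NR == 1 { next }  # skip the header row
    {
      v = $1
      if (mode == "latest") {
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        better = (!(v in best) || ($2 > date[v]))
      } else {
        better = (!(v in best) || (rank[$3] > rank[best[v]]))
      }
      if (better) { best[v] = $3; date[v] = $2 }
    }
    END { for (v in best) print v, best[v] }
  ' "$file" | sort
}
```

For a variant with a 2019 Pathogenic submission and a 2021 Uncertain_significance submission, `resolve_conflicts latest` reports Uncertain_significance, while `resolve_conflicts most_severe` reports Pathogenic.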

CAVATICA workflow example run

  1. Download database files:
    bash scripts/download_db_files.sh
  2. Run select-clinVar-submissions.R (see custom workflow step 3 for optional conflict resolution parameters). For more details, see the FAQ. Example command:
    Rscript scripts/select-clinVar-submissions.R --variant_summary data/variant_summary.txt.gz --submission_summary data/submission_summary.txt.gz --outdir results --conceptID_list data/clinvar_cpg_concept_ids.txt --conflict_res "latest"
  3. Run AutoGVP. If the output of scripts/select-clinVar-submissions.R is not provided (--selected_clinvar_submissions), AutoGVP runs that script before starting the pathogenicity assessment:
    bash run_autogvp.sh --workflow="cavatica" \
    --vcf=data/test_pbta.single.vqsr.filtered.vep_105.vcf \
    --filter_criteria=<filter criteria> \
    --intervar=data/test_pbta.hg38_multianno.txt.intervar \
    --multianno=data/test_pbta.hg38_multianno.txt \
    --autopvs1=data/test_pbta.autopvs1.tsv \
    --outdir=results \
    --out="test_pbta" \
    --selected_clinvar_submissions=results/ClinVar-selected-submissions.tsv \
    --variant_summary=data/variant_summary.txt.gz \
    --submission_summary=data/submission_summary.txt.gz \
    --conceptIDs=data/clinvar_cpg_concept_ids.txt \
    --conflict_res="latest"
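Because --selected_clinvar_submissions is optional, a wrapper can pass it only when the file from step 2 already exists. Otherwise it omits the flag so that run_autogvp.sh runs select-clinVar-submissions.R itself. A minimal POSIX shell sketch using the CAVATICA example paths. `run_cavatica_autogvp` is a hypothetical name, and the sketch assumes run_autogvp.sh accepts its --flag=value arguments in any order:

```shell
# Hypothetical wrapper (not part of AutoGVP): include the precomputed ClinVar
# submission table only if it exists; otherwise omit the flag so that
# run_autogvp.sh runs select-clinVar-submissions.R first.
run_cavatica_autogvp() {
  filter="$1"   # passed through unchanged as --filter_criteria
  selected="results/ClinVar-selected-submissions.tsv"
  # Rebuild the positional parameters as the AutoGVP argument list.
  set -- --workflow="cavatica" \
    --vcf=data/test_pbta.single.vqsr.filtered.vep_105.vcf \
    --filter_criteria="$filter" \
    --intervar=data/test_pbta.hg38_multianno.txt.intervar \
    --multianno=data/test_pbta.hg38_multianno.txt \
    --autopvs1=data/test_pbta.autopvs1.tsv \
    --outdir=results \
    --out="test_pbta" \
    --variant_summary=data/variant_summary.txt.gz \
    --submission_summary=data/submission_summary.txt.gz \
    --conceptIDs=data/clinvar_cpg_concept_ids.txt \
    --conflict_res="latest"
  if [ -f "$selected" ]; then
    set -- "$@" --selected_clinvar_submissions="$selected"
  fi
  bash run_autogvp.sh "$@"
}
```

Usage, from /home/rstudio/AutoGVP inside the container: `run_cavatica_autogvp '<filter criteria>'`.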

AutoGVP Output

AutoGVP produces an abridged output file containing the minimal information needed to interpret variant pathogenicity, as well as a full output with more than 100 variant annotation columns.

Abridged output example:

| chr | start | ref | alt | rs_id | gene_symbol_vep | variant_classification_vep | HGVSg | HGVSc | HGVSp | autogvp_call | autogvp_call_reason | clinvar_stars | clinvar_clinsig | intervar_evidence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| chr1 | 1332490 | C | T | rs201607183 | TAS1R3 | missense_variant | chr1:g.1332490C>T | c.959C>T | p.Thr320Met | Uncertain_significance | ClinVar | 1 | Uncertain_significance | InterVar: Uncertain significance PVS1=0 PS=[0, 0, 0, 0, 0] PM=[1, 0, 0, 0, 0, 0, 0] PP=[0, 0, 1, 0, 0, 0] BA1=0 BS=[0, 0, 0, 0, 0] BP=[0, 0, 0, 0, 0, 0, 0, 0] |
| chr1 | 1390349 | C | T | rs769726291 | CCNL2 | missense_variant | chr1:g.1390349C>T | c.887G>A | p.Gly296Asp | Uncertain_significance | InterVar | NA | NA | InterVar: Uncertain significance PVS1=0 PS=[0, 0, 0, 0, 0] PM=[1, 1, 0, 0, 0, 0, 0] PP=[0, 0, 0, 0, 0, 0] BA1=0 BS=[0, 0, 0, 0, 0] BP=[0, 0, 0, 0, 0, 0, 0, 0] |
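For a quick overview of an abridged output file, the number of variants per autogvp_call (or per any other column, such as autogvp_call_reason) can be tallied with awk. A minimal sketch, assuming the abridged output is tab-separated with a header row containing the column names above. `count_autogvp_calls` is a hypothetical name:

```shell
# Sketch: count rows per value of one column (default: autogvp_call) in an
# abridged output file. The column is located by header name, not position.
count_autogvp_calls() {
  awk -F'\t' -v name="${2:-autogvp_call}" '
    # Header row: remember the index of the requested column.
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == name) col = i
      next
    }
    col { n[$col]++ }
    END {
      if (!col) { print "column not found: " name > "/dev/stderr"; exit 1 }
      for (c in n) print c "\t" n[c]
    }
  ' "$1" | sort
}
```

For example, `count_autogvp_calls path/to/abridged_output.tsv autogvp_call_reason` (the path is a placeholder) prints one line per call reason with its count.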

*NOTE: gnomAD v3.1.1 non-cancer popmax AF values (gnomad_3_1_1_AF_non_cancer) are also included in the abridged output when provided.
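Because gnomad_3_1_1_AF_non_cancer is only present when gnomAD values were provided, a downstream filter on it should tolerate the column being absent. The sketch below keeps the header and every row whose value is missing (empty or NA) or below a threshold, and passes all rows through if the column does not exist. The function name and the default threshold of 0.01 are illustrative assumptions, not AutoGVP recommendations, and the sketch assumes a tab-separated file with a header row:

```shell
# Sketch: filter abridged-output rows on gnomAD v3.1.1 non-cancer popmax AF.
# Keeps rows with a missing value (empty or NA) or AF below the threshold.
# The default threshold of 0.01 is an arbitrary example.
filter_by_gnomad_af() {
  awk -F'\t' -v max_af="${2:-0.01}" '
    NR == 1 {
      for (i = 1; i <= NF; i++) if ($i == "gnomad_3_1_1_AF_non_cancer") col = i
      print; next   # always keep the header
    }
    # No AF column at all, a missing value, or AF below the threshold.
    !col || $col == "" || $col == "NA" || $col + 0 < max_af + 0
  ' "$1"
}
```

Usage: `filter_by_gnomad_af path/to/abridged_output.tsv 0.001 > filtered.tsv` (the paths are placeholders).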

See here for the list of columns included in the full output.

Code Authors

Ammar S. Naqvi (@naqvia) and Ryan J. Corbett (@rjcorb)

Contact

For questions, please submit an issue or send an email to Ryan Corbett (@rjcorb): corbettr@chop.edu