This work has now been published: AutoGVP: a dockerized workflow integrating ClinVar and InterVar germline sequence variant classification.
Kim J^, Naqvi AS^, Corbett RJ, Kaufman RS, Vaksman Z, Brown MA, Miller DP, Phul S, Geng Z, Storm PB, Resnick AC, Stewart DR, Rokita JL+, Diskin SJ+. AutoGVP: a dockerized workflow integrating ClinVar and InterVar germline sequence variant classification. Bioinformatics. 2024 Mar 4;40(3):btae114. doi: 10.1093/bioinformatics/btae114. PMID: 38426335; PMCID: PMC10955249.
^Equal first authorship +Equal senior authorship
For more detailed instructions, please visit the user guide on our wiki.
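Clone the AutoGVP repository:
git clone git@github.com:diskin-lab-chop/AutoGVP.git
Pull the Docker image:
docker pull pgc-images.sbgenomics.com/diskin-lab/autogvp:v1.0.1
Navigate to the AutoGVP root directory:
cd AutoGVP
Start the Docker container, mounting the repository inside it:
docker run --platform linux/amd64 --name <CONTAINER_NAME> -d -v "$PWD":/home/rstudio/AutoGVP pgc-images.sbgenomics.com/diskin-lab/autogvp:v1.0.1
Open a shell in the running container:
docker exec -ti <CONTAINER_NAME> bash
Navigate to the mounted AutoGVP directory:
cd /home/rstudio/AutoGVP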
VEP (v104)
InterVar
ANNOVAR
AutoPVS1 (v2.0)
bcftools (v1.17)
AutoGVP Requirements (recommended to place all in the data/ folder):
- VEP-annotated VCF file (*VEP.vcf) or VEP- and ClinVar-annotated VCF file (CAVATICA workflow only). For the CAVATICA workflow, AutoGVP will use the ClinVar annotation from the sample VCF when an external ClinVar VCF file is not provided.
- ANNOVAR multianno file (*hg38_multianno.txt)
- InterVar file (*intervar.hg38_multianno.txt.intervar)
- AutoPVS1 file (*autopvs1.txt)
- ClinVar-selected-submissions.tsv (generated by select-clinVar-submissions.R)
- ClinVar VCF file (clinvar_yyyymmdd.vcf.gz as optional user input, or clinvar.vcf.gz, which will be downloaded with download_db_files.sh). This is an optional input for the CAVATICA workflow; if it is not provided, AutoGVP will expect ClinVar annotation in the VEP-annotated sample VCF (see above).
Custom workflow step 1: download the database files:
bash scripts/download_db_files.sh
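Before moving on, it can help to confirm that the input files listed above are actually in place. The following is a minimal sketch of such a check; `check_inputs` is a hypothetical helper and not part of AutoGVP, and the file names passed to it are whatever your own annotation steps produced.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of AutoGVP): report missing or empty input files.
# Usage: check_inputs FILE [FILE ...]
check_inputs() {
  local missing=0
  local f
  for f in "$@"; do
    # -s is true only when the file exists and is non-empty
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      missing=$((missing + 1))
    fi
  done
  # Succeed only when every file was found
  [ "$missing" -eq 0 ]
}

# Example call, using the test file names from this README:
#   check_inputs data/test_VEP.vcf data/test_autopvs1.txt data/clinvar.vcf.gz
```

An unmatched glob such as `data/*VEP.vcf` is passed through literally by bash, so it is reported as missing rather than silently skipped.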
Custom workflow step 2: run select-clinVar-submissions.R. To customize conflicting interpretation resolution, users can provide a ClinGen Concept ID list to filter submissions against (--conceptID_list). When a list is provided, users can also determine how unsettled conflicts are resolved with the --conflict_res argument ("latest" or "most_severe").
For more details, see the FAQ.
Example command:
Rscript scripts/select-clinVar-submissions.R --variant_summary data/variant_summary.txt.gz --submission_summary data/submission_summary.txt.gz --outdir results --conceptID_list data/clinvar_cpg_concept_ids.txt --conflict_res "latest"
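Because --conflict_res accepts only "latest" or "most_severe", a thin wrapper can catch a mistyped value before the R script starts. This is a sketch only: `valid_conflict_res` and `select_submissions` are hypothetical names, not AutoGVP functions, and the paths are the ones from the example command above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of AutoGVP): accept only the two documented values.
valid_conflict_res() {
  case "$1" in
    latest|most_severe) return 0 ;;
    *)
      echo "conflict_res must be \"latest\" or \"most_severe\", got: $1" >&2
      return 1
      ;;
  esac
}

# Hypothetical wrapper: validate the argument, then run the example command.
# Usage: select_submissions latest
select_submissions() {
  local conflict_res="$1"
  valid_conflict_res "$conflict_res" || return 1
  Rscript scripts/select-clinVar-submissions.R \
    --variant_summary data/variant_summary.txt.gz \
    --submission_summary data/submission_summary.txt.gz \
    --outdir results \
    --conceptID_list data/clinvar_cpg_concept_ids.txt \
    --conflict_res "$conflict_res"
}
```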
Custom workflow step 3: run AutoGVP:
bash run_autogvp.sh --workflow="custom" \
--vcf=data/test_VEP.vcf \
--filter_criteria=<filter criteria> \
--clinvar=data/clinvar.vcf.gz \
--intervar=data/test_VEP.hg38_multianno.txt.intervar \
--multianno=data/test_VEP.vcf.hg38_multianno.txt \
--autopvs1=data/test_autopvs1.txt \
--outdir=results \
--out="test_custom" \
--selected_clinvar_submissions=results/ClinVar-selected-submissions.tsv \
--variant_summary=data/variant_summary.txt.gz \
--submission_summary=data/submission_summary.txt.gz \
--conceptIDs=data/clinvar_cpg_concept_ids.txt \
--conflict_res="latest"
CAVATICA workflow step 1: download the database files:
bash scripts/download_db_files.sh
CAVATICA workflow step 2: run select-clinVar-submissions.R (see custom workflow step 2 for optional conflict resolution parameters).
For more details, see the FAQ.
Example command:
Rscript scripts/select-clinVar-submissions.R --variant_summary data/variant_summary.txt.gz --submission_summary data/submission_summary.txt.gz --outdir results --conceptID_list data/clinvar_cpg_concept_ids.txt --conflict_res "latest"
CAVATICA workflow step 3: run AutoGVP:
bash run_autogvp.sh --workflow="cavatica" \
--vcf=data/test_pbta.single.vqsr.filtered.vep_105.vcf \
--filter_criteria=<filter criteria> \
--intervar=data/test_pbta.hg38_multianno.txt.intervar \
--multianno=data/test_pbta.hg38_multianno.txt \
--autopvs1=data/test_pbta.autopvs1.tsv \
--outdir=results \
--out="test_pbta" \
--selected_clinvar_submissions=results/ClinVar-selected-submissions.tsv \
--variant_summary=data/variant_summary.txt.gz \
--submission_summary=data/submission_summary.txt.gz \
--conceptIDs=data/clinvar_cpg_concept_ids.txt \
--conflict_res="latest"
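Since the external ClinVar VCF is optional in the CAVATICA workflow, a script that launches AutoGVP may want to pass it only when the file is present. The sketch below shows one way to do that. `cavatica_args` is a hypothetical helper, and it assumes that run_autogvp.sh accepts the same --clinvar flag in the CAVATICA workflow that the custom workflow example uses.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of AutoGVP): print run_autogvp.sh arguments for the
# CAVATICA workflow, one per line. --clinvar is added only when the file exists.
# Usage: cavatica_args SAMPLE_VCF [CLINVAR_VCF]
cavatica_args() {
  local vcf="$1"
  local clinvar="${2:-}"
  printf '%s\n' "--workflow=cavatica" "--vcf=$vcf"
  if [ -n "$clinvar" ] && [ -f "$clinvar" ]; then
    printf '%s\n' "--clinvar=$clinvar"
  fi
}

# Example (bash 4+ for mapfile); the remaining flags are added as in the command above:
#   mapfile -t args < <(cavatica_args data/test_pbta.single.vqsr.filtered.vep_105.vcf data/clinvar.vcf.gz)
#   bash run_autogvp.sh "${args[@]}" --outdir=results --out="test_pbta"
```

When --clinvar is left out, AutoGVP falls back to the ClinVar annotation already present in the sample VCF, as described in the requirements.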
AutoGVP produces an abridged output file with minimal information needed to interpret variant pathogenicity, as well as a full output with >100 variant annotation columns.
chr | start | ref | alt | rs_id | gene_symbol_vep | variant_classification_vep | HGVSg | HGVSc | HGVSp | autogvp_call | autogvp_call_reason | clinvar_stars | clinvar_clinsig | intervar_evidence |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
chr1 | 1332490 | C | T | rs201607183 | TAS1R3 | missense_variant | chr1:g.1332490C>T | c.959C>T | p.Thr320Met | Uncertain_significance | ClinVar | 1 | Uncertain_significance | InterVar: Uncertain significance PVS1=0 PS=[0, 0, 0, 0, 0] PM=[1, 0, 0, 0, 0, 0, 0] PP=[0, 0, 1, 0, 0, 0] BA1=0 BS=[0, 0, 0, 0, 0] BP=[0, 0, 0, 0, 0, 0, 0, 0] |
chr1 | 1390349 | C | T | rs769726291 | CCNL2 | missense_variant | chr1:g.1390349C>T | c.887G>A | p.Gly296Asp | Uncertain_significance | InterVar | NA | NA | InterVar: Uncertain significance PVS1=0 PS=[0, 0, 0, 0, 0] PM=[1, 1, 0, 0, 0, 0, 0] PP=[0, 0, 0, 0, 0, 0] BA1=0 BS=[0, 0, 0, 0, 0] BP=[0, 0, 0, 0, 0, 0, 0, 0] |
NOTE: gnomAD v3.1.1 non-cancer AF popmax values (gnomad_3_1_1_AF_non_cancer) will also be included in the abridged output when provided.
See here for list of columns included in full output.
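A quick way to summarize a run is to tally variants by their autogvp_call value. The sketch below assumes the output file is tab-delimited with a header row containing an autogvp_call column, as in the table above. `count_calls` is a hypothetical helper, not part of AutoGVP, and the path you pass is whichever output file your run wrote.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of AutoGVP): count variants per autogvp_call value.
# Assumes a tab-delimited file whose first line is a header.
# Usage: count_calls OUTPUT_FILE
count_calls() {
  awk -F '\t' '
    NR == 1 {
      # Locate the column by name so the column order does not matter
      for (i = 1; i <= NF; i++) if ($i == "autogvp_call") col = i
      if (!col) { print "autogvp_call column not found" > "/dev/stderr"; exit }
      next
    }
    { n[$col]++ }
    END { for (k in n) print k "\t" n[k] }
  ' "$1" | sort
}
```

The result is one line per call with its count, sorted by call name. Looking the column up by header name keeps the helper usable on both the abridged and the full output.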
Ammar S. Naqvi (@naqvia) and Ryan J. Corbett (@rjcorb)
For questions, please submit an issue or send an email to Ryan Corbett (@rjcorb): corbettr@chop.edu