diskin-lab-chop / AutoGVP


Add ClinVar origin columns to AutoGVP output #219

Closed rjcorb closed 8 months ago

rjcorb commented 8 months ago

Purpose/implementation Section

What feature is being added or bug is being addressed?

Closes #217. This PR updates the AutoGVP scripts to include the ClinVar Origin and OriginSimple columns in the final AutoGVP output.

What was your approach?

What GitHub issue does your pull request address?

#217

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Please re-run the select ClinVar submissions script from the AutoGVP root directory:

Rscript scripts/select-clinVar-submissions.R --variant_summary data/variant_summary.txt.gz --submission_summary data/submission_summary.txt.gz --outdir results --conceptID_list data/clinvar_all_disease_concept_ids.txt --conflict_res "latest"

The ClinVar-selected-submissions.tsv file should contain columns Origin and OriginSimple.
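One way to confirm this without opening the file is to inspect its header line. The following is a minimal sketch, assuming the file is tab-separated with a single header row and was written to results/ as in the command above; `has_columns` is a hypothetical helper, not part of AutoGVP.

```shell
# Hypothetical helper (not part of AutoGVP): succeed only if every
# named column appears as an exact field in the TSV header row.
has_columns() {
  local file="$1"
  shift
  local header col
  header="$(head -n 1 "$file")" || return 1
  for col in "$@"; do
    # Split the header on tabs and require a whole-field, literal match.
    if ! printf '%s\n' "$header" | tr '\t' '\n' | grep -Fxq -- "$col"; then
      echo "missing column: $col" >&2
      return 1
    fi
  done
}

# Assumed location, taken from --outdir results in the command above.
f=results/ClinVar-selected-submissions.tsv
if [ -f "$f" ]; then
  has_columns "$f" Origin OriginSimple && echo "origin columns present"
fi
```

The exact-match flags (`-F -x`) keep a column such as OriginSimple from being mistaken for Origin.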

Run the test PBTA sample through the updated AutoGVP code to ensure the origin columns are included in the full output:

bash run_autogvp.sh --workflow="cavatica" \
--vcf=data/test_pbta.single.vqsr.filtered.vep_105.vcf \
--filter_criteria='FORMAT/DP>=10 (FORMAT/AD[0:1-])/(FORMAT/DP)>=0.2 (gnomad_3_1_1_AF_non_cancer<0.001|gnomad_3_1_1_AF_non_cancer=".")' \
--intervar=data/test_pbta.hg38_multianno.txt.intervar \
--multianno=data/test_pbta.hg38_multianno.txt \
--autopvs1=data/test_pbta.autopvs1.tsv \
--selected_clinvar_submissions=results/ClinVar-selected-submissions.tsv \
--outdir=results \
--out="test_pbta_origin"
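After the run finishes, the output headers can be checked in one pass. This is a rough sketch under two assumptions not confirmed in this PR: that the output files are tab-separated with a header row, and that their names begin with the `--out` prefix inside `--outdir`. `report_origin_columns` is a hypothetical helper.

```shell
# Hypothetical helper (not part of AutoGVP): for each existing file,
# report whether its tab-separated header has both origin columns.
report_origin_columns() {
  for f in "$@"; do
    # Skip arguments that are not regular files (e.g. an unmatched glob).
    [ -f "$f" ] || continue
    if head -n 1 "$f" | awk -F '\t' '
        { for (i = 1; i <= NF; i++) seen[$i] = 1 }
        END { exit !(("Origin" in seen) && ("OriginSimple" in seen)) }'; then
      echo "ok: $f"
    else
      echo "missing origin columns: $f"
    fi
  done
}

# Assumed naming: files in results/ that start with the --out prefix.
report_origin_columns results/test_pbta_origin*
```

Intermediate files that share the prefix may legitimately lack the columns, so only the final full output needs to report `ok`.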

Is there anything that you want to discuss further?

As of Jan 29, 2024, ClinVar is incorporating classifications of somatic variants into its database. Once new ClinVar variant and submission summary files are available, we will need to assess whether this requires further changes to AutoGVP.

Documentation Checklist