diskin-lab-chop / AutoGVP


Select AutoPVS1 transcript as final variant annotation #145

Closed. rjcorb closed this pull request 1 year ago.

rjcorb commented 1 year ago

Purpose/implementation Section

What feature is being added or bug is being addressed?

Closes #144. This PR modifies the AutoGVP and annotation-filtering scripts so that the AutoPVS1 transcript annotation is retained as the final output annotation for each variant.
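A minimal sketch of the selection this PR describes, written in Python purely for illustration (it is not the AutoGVP implementation). It assumes VEP yields one row per variant-transcript pair and that the AutoPVS1 output gives a single transcript per variant. The field names variant_id, transcript and consequence, and the TX* transcript IDs, are hypothetical.

```python
# Sketch: for each variant, keep only the VEP annotation whose transcript
# matches the transcript chosen by AutoPVS1. Field names are hypothetical.

def select_autopvs1_transcript(vep_rows, autopvs1_transcripts):
    """Return a dict mapping variant_id -> one annotation row.

    vep_rows: list of dicts, one per (variant, transcript) VEP annotation.
    autopvs1_transcripts: dict mapping variant_id -> AutoPVS1 transcript.
    If no VEP row matches the AutoPVS1 transcript, the annotation fields are
    left as None, mirroring the NA columns seen in the final output.
    """
    matches = {
        row["variant_id"]: row
        for row in vep_rows
        if autopvs1_transcripts.get(row["variant_id"]) == row["transcript"]
    }
    return {
        vid: matches.get(vid, {"variant_id": vid, "transcript": tx, "consequence": None})
        for vid, tx in autopvs1_transcripts.items()
    }

vep_rows = [
    {"variant_id": "chr1-100-A-G", "transcript": "TX1", "consequence": "missense_variant"},
    {"variant_id": "chr1-100-A-G", "transcript": "TX2", "consequence": "stop_gained"},
    {"variant_id": "chr2-200-C-T", "transcript": "TX3", "consequence": "synonymous_variant"},
]
autopvs1 = {"chr1-100-A-G": "TX2", "chr2-200-C-T": "TX3"}
final = select_autopvs1_transcript(vep_rows, autopvs1)
print(final["chr1-100-A-G"]["consequence"])  # stop_gained
```

Keying the result on the AutoPVS1 table rather than on the VEP rows guarantees exactly one output row per variant, even when VEP lists several transcripts.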

What was your approach?

What GitHub issue does your pull request address?

144

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Please run the shell script on both the PBTA and the custom test files:

```bash
# PBTA test files (Cavatica workflow)
bash run_autogvp.sh --workflow="cavatica" \
--vcf=input/test_pbta.single.vqsr.filtered.vep_105.vcf \
--filter_criteria='INFO/AF>=0.2 INFO/DP>=15 (gnomad_3_1_1_AF_non_cancer<0.01|gnomad_3_1_1_AF_non_cancer=".")' \
--intervar=input/test_pbta.hg38_multianno.txt.intervar \
--multianno=input/test_pbta.hg38_multianno.txt \
--autopvs1=input/test_pbta.autopvs1.tsv \
--outdir=../results \
--out="test_pbta"

# Custom test files (custom workflow, with a ClinVar VCF)
bash run_autogvp.sh --workflow="custom" \
--vcf=input/test_VEP.vcf \
--clinvar=input/clinvar.vcf.gz \
--intervar=input/test_VEP.hg38_multianno.txt.intervar \
--multianno=input/test_VEP.vcf.hg38_multianno.txt \
--autopvs1=input/test_autopvs1.txt \
--outdir=../results \
--out="test_custom"
```
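One way to read the --filter_criteria string used in the PBTA run, sketched in Python as an illustration only. It assumes that space-separated terms must all hold, that | inside parentheses means "either term", and that "." is the VCF missing-value marker, so the gnomAD term keeps variants that are rare or absent from gnomAD. The function names are hypothetical and this is not AutoGVP's own filtering code; it also handles only single numeric values (no multi-allelic lists).

```python
# Sketch of evaluating the filter string above. ASSUMED semantics:
#   - space-separated terms are combined with AND,
#   - "|" inside parentheses means OR,
#   - "." is the VCF missing-value marker.
import re

TERM = re.compile(r'^(?:INFO/)?(\w+)(>=|<=|>|<|=)(.+)$')

def eval_term(term, info):
    """Evaluate one comparison such as 'INFO/DP>=15' against an INFO dict."""
    m = TERM.match(term)
    if m is None:
        raise ValueError(f"cannot parse term: {term}")
    key, op, raw = m.groups()
    raw = raw.strip('"')
    value = info.get(key, ".")  # absent fields are treated as missing
    if op == "=":
        return str(value) == raw
    if value == ".":
        return False  # a missing value fails any numeric comparison
    x, y = float(value), float(raw)
    return {">=": x >= y, "<=": x <= y, ">": x > y, "<": x < y}[op]

def passes(criteria, info):
    """True if the INFO dict satisfies every space-separated group."""
    for group in criteria.split():
        if group.startswith("(") and group.endswith(")"):
            if not any(eval_term(t, info) for t in group[1:-1].split("|")):
                return False
        elif not eval_term(group, info):
            return False
    return True

CRITERIA = 'INFO/AF>=0.2 INFO/DP>=15 (gnomad_3_1_1_AF_non_cancer<0.01|gnomad_3_1_1_AF_non_cancer=".")'
print(passes(CRITERIA, {"AF": "0.5", "DP": "30", "gnomad_3_1_1_AF_non_cancer": "."}))      # True
print(passes(CRITERIA, {"AF": "0.5", "DP": "10", "gnomad_3_1_1_AF_non_cancer": "0.001"}))  # False
```

The second call fails on read depth (DP 10 < 15) even though the variant is rare in gnomAD.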

Is there anything that you want to discuss further?

In rare cases (I believe only in the custom test files), VEP annotates a variant as intergenic while AutoPVS1 assigns it a transcript. Because the AutoPVS1 transcript is not found in the VEP VCF, these variants end up with NA annotation columns in the final output. I plan to run this on larger datasets to determine whether this happens only with intergenic variants; if so, we can label them as intergenic in the final output.
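A minimal sketch of that proposed fallback, in Python for illustration and assuming the larger runs confirm the mismatch is limited to intergenic variants. It relies on VEP's intergenic_variant consequence term; the function name and the fields variant_id, transcript and consequence are hypothetical.

```python
# Sketch of the proposed fallback: if the AutoPVS1 transcript is not among a
# variant's VEP annotations, label the variant "intergenic" when VEP called
# every row intergenic_variant; otherwise keep it as NA (None).
# Field names are hypothetical.

def final_annotation(variant_id, autopvs1_transcript, vep_rows):
    rows = [r for r in vep_rows if r["variant_id"] == variant_id]
    for r in rows:
        if r["transcript"] == autopvs1_transcript:
            return r["consequence"]
    # No VEP row carries the AutoPVS1 transcript.
    if rows and all(r["consequence"] == "intergenic_variant" for r in rows):
        return "intergenic"
    return None

vep_rows = [
    {"variant_id": "v1", "transcript": "-", "consequence": "intergenic_variant"},
    {"variant_id": "v2", "transcript": "TX1", "consequence": "missense_variant"},
]
print(final_annotation("v1", "TX9", vep_rows))  # intergenic
print(final_annotation("v2", "TX1", vep_rows))  # missense_variant
print(final_annotation("v2", "TX7", vep_rows))  # None
```

Keeping None for non-intergenic mismatches leaves any unexpected cases visible as NA rather than silently relabeling them.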

Documentation Checklist

rjcorb commented 1 year ago

Hmm, good point. I suppose we could add instructions for modifying output_colnames. It's pretty comprehensive now, but there may be other annotations we're not capturing here.
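As a sketch of what such instructions might cover, the Python below assumes, hypothetically, that output_colnames is a plain list of column names, one per line, controlling which annotation columns are written and in what order. The real file format may differ; the function names and the CADD_phred column are illustrative only.

```python
# Sketch only: ASSUMES output_colnames is a plain list of column names, one
# per line, that controls which annotation columns are written and in what
# order. The real AutoGVP file format may differ.
import io

def read_colnames(handle):
    """Read one column name per line, skipping blanks and '#' comments."""
    names = []
    for line in handle:
        line = line.strip()
        if line and not line.startswith("#"):
            names.append(line)
    return names

def select_columns(row, colnames):
    """Return (selected_row, missing): requested columns in order, with
    None for any column the annotation row does not have."""
    selected = {name: row.get(name) for name in colnames}
    missing = [name for name in colnames if name not in row]
    return selected, missing

# A user appends a new column name to the list.
colnames = read_colnames(io.StringIO("chr\npos\n# user-added column:\nCADD_phred\n"))
row = {"chr": "chr1", "pos": 100, "ref": "A"}
selected, missing = select_columns(row, colnames)
print(selected)  # {'chr': 'chr1', 'pos': 100, 'CADD_phred': None}
print(missing)   # ['CADD_phred']
```

Reporting requested-but-absent columns, rather than failing, would let users add annotations their input may not always contain.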