diskin-lab-chop / AutoGVP


update clinvar star assignment code #134

Closed: rjcorb closed this pull request 1 year ago

rjcorb commented 1 year ago

Purpose/implementation Section

What feature is being added or bug is being addressed?

Closes #132 and #133. This PR changes how ClinVar stars are assigned to variants in the VCF file, using str_detect, and updates the logic so that variants not in the ClinVar database are assigned clinvar_stars = 0.

What was your approach?

See above.
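For reference, here is a minimal sketch of the star mapping under discussion, written as a hypothetical bash helper `clinvar_stars` (this is not the PR's R code). The star values follow ClinVar's review-status documentation. Mapping an empty status (a variant absent from ClinVar) to 0 follows the behavior described above. The sketch uses exact matches on purpose. With substring matching such as str_detect, the order of checks matters, because `no_assertion_criteria_provided` also contains `criteria_provided`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a ClinVar CLNREVSTAT value to a star count.
# Exact matches avoid substring collisions such as
# "no_assertion_criteria_provided" containing "criteria_provided".
clinvar_stars() {
  local revstat="$1"
  case "$revstat" in
    practice_guideline)
      echo 4 ;;
    reviewed_by_expert_panel)
      echo 3 ;;
    criteria_provided,_multiple_submitters,_no_conflicts)
      echo 2 ;;
    criteria_provided,_single_submitter|criteria_provided,_conflicting_interpretations)
      echo 1 ;;
    # Zero-star statuses, plus an empty value (variant not in ClinVar).
    no_assertion_criteria_provided|no_assertion_provided|no_interpretation_for_the_single_variant|"")
      echo 0 ;;
    *)
      # Unrecognized status: flag it instead of guessing a star count.
      echo NA ;;
  esac
}
```

For example, `clinvar_stars no_interpretation_for_the_single_variant` prints `0`. An unexpected status prints `NA`, so it surfaces in review instead of silently receiving a star count.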

What GitHub issue does your pull request address?

#132 and #133

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Please review the updated code logic, and run the test_pbta and custom test cases through the run_autogvp.sh shell script. Check that all variants with is.na(clinvar_clinsig) also have clinvar_stars set to NA.

bash run_autogvp.sh --workflow="cavatica" \
--vcf=input/test_pbta.single.vqsr.filtered.vep_105.vcf \
--filter_criteria='INFO/AF>=0.2 INFO/DP>=15 (gnomad_3_1_1_AF_non_cancer<0.01|gnomad_3_1_1_AF_non_cancer=".")' \
--intervar=input/test_pbta.hg38_multianno.txt.intervar \
--multianno=input/test_pbta.hg38_multianno.txt \
--autopvs1=input/test_pbta.autopvs1.tsv \
--outdir=../results \
--out="test_pbta"

bash run_autogvp.sh --workflow="custom" \
--vcf=input/test_VEP.vcf \
--clinvar=input/clinvar.vcf.gz \
--intervar=input/test_VEP.hg38_multianno.txt.intervar \
--multianno=input/test_VEP.vcf.hg38_multianno.txt \
--autopvs1=input/test_autopvs1.txt \
--outdir=../results \
--out="test_custom"
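To make the requested check quicker, here is a hypothetical helper `tally_stars_for_missing_clinsig`. It assumes the output is a tab-separated file with a header row containing `clinvar_clinsig` and `clinvar_stars` columns. The column names come from the check above; the file layout and the `NA`/empty encoding of missing values are assumptions. The helper tallies the `clinvar_stars` values among rows where `clinvar_clinsig` is missing.

```shell
#!/usr/bin/env bash
# Hypothetical check: among rows where clinvar_clinsig is missing
# ("NA" or empty), count each clinvar_stars value. Columns are found
# by header name, so column order in the TSV does not matter.
tally_stars_for_missing_clinsig() {
  local tsv="$1"
  awk -F '\t' '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        if ($i == "clinvar_clinsig") sig = i
        if ($i == "clinvar_stars") stars = i
      }
      if (!sig || !stars) { print "required column missing" > "/dev/stderr"; exit 1 }
      next
    }
    $sig == "" || $sig == "NA" { count[$stars]++ }
    END { for (s in count) print s "\t" count[s] }
  ' "$tsv" | sort
}
```

Point it at the output TSV produced by either run above (no file name is assumed here). A consistent mapping should yield a single value in the tally.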

Is there anything that you want to discuss further?

No


rjcorb commented 1 year ago

So with these changes, those would be annotated with 0 stars, but if that needs to be changed we can discuss further.

jungkim2 commented 1 year ago

It seems that ClinVar treats CLNREVSTAT=no_interpretation_for_the_single_variant as no stars (0 stars).

It seems that, with this status, the P/LP/VUS/LB/B classification was made for a combination of two different variants rather than for the individual variant.

Examples:
https://www.ncbi.nlm.nih.gov/clinvar/variation/487089/?oq=487089&m=NM_007262.5(PARK7):c.487G%3EA%20(p.Glu163Lys)
https://www.ncbi.nlm.nih.gov/clinvar/variation/982549/?oq=982549&m=NM_022787.4(NMNAT1):c.275G%3EA%20(p.Trp92Ter)

naqvia commented 1 year ago

Thanks for the clarification. I don't see any other CLNREVSTAT values, so I think we have everything covered:

% gzcat input/clinvar.vcf.gz | awk -F "\t" '{print $NF}' | perl -pe 'if( $_=~/(CLNREVSTAT\=[\w+\,\_]+)\;/){ print $1,"\t"}' | awk '{print $1}' | grep ^CLN | sort | uniq -c
101811 CLNREVSTAT=criteria_provided,_conflicting_interpretations
324716 CLNREVSTAT=criteria_provided,_multiple_submitters,_no_conflicts
1743208 CLNREVSTAT=criteria_provided,_single_submitter
51418 CLNREVSTAT=no_assertion_criteria_provided
10409 CLNREVSTAT=no_assertion_provided
 678 CLNREVSTAT=no_interpretation_for_the_single_variant
  51 CLNREVSTAT=practice_guideline
14585 CLNREVSTAT=reviewed_by_expert_panel
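The pipeline above works. A shorter way to get the same kind of tally, sketched here as a hypothetical `tally_revstat` bash function, lets `grep -o` extract each `CLNREVSTAT=...` token directly. Unlike the perl regex above, it does not require a `;` after the value, so it also counts CLNREVSTAT when that is the last INFO key. It uses `gzip -dc -f` instead of `gzcat`, which is not available on every system; with `-f`, plain uncompressed files are passed through unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count CLNREVSTAT values in a (possibly gzipped) VCF.
# grep -o prints only the matched text; the value ends at the next ';'
# (the INFO separator) or at whitespace.
tally_revstat() {
  local vcf="$1"
  gzip -dc -f "$vcf" \
    | grep -v '^#' \
    | grep -o 'CLNREVSTAT=[^;[:space:]]*' \
    | sort \
    | uniq -c
}
```

Usage: `tally_revstat input/clinvar.vcf.gz`.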

So we should modify the code to assign this status 0 stars. cc @jharenza

jungkim2 commented 1 year ago

Also, I just realized this is documented in ClinVar's review status guide: https://www.ncbi.nlm.nih.gov/clinvar/docs/review_status/

I may have missed it because we had not encountered this status before :( But as noted above, it is indeed considered 0 stars.

rjcorb commented 1 year ago

Good catch! I will open a ticket and modify the code accordingly.