ding-lab / CharGer

Characterization of Germline variants
https://ding-lab.github.io/CharGer/
GNU General Public License v3.0
96 stars, 37 forks

All variants classified as Benign or Uncertain Significance #37

Closed: nrosewick closed this issue 4 years ago

nrosewick commented 4 years ago

Hello,

I tried to use CharGer on a WES VCF file that was processed with GATK 4.1.1.0 from hg38 BAM files and annotated with VEP 95. The VCF contains 547 samples. Looking at the results, every variant is classified as either "Benign" or "Uncertain Significance":

cat out.charger.txt | awk -F '\t' '{print $20}' | sort | uniq -c
 224571 Benign
 933207 Uncertain Significance
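The awk pipeline above tallies column 20 of the CharGer output. A minimal Python equivalent is sketched below, which can be handy if you want to extend the tally later (for example, to also count which modules fired). Like the command, it assumes column 20 holds the final classification; the function name `count_classifications` is hypothetical.

```python
import csv
from collections import Counter


def count_classifications(lines, column=20):
    """Count the values found in one column of a tab-separated file.

    `column` is 1-based, like awk's $20. Unlike awk, rows with fewer than
    `column` fields are skipped instead of being counted as an empty value.
    """
    # QUOTE_NONE makes csv split purely on tabs, as `awk -F '\t'` does.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts = Counter()
    for row in reader:
        if len(row) >= column:
            counts[row[column - 1]] += 1
    return counts
```

Usage: open `out.charger.txt` with `newline=""` and pass the file handle to `count_classifications`; the returned `Counter` maps each classification to its number of variants.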

Looking at the log file, I can see that some warnings pop up:

No gene list file uploaded. CharGer will not make PVS1 calls.
No PP2 gene list file uploaded. CharGer will not make PP2 calls.
No BP1 gene list file uploaded. CharGer will not make BP1 calls.
No expression file uploaded. CharGer will allow all passed truncations without expression data in PVS1.

Is it expected that all variants are classified as either Benign or Uncertain Significance? Do you maybe have an (unannotated) hg38 test VCF with variants that should be classified as pathogenic, so that I can test my config?

Thanks

The log file:

charger -f input.sort.vcf.gz -o out.charger.txt --vep-cache /home/vep/.vep/ --vep-version 95 --grch 38 --reference-fasta /home/genomes/hg38/Homo_sapiens_assembly38.fasta

Using default module scores and category thresholds:
   BA1 = -8
   BMC1 = -2
   BP1 = -1
   BP2 = -1
   BP3 = -1
   BP4 = -1
   BP5 = -1
   BP6 = -1
   BP7 = -1
   BS1 = -4
   BS2 = -4
   BS3 = -4
   BS4 = -4
   BSC1 = -6
   PM1 = 2
   PM2 = 2
   PM3 = 2
   PM4 = 2
   PM5 = 2
   PM6 = 2
   PMC1 = 2
   PP1 = 1
   PP2 = 1
   PP3 = 1
   PP4 = 1
   PP5 = 1
   PPC1 = 1
   PPC2 = 1
   PS1 = 7
   PS2 = 4
   PS3 = 4
   PS4 = 4
   PSC1 = 4
   PVS1 = 8
   maxBenignScore = -8
   maxLikelyBenignScore = -4
   minLikelyPathogenicScore = 5
   minPathogenicScore = 9
Will capture vcf details for output: False
This .vcf has AF!

Skipping: 0 for filters and 0 for AF and 0 for mutation types out of 1157778
No gene list file uploaded. CharGer will not make PVS1 calls.
No PP2 gene list file uploaded. CharGer will not make PP2 calls.
No BP1 gene list file uploaded. CharGer will not make BP1 calls.
No expression file uploaded. CharGer will allow all passed truncations without expression data in PVS1.
charger::getVEP Warning: skipping VEP 
Running VEP took 7.10487365723e-05seconds
charger::getClinVar
Running ClinVar took 1.69277191162e-05seconds
Running exac took 1.59740447998e-05seconds
CharGer module PVS1
- truncations in genes where LOF is a known mechanism of the disease
- require the mode of inheritance to be dominant (assuming heterzygosity) and co-occurence with reduced gene expression
- run concurrently with PSC1, PMC1, PM4, PPC1, and PPC2 -
CharGer::runIndelModules Error: Cannot evaluate PVS1 or PM4: No gene list supplied.
CharGer module PS1
- same peptide change as a previously established pathogenic variant
PS1 found 0 pathogenic variants
CharGer module PS2
- de novo with maternity and paternity confirmation and no family history
CharGer module PS3: Well-established in vitro or in vivo functional studies             supportive of a damaging effect on the gene or gene product
CharGer module PS4: not yet implemented
CharGer module PM1:  Located in a mutational hot spot and/or critical and well-established               functional domain (e.g., active site of an enzyme) without benign variation
CharGer::PM1 Warning: clustersFile is not supplied. PM1 was not executed.
CharGer module PM2
- absent or extremely low frequency in controls
CharGer module PM3: not yet implemented
CharGer module PM4
- protein length changes due to inframe indels or nonstop variant of selected genes -
CharGer module PM5
- different peptide change of a pathogenic variant at the same reference peptide
PM5 found 0 pathogenic variants
CharGer module PM6
- assumed de novo without maternity and paternity confirmation
CharGer module PP1
- cosegregation with disease in family members in a known disease gene
CharGer module PP2: Missense variant in a gene that has low rate of benign missense and in which missense are common mechanism of disease
CharGer::PP2 Error: Cannot evaluate PP2: No PP2 gene list supplied.
CharGer module PP3
- multiple lines of in silico evidence of deliterous effect
Found 0 variants with >= 2 of in silico evidence
CharGer module PP4: not yet implemented
CharGer module PP5: not yet implemented
CharGer module BA1
- allele frequency >5%
CharGer module BS1: not yet implemented
CharGer module BS2: not yet implemented
CharGer module BS3: not yet implemented
CharGer module BS4: not yet implemented
CharGer module BP1: Missense variant in a gene for which primarily truncations cause disease
CharGer::BP1 Error: Cannot evaluate BP1: No BP1 gene list supplied.
CharGer module BP2: not yet implemented
CharGer module BP3: not yet implemented
CharGer module BP4
 - in silico evidence of no damage
Found 0 variants with >= 2 with in silico evidence
CharGer module BP5: not yet implemented
CharGer module BP6: not yet implemented
CharGer module BP7: not yet implemented
CharGer module PSC1
Recessive truncations of susceptible genes
CharGer module PMC1
Truncations of genes when no gene list provided
CharGer module PPC1
- protein length changes due to inframe indels or nonstop variant of other, not-specificied genes -
CharGer module PPC2
- protein length changes due to inframe indels or nonstop variant when no susceptibility genes given -
CharGer module BSC1
- same peptide change as a previously established benign variant
BSC1 found 0 benign variants
CharGer module BMC1
- different peptide change of a benign variant at the same reference peptide
BMC1 found 0 benign variants
0.0005 < 0.05
write 1157778 charged user variants to out.vep.charger.txt
charger::writeSummary Warning: skipping pubmed link tests

CharGer run Times:
input parse time (s): 0.000698089599609
get input data time (s): 37452.09834  
get external data time (s): 105.454962015
modules run time (s): 268.449123859   
classification time (s): 314.865689993
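The default scores and thresholds printed at the top of this log help explain the Benign/Uncertain split. In this run, the modules that report counts (PS1, PM5, PP3, BP4, BSC1, BMC1) all found 0 variants and the gene-list modules were skipped, so the allele-frequency modules (BA1, PM2) are the likeliest to have fired. BA1 alone scores -8, which reaches maxBenignScore, while PM2 alone scores 2, which is below minLikelyPathogenicScore. The sketch below assumes CharGer sums the scores of the triggered modules and compares the total inclusively against the thresholds; that comparison rule is inferred from the threshold names, not confirmed, and `classify` is a hypothetical name.

```python
# Default module scores, copied from the top of the CharGer log above.
DEFAULT_SCORES = {
    "PVS1": 8,
    "PS1": 7, "PS2": 4, "PS3": 4, "PS4": 4, "PSC1": 4,
    "PM1": 2, "PM2": 2, "PM3": 2, "PM4": 2, "PM5": 2, "PM6": 2, "PMC1": 2,
    "PP1": 1, "PP2": 1, "PP3": 1, "PP4": 1, "PP5": 1, "PPC1": 1, "PPC2": 1,
    "BA1": -8,
    "BS1": -4, "BS2": -4, "BS3": -4, "BS4": -4, "BSC1": -6,
    "BP1": -1, "BP2": -1, "BP3": -1, "BP4": -1, "BP5": -1, "BP6": -1, "BP7": -1,
    "BMC1": -2,
}

# Category thresholds from the same log.
MAX_BENIGN_SCORE = -8
MAX_LIKELY_BENIGN_SCORE = -4
MIN_LIKELY_PATHOGENIC_SCORE = 5
MIN_PATHOGENIC_SCORE = 9


def classify(triggered_modules, scores=DEFAULT_SCORES):
    """Sum the scores of the triggered modules and map the total to a category.

    ASSUMPTION: ">=" is used for the "min" thresholds and "<=" for the "max"
    thresholds; this is inferred from the names, not from CharGer's source.
    """
    total = sum(scores[module] for module in triggered_modules)
    if total >= MIN_PATHOGENIC_SCORE:
        label = "Pathogenic"
    elif total >= MIN_LIKELY_PATHOGENIC_SCORE:
        label = "Likely Pathogenic"
    elif total <= MAX_BENIGN_SCORE:
        label = "Benign"
    elif total <= MAX_LIKELY_BENIGN_SCORE:
        label = "Likely Benign"
    else:
        label = "Uncertain Significance"
    return total, label
```

Under these assumptions, `classify({"BA1"})` returns `(-8, "Benign")` and `classify({"PM2"})` returns `(2, "Uncertain Significance")`, whereas adding a PVS1 call, `classify({"PVS1", "PM2"})`, returns `(10, "Pathogenic")`. Reaching the pathogenic categories therefore depends on the modules that need extra inputs.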

LayalYasin commented 4 years ago

I am facing the same issue

NagaComBio commented 4 years ago

CharGer needs more annotations, such as ClinVar and a list of known pathogenic variants. Check this comment from @fernanda-rodrigues: https://github.com/ding-lab/CharGer/issues/18#issuecomment-475979810

fernanda-rodrigues commented 4 years ago

Hello @LayalYasin and @nrosewick ,

Thank you for using our tool! As the warnings indicate, CharGer needs more information to classify your variants well (as mentioned in issue #18). The more information you provide, the better your classifications will be.

For PVS1, for example, the ACMG guidelines assign variants to this category if they are "null variants (nonsense, frameshift, canonical ±1 or 2 splice sites, initiation codon, single or multiexon deletion) in a gene where LOF is a known mechanism of disease." If you don't specify those genes, CharGer cannot know which ones qualify. If you have a list of genes specific to your disease that fall into this category, please provide it as a gene list. If you don't mind me asking, which disease are you studying? We do have such a list for cancer studies.
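The PVS1 rule quoted above can be sketched as a check with two inputs: the variant's VEP consequence terms and a user-supplied set of LOF genes. This is a minimal illustration, not CharGer's implementation. The function name `meets_pvs1` and the mapping of ACMG null-variant types onto the VEP/Sequence Ontology terms below are assumptions, and single or multiexon deletions are not covered. With an empty gene list it never fires, which matches the "CharGer will not make PVS1 calls" warning.

```python
# ASSUMED mapping of ACMG "null variant" types to VEP consequence terms:
# nonsense -> stop_gained, frameshift -> frameshift_variant,
# canonical splice sites -> splice_acceptor/donor_variant,
# initiation codon -> start_lost.
NULL_CONSEQUENCES = frozenset({
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "start_lost",
})


def meets_pvs1(gene, consequences, lof_genes):
    """Return True for a null variant in a gene from the LOF gene list.

    `consequences` is either an iterable of terms or a VEP-style string in
    which multiple terms are joined with "&".
    """
    if gene not in lof_genes:
        return False  # no gene list, or gene not listed: no PVS1 call
    terms = consequences.split("&") if isinstance(consequences, str) else consequences
    return any(term in NULL_CONSEQUENCES for term in terms)
```

For example, with a hypothetical gene list `{"BRCA1"}`, a `stop_gained` variant in BRCA1 meets PVS1, while the same consequence in an unlisted gene does not.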

ClinVar information is also crucial and can really boost your analysis. You can find the ClinVar file we use internally here: https://github.com/fernanda-rodrigues/ClinVar/blob/20190815_release/output/b38/single/clinvar_alleles.single.b38.tsv.gz

This file was generated with code from the MacArthur lab and corresponds to the ClinVar release of 08/15/2019.
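ClinVar matters because criteria such as PS1 ("same peptide change as a previously established pathogenic variant") and PM5 ("different peptide change of a pathogenic variant at the same reference peptide") are lookups against known pathogenic variants; with nothing to look up, the log above reports "PS1 found 0" and "PM5 found 0". The sketch below assumes the known pathogenic variants have already been parsed into protein changes. `ProteinChange` and `ps1_pm5` are hypothetical names, the PS1/PM5 logic is simplified (it ignores, for example, whether the nucleotide change differs), and the columns of the ClinVar TSV are not described here, so parsing is left out.

```python
from typing import NamedTuple


class ProteinChange(NamedTuple):
    """Hypothetical representation of a missense change, e.g. TP53 R175H."""
    gene: str
    position: int  # residue number in the protein
    ref_aa: str
    alt_aa: str


def ps1_pm5(variant, known_pathogenic):
    """Return the triggered criteria ({'PS1'}, {'PM5'} or empty) for one change."""
    # Known pathogenic changes at the same reference residue of the same gene.
    same_residue = [
        known for known in known_pathogenic
        if known.gene == variant.gene
        and known.position == variant.position
        and known.ref_aa == variant.ref_aa
    ]
    if any(known.alt_aa == variant.alt_aa for known in same_residue):
        return {"PS1"}  # same peptide change as a known pathogenic variant
    if same_residue:
        return {"PM5"}  # different peptide change at a known pathogenic residue
    return set()  # no match, or no reference list at all
```

With an illustrative reference list containing only TP53 R175H, the change R175H returns `{"PS1"}`, R175G returns `{"PM5"}`, and an empty reference list always returns an empty set.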

The formats of the PVS1, PP2, and BP1 gene list files are described in our README.

Please do not hesitate to ask for help. I am happy to assist you!

Fernanda

fernanda-rodrigues commented 4 years ago

I am closing this issue. Please let us know if you need further help.

Thanks!