ding-lab / CharGer

Characterization of Germline variants
https://ding-lab.github.io/CharGer/
GNU General Public License v3.0

Null Variant classified as Benign #45

Closed NTNguyen13 closed 2 years ago

NTNguyen13 commented 4 years ago

Hi, I have successfully run CharGer using the following command:

charger \
    -f ~/vcf/17D2625146_FB_hg19.vcf \
    -o test_charger_everything.tsv \
    -l -D \
    --mac-clinvar-tsv ~/clinvar_5b04ade/output/b37/single/clinvar_alleles.single.b37.tsv.gz \
    -z ~/charger_db/grch37_pathogenic_variants.vcf \
    --inheritanceGeneList ~/charger_db/inheritance_gene_table.tsv \
    --PP2 ~/charger_db/pp2_gene_list.txt

I got the database files from https://github.com/ding-lab/CharGer/tree/master/tests/examples/annotations. My VCF was annotated by VEP with the --everything flag. In this sample I have a particular variant, ENST00000267163.4:c.1954_1960+2del, which is a null variant and is not in the gnomAD database. However, CharGer reports an allele frequency of 0.5 for this variant and assigns it BA1 evidence, hence classifying it as Benign.

This is the terminal output from CharGer run:

(skipping a lot of warning rows)
Running ClinVar took 30.4019100666seconds
Running exac took 2.28881835938e-05seconds
CharGer module PVS1
- truncations in genes where LOF is a known mechanism of the disease
- require the mode of inheritance to be dominant (assuming heterzygosity) and co-occurence with reduced gene expression
- run concurrently with PSC1, PMC1, PM4, PPC1, and PPC2 -
CharGer module PS1
- same peptide change as a previously established pathogenic variant
PS1 found 0 pathogenic variants
CharGer module PS2
- de novo with maternity and paternity confirmation and no family history
CharGer module PS3: Well-established in vitro or in vivo functional studies             supportive of a damaging effect on the gene or gene product
CharGer module PS4: not yet implemented
CharGer module PM1:  Located in a mutational hot spot and/or critical and well-established              functional domain (e.g., active site of an enzyme) without benign variation
CharGer::PM1 Warning: clustersFile is not supplied. PM1 was not executed.
CharGer module PM2
- absent or extremely low frequency in controls
CharGer module PM3: not yet implemented
CharGer module PM4
- protein length changes due to inframe indels or nonstop variant of selected genes -
CharGer module PM5
- different peptide change of a pathogenic variant at the same reference peptide
PM5 found 0 pathogenic variants
CharGer module PM6
- assumed de novo without maternity and paternity confirmation
CharGer module PP1
- cosegregation with disease in family members in a known disease gene
CharGer module PP2: Missense variant in a gene that has low rate of benign missense and in which missense are common mechanism of disease
CharGer module PP3
- multiple lines of in silico evidence of deliterous effect
Found 0 variants with >= 2 of in silico evidence
CharGer module PP4: not yet implemented
CharGer module PP5: not yet implemented
CharGer module BA1
- allele frequency >5%
CharGer module BS1: not yet implemented
CharGer module BS2: not yet implemented
CharGer module BS3: not yet implemented
CharGer module BS4: not yet implemented
CharGer module BP1: Missense variant in a gene for which primarily truncations cause disease
CharGer::BP1 Error: Cannot evaluate BP1: No BP1 gene list supplied.
CharGer module BP2: not yet implemented
CharGer module BP3: not yet implemented
CharGer module BP4
 - in silico evidence of no damage
Found 0 variants with >= 2 with in silico evidence
CharGer module BP5: not yet implemented
CharGer module BP6: not yet implemented
CharGer module BP7: not yet implemented
CharGer module PSC1
Recessive truncations of susceptible genes
CharGer module PMC1
Truncations of genes when no gene list provided
CharGer module PPC1
- protein length changes due to inframe indels or nonstop variant of other, not-specificied genes -
CharGer module PPC2
- protein length changes due to inframe indels or nonstop variant when no susceptibility genes given -
CharGer module BSC1
- same peptide change as a previously established benign variant
BSC1 found 0 benign variants
CharGer module BMC1
- different peptide change of a benign variant at the same reference peptide
BMC1 found 0 benign variants
0.0005 < 0.05
write 27 charged user variants to test_charger_everything.tsv
charger::writeSummary Warning: skipping pubmed link tests

CharGer run Times:
input parse time (s): 0.000585079193115
get input data time (s): 2.63706803322
get external data time (s): 30.4032850266
modules run time (s): 0.0069580078125
classification time (s): 0.00322985649109
CharGer full run time (s): 33.0511260033

What went wrong in my case, and how can I improve the result? Thank you very much.

NTNguyen13 commented 4 years ago

For more information: this variant falls in the RB1 gene, which is present in the PP2 gene list and in the inheritance table with an autosomal dominant mode.

ccwang002 commented 4 years ago

Thanks for reporting this to us. This is a known issue that we want to address in the new release. CharGer v0.5 will sometimes use the wrong AF as the population allele frequency if it cannot find gnomAD_AF, ExAC_MAF, or ExAC_AF. More specifically, it should use the AF inside CSQ rather than the AF that sits directly in INFO. For example, the following VCF will likely hit the same problem as yours (AF being 1.00):

##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##VEP="v95" time="2019-08-20 15:52:03" cache="~/.vep/homo_sapiens/95_GRCh38" ensembl-funcgen=95.94439f4 ensembl-variation=95.858de3e ensembl=95.4f83453 ensembl-io=95.78ccac5 1000genomes="phase3" COSMIC="86" ClinVar="201810" ESP="V2-SSA137" HGMD-PUBLIC="20174" assembly="GRCh38.p12" dbSNP="151" gencode="GENCODE 29" genebuild="2014-07" gnomAD="170228" polyphen="2.2.2" regbuild="1.0" sift="sift5.2.2"
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|VARIANT_CLASS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|TSL|APPRIS|CCDS|ENSP|SWISSPROT|TREMBL|UNIPARC|GENE_PHENO|SIFT|PolyPhen|DOMAINS|miRNA|HGVS_OFFSET|AF|AFR_AF|AMR_AF|EAS_AF|EUR_AF|SAS_AF|AA_AF|EA_AF|ExAC_AF|ExAC_Adj_AF|ExAC_AFR_AF|ExAC_AMR_AF|ExAC_EAS_AF|ExAC_FIN_AF|ExAC_NFE_AF|ExAC_OTH_AF|ExAC_SAS_AF|gnomAD_AF|gnomAD_AFR_AF|gnomAD_AMR_AF|gnomAD_ASJ_AF|gnomAD_EAS_AF|gnomAD_FIN_AF|gnomAD_NFE_AF|gnomAD_OTH_AF|gnomAD_SAS_AF|MAX_AF|MAX_AF_POPS|CLIN_SIG|SOMATIC|PHENO|PUBMED|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       69270   .       A       G       269.77  .       AC=2;AF=1.00;AN=2;DP=10;CSQ=G|synonymous_variant|LOW|OR4F5|ENSG00000186092|Transcript|ENST00000335137|protein_coding|1/1||ENST00000335137.4:c.180A>G|ENSP00000334393.3:p.Ser60%3D|216|180|60|S|tcA/tcG|rs201219564||1||SNV|HGNC|HGNC:14825|YES||P1|CCDS30547.1|ENSP00000334393|Q8NH21||UPI0000041BC1||||Gene3D:1.20.1070.10&Pfam_domain:PF13853&Prints_domain:PR00237&PROSITE_profiles:PS50262&hmmpanther:PTHR26451&hmmpanther:PTHR26451:SF179&Superfamily_domains:SSF81321&Transmembrane_helices:TMhelix&Conserved_Domains:cd15226&Low_complexity_(Seg):seg||||||||||||||||||||0.8327|0.3603|0.7916|0.8434|0.9983|0.877|0.9112|0.8481|0.9018|0.9983|gnomAD_EAS||||||||
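
To see which value a record carries in each place, the INFO-level AF and the population frequency inside CSQ can be compared with a short script. This is only a minimal sketch and not CharGer's own code: the function names are made up for illustration, it reads only the first CSQ entry of a record, and it assumes gnomAD_AF and ExAC_AF hold plain numbers as in the header above.

```python
def parse_csq_format(header_line):
    """Return the CSQ field names listed after 'Format: ' in the VEP header line."""
    marker = "Format: "
    start = header_line.index(marker) + len(marker)
    end = header_line.rindex('"')  # closing quote of the Description string
    return header_line[start:end].split("|")


def info_to_dict(info):
    """Split a VCF INFO column into a key -> value dict (flags map to '')."""
    out = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        out[key] = value
    return out


def population_af(info, csq_fields, keys=("gnomAD_AF", "ExAC_AF")):
    """Return the first non-empty population AF from the first CSQ entry, else None.

    Assumption: the listed keys are the ones of interest; adjust as needed.
    """
    csq = info_to_dict(info).get("CSQ")
    if not csq:
        return None
    first_entry = csq.split(",")[0].split("|")
    record = dict(zip(csq_fields, first_entry))
    for key in keys:
        value = record.get(key, "")
        if value:
            # VEP can join several values with '&'; take the first one.
            return float(value.split("&")[0])
    return None
```

A record where `info_to_dict(info)["AF"]` is set but `population_af(...)` returns `None` has no population frequency in CSQ, which is the situation where the INFO-level AF may be picked up by mistake.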

The current workaround is to remove the AF INFO field before running CharGer. Using the same example the input becomes:

##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|E
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO
1       69270   .       A       G       269.77  .       AC=2;AN=2;DP=10;CSQ=...(same)...

If your VCF has the same format, please try this approach and see if it works for you. By the way, if you can also share your de-identified VCF, that will help us debug and build up our test cases. CharGer 0.5 doesn't rely on any sample-level information, so feel free to remove all the sample columns.
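
The workaround described above (dropping the AF INFO field before running CharGer) could be scripted roughly as follows. This is a sketch under stated assumptions, not an official CharGer utility: the function names are hypothetical, the input is assumed to be an uncompressed, tab-delimited VCF, and only the INFO-level `AF` key is removed while the CSQ annotation is left untouched.

```python
def strip_info_af(line):
    """Return the VCF line without the INFO-level AF, or None to drop the line.

    Hypothetical helper: removes the ##INFO=<ID=AF,...> definition and the
    AF=... entry of the INFO column; every other line is returned unchanged.
    """
    if line.startswith("##INFO=<ID=AF,"):
        return None  # drop the AF header definition
    if line.startswith("#"):
        return line  # keep all other header lines as they are
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        return line  # not a full VCF record; leave it alone
    kept = [item for item in cols[7].split(";") if item.split("=", 1)[0] != "AF"]
    cols[7] = ";".join(kept) if kept else "."
    return "\t".join(cols) + "\n"


def strip_af_file(src, dst):
    """Copy src to dst, removing the INFO-level AF from every record."""
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            out = strip_info_af(line)
            if out is not None:
                fout.write(out)
```

Calling `strip_af_file("input.vcf", "input.noAF.vcf")` (both file names are placeholders) writes a copy that can then be passed to `charger -f`. If bcftools is available, `bcftools annotate -x INFO/AF` should achieve the same result.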

NTNguyen13 commented 4 years ago

Hi, thank you for the clarification. GitHub doesn't support .vcf attachments, so I changed the extension to .txt. I will try removing the AF INFO field before running CharGer, but I still wonder why the null variant is not listed under PVS1 evidence.

VEP_annotated.txt