exomiser / Exomiser

A Tool to Annotate and Prioritize Exome Variants
https://exomiser.readthedocs.io
GNU Affero General Public License v3.0

Ability to evaluate short tandem repeats in Exomiser #300

Closed williakd17 closed 5 years ago

williakd17 commented 5 years ago

Unsure where the original post went (I must have accidentally edited/removed it), but the variant in question is the following: PRRT2 | NM_001256442.1 | c.649dupC | 397.73 | 0.270479% | Autosomal Dominant. This variant is in a run of 8 Cs with an adjacent G. The original VCF line is as follows:

chr16 29825015 . G GC 397.73 . Gene=PRRT2;Transcript=NM_001256443;HGVS=c.649dupC;AC=1;AF=0.500;AN=2;BaseQRankSum=-0.554;DP=78;FS=10.440;MLEAC=1;MLEAF=0.500;MQ=59.97;MQ0=0;MQRankSum=0.000;QD=5.10;ReadPosRankSum=0.539;EXON=2 GT:AD:GQ:PL:SB:ZG:ZW:WN:FP 0/1:42,23:99:435,0,958:16,23,4,19:Heterozygous:L:1:0

julesjacobsen commented 5 years ago

Was it a problem with the phenotype score not being high enough? Is this still an issue?

williakd17 commented 5 years ago

Apparently it was an issue with the reference and alt alleles in the VCF for these regions. The variant is a dupC; however, the ref is a G (the base adjacent to the start of the repeat) and the alt is GC. Technically it should be Ref C, Alt CC. I read an entry you posted discussing how this was a Jannovar issue (https://github.com/exomiser/Exomiser/issues/207); however, I am unsure whether this will directly help resolve my issue. Interestingly enough, if the frequencyFilter was disabled (not ideal or practical), the variant was appropriately ranked. If the ref and alt were manually changed to the technically correct ones above, the gene would also be ranked appropriately (Exomiser: 0.9822565 Variant: 1.0 Phenotype: 0.78100365 Genomic Position: g.29825024_29825025insC)

your-highness commented 5 years ago

Dear @julesjacobsen and @williakd17 ,

Sorry for hijacking the thread but my problem pertains to the same problem, I assume.

I have a very similar problem when analyzing CHEK2 variants with a local installation of exomiser-web from the v11 release and 1807_phenotype: known causative CHEK2 variants are not shown on the results page unless I increase the MAF threshold to 100% and allow non-pathogenic variants, as shown below. [image]

For example, a heterozygous frameshift variant in CHEK2 chr22:g.29091856AG>A [0/1]

22  29091856    .   AG  A   7293.73 .   AC=1;AF=0.500;AN=2;BaseQRankSum=-1.044e+00;ClippingRankSum=0.00;DP=380;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=59.90;MQRankSum=0.546;QD=19.19;ReadPosRankSum=-2.950e-01;SOR=0.673  GT:AD:DP:GQ:PL  0/1:169,211:380:99:7331,0,5631

is filtered out when setting MAF to 1%, although all reported frequencies are <1%:

[image]

I am confused that the Variant Score is 0. If I check the Variants.tsv file, it says the variant is filtered out by "inheritance", but I am not sure if this makes sense. [image]

Also, CHEK2 has a number of HPO terms associated with it (similar to BRCA1, which is not filtered out):

$ rgrep -e CHEK2 -e BRCA1 1807_phenotype
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:672    BRCA1   Nausea and vomiting     HP:0002017
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:672    BRCA1   Constipation    HP:0002019
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:672    BRCA1   Abnormality of the abdominal wall       HP:0004298
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:672    BRCA1   Abdominal pain  HP:0002027
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:672    BRCA1   Weight loss     HP:0001824
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:672    BRCA1   Abnormality of the peritoneum   HP:0002585
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:672    BRCA1   Ovarian neoplasm        HP:0100615
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Neoplasm of the skin    HP:0008069
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Abnormality of metabolism/homeostasis   HP:0001939
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Neoplasm of the nervous system  HP:0004375
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Autosomal dominant inheritance  HP:0000006
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Melanoma        HP:0002861
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Glioma  HP:0009733
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Neoplasm of the lungs   HP:0100526
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Neoplasm of the pancreas        HP:0002894
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Meningioma      HP:0002858
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Acute leukemia  HP:0002488
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Neoplasm of the breast  HP:0100013
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Breast carcinoma        HP:0003002
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Stomach cancer  HP:0012126
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Neoplasm of the adrenal cortex  HP:0100641
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Neoplasm of the skeletal system HP:0010622
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Retinoblastoma  HP:0009919
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Lymphoma        HP:0002665
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Neoplasm of the colon   HP:0100273
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Sarcoma HP:0100242
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200  CHEK2   Osteosarcoma    HP:0002669

Do you have any idea how I can trace that issue?

Thanks in advance!

williakd17 commented 5 years ago

So to be clear, the PRRT2 variant in the positive VCF (g.29825015 Ref/Alt: G/GC, c.649dupC) falls under the following scenario: 5' GCCCCCCCCCG 3'. The g. position above does not place this dupC at the most 3' position, which is the standard for HGVS annotation. As a result, this gene is not ranked by Exomiser. However, when I manually convert the variant to g.29825024 Ref/Alt: -/C, which is the most 3' position, the gene/variant is ranked 5th. I tested 50 positive exomes and only ran into this issue with 2 out of 50, with the other 48 performing incredibly well. Any insight would be incredibly helpful. Thank you!
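The two representations discussed here (the caller's left-anchored insertion versus the most 3' position that HGVS prefers) can be related with a small sketch. It shifts a simple insertion to the right along a reference string for as long as the sequence stays identical. The function name `right_shift_insertion` and the toy reference sequence are assumptions made for illustration; they are not real chr16 coordinates, and this is not something Exomiser does internally.

```python
def right_shift_insertion(ref_seq: str, pos: int, ref: str, alt: str) -> tuple[int, str]:
    """Shift a simple insertion as far 3' as possible.

    ref_seq is the reference sequence (its first base is position 1),
    pos/ref/alt are VCF-style fields with 1-based pos.
    Returns (anchor, inserted): the inserted bases sit directly after the
    1-based position `anchor` once the variant has been shifted.
    """
    if not (alt.startswith(ref) and len(alt) > len(ref)):
        raise ValueError("expected a simple insertion where ALT starts with REF")
    inserted = alt[len(ref):]
    anchor = pos + len(ref) - 1  # last reference base before the insertion
    # ref_seq[anchor] (0-based) is the reference base at 1-based position anchor + 1.
    # While it equals the first inserted base, rotating the insertion by one base
    # and moving the anchor one step right yields the same resulting sequence.
    while anchor < len(ref_seq) and ref_seq[anchor] == inserted[0]:
        inserted = inserted[1:] + inserted[0]
        anchor += 1
    return anchor, inserted


# Toy reference: A, G, eight Cs, G, T (positions 1..12). Hypothetical data.
toy_ref = "AG" + "C" * 8 + "GT"
# A caller reports the extra C at the left edge of the run: pos=2, G -> GC.
shifted_anchor, shifted_bases = right_shift_insertion(toy_ref, 2, "G", "GC")
```

In the toy case the insertion moves from directly after position 2 to directly after position 10, the last C of the run, which mirrors the difference between the G/GC record and the manually edited one in this thread.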

julesjacobsen commented 5 years ago

I think there are a couple of issues here. @your-highness the inheritance filter performs additional frequency-based filtering for the different inheritance modes. The AD filter is set to 0.1% by default, so the variant is considered too frequent to be a cause of disease. Note also that there are conflicting ClinVar interpretations, so this should be treated with caution.

julesjacobsen commented 5 years ago

@williakd17 sorry, I'm not completely clear here. So you're saying that the input VCF:

16 29825015 . G GC 397.73 . Gene=PRRT2;Transcript=NM_001256443;HGVS=c.649dupC;AC=1;AF=0.500;AN=2;BaseQRankSum=-0.554;DP=78;FS=10.440;MLEAC=1;MLEAF=0.500;MQ=59.97;MQ0=0;MQRankSum=0.000;QD=5.10;ReadPosRankSum=0.539;EXON=2 GT:AD:GQ:PL:SB:ZG:ZW:WN:FP 0/1:42,23:99:435,0,958:16,23,4,19:Heterozygous:L:1:0

Should actually be

16 29825015 . C CC 397.73 . Gene=PRRT2;Transcript=NM_001256443;HGVS=c.649dupC;AC=1;AF=0.500;AN=2;BaseQRankSum=-0.554;DP=78;FS=10.440;MLEAC=1;MLEAF=0.500;MQ=59.97;MQ0=0;MQRankSum=0.000;QD=5.10;ReadPosRankSum=0.539;EXON=2 GT:AD:GQ:PL:SB:ZG:ZW:WN:FP 0/1:42,23:99:435,0,958:16,23,4,19:Heterozygous:L:1:0

Exomiser doesn't do any further alignment/calling apart from separating multi-call sites and trimming these alleles. Exomiser uses VCF trimming conventions and 1-based genomic coordinates because that is the format of the input file and other variant interpretation data. If you need to call the repeats differently, you'll need to fix that before input into exomiser.
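As a rough illustration of what "separating multi-call sites and trimming these alleles" means, here is a minimal sketch of the usual VCF-style trimming: shared trailing bases are removed first, then shared leading bases, always keeping at least one base per allele. The function name `split_and_trim` is hypothetical and this is not Exomiser's actual code.

```python
def split_and_trim(pos: int, ref: str, alts: list[str]) -> list[tuple[int, str, str]]:
    """Split a multi-allelic record into one (pos, ref, alt) per ALT and trim each."""
    trimmed = []
    for alt in alts:
        p, r, a = pos, ref, alt
        # Drop bases shared at the right-hand end, keeping one base per allele.
        while len(r) > 1 and len(a) > 1 and r[-1] == a[-1]:
            r, a = r[:-1], a[:-1]
        # Drop bases shared at the left-hand end, advancing the position.
        while len(r) > 1 and len(a) > 1 and r[0] == a[0]:
            r, a = r[1:], a[1:]
            p += 1
        trimmed.append((p, r, a))
    return trimmed


# The PRRT2 record from this thread is already minimal, so trimming leaves it alone.
prrt2 = split_and_trim(29825015, "G", ["GC"])
```

Trimming only removes redundant bases. It never moves an insertion through a repeat, which is why the G/GC record reaches the filters exactly as the caller wrote it.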

Otherwise, I'm glad that exomiser is performing so well. Will you be publishing these results at any stage?

williakd17 commented 5 years ago

@julesjacobsen That makes sense. Reading your response to @your-highness made me think of another question. Is there an option in Exomiser to not take MAF into consideration for variants with a pathogenic interpretation in ClinVar or in our internal database? By default Exomiser will filter out common pathogenic variants that have a high MAF. Edit: I see you are working on this in #152. In that thread, I believe I read that all variants with MAF >0.5% are likely to be downgraded in ClinVar over time. However, there are some common pathogenic variants in the population, so that's not always the case.

your-highness commented 5 years ago

Thanks for your response, @julesjacobsen. We are very confident that this CHEK2 variant is causative for our cohort. I have a follow up question regarding your comment:

the inheritance filter performs additional frequency-based filtering for the different inheritance modes. The AD filter is set to 0.1% by default, so the variant is considered too frequent to be a cause of disease.

If I set the MAF to 100% in the web interface, the variant is shown, even when an inheritance model is selected. Why is it not filtered out by the inheritance filter in that case?

Regarding the inner workings of exomiser-web: where can I specify default values for the inheritance mode in exomiser-web? Does a variant have to satisfy all inheritance modes in "automatic inheritance mode" to be shown?

In response to @williakd17 and #152, I second that request for upranking ClinVar listed variants (even for conflicting pathogenicity interpretations).

julesjacobsen commented 5 years ago

See https://github.com/exomiser/Exomiser/issues/152#issuecomment-448565905 for ClinVar ranking.

julesjacobsen commented 5 years ago

@your-highness You need to add in a prioritiser and some accurate patient phenotypes otherwise there is little point in running exomiser as you're losing the power of the phenotype similarity matching. I suggest you use the default HiPhive option.

your-highness commented 5 years ago

@your-highness You need to add in a prioritiser and some accurate patient phenotypes otherwise there is little point in running exomiser as you're losing the power of the phenotype similarity matching. I suggest you use the default HiPhive option.

When using OMIM114480 this CHEK2 variant is also filtered out.

How can I specify default thresholds for frequencies in an inheritance mode when using exomiser-web?

Thanks in advance!

williakd17 commented 5 years ago

@your-highness This kind of brings everything full circle. For CHEK2 c.1100delC, you stated that changing the MAF does not get this variant to rank appropriately. This variant would be filtered out by the inheritanceModes: filter, not the frequencyFilter. It has an Autosomal Dominant mode of inheritance, and the default filter value of 0.1% removes it because the highest reported gnomAD population frequency, for example, is 0.8%. In other words, if any of the populations has a frequency greater than the value you specify in the parameters, the variant will be filtered. I would recommend excluding the Ashkenazi Jewish population frequencies from your analysis, as they are very high in most cases.
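The interaction between the two thresholds can be sketched as follows. This is a simplified model of the behaviour described in this thread, not Exomiser's implementation: a variant passes a threshold only if no considered population exceeds it. The population labels and all frequencies other than the 0.8% maximum are made up for illustration.

```python
FREQUENCY_FILTER_MAX = 1.0  # percent, the MAF cutoff chosen by the user
AD_MAX_FREQ = 0.1           # percent, default AD threshold quoted in this thread


def passes(freqs: dict[str, float], max_freq: float,
           exclude: frozenset[str] = frozenset()) -> bool:
    """True if no considered population frequency (in percent) exceeds max_freq."""
    considered = [f for pop, f in freqs.items() if pop not in exclude]
    return all(f <= max_freq for f in considered)


# Hypothetical per-population frequencies in percent; only the 0.8 maximum
# comes from the discussion above.
variant_freqs = {"POP_A": 0.8, "POP_B": 0.2, "POP_C": 0.05}

survives_frequency_filter = passes(variant_freqs, FREQUENCY_FILTER_MAX)
survives_ad_filter = passes(variant_freqs, AD_MAX_FREQ)
survives_ad_without_pop_a = passes(variant_freqs, AD_MAX_FREQ, frozenset({"POP_A"}))
```

With these numbers the variant survives the 1% frequency filter but not the 0.1% AD check, and dropping only the highest population is not enough because the next one is still above 0.1%.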

The issue with hiking up a lot of parameters is that it may be a Caucasian frequency for one patient/variant, but a South Asian for another patient/variant. Modifying the parameters to be less stringent will significantly reduce the performance of Exomiser. The thought here is to use ClinVar and ideally an input file the user can supply for all known Pathogenic, Likely Path, etc. variants that would be filtered out due to being a common variant. This would hopefully allow the stringent parameters to remain in place while bypassing the filtering for specific variants listed in the file. Another thing I would try to significantly improve your Exomiser performance would be to supply a list of genes in the genePanelFilter: section to only rank the genes that are clinically relevant to your clinical analysis for WES. By default, there are a lot of genes that are not currently clinically relevant for WES and will therefore decrease the gene ranking performance.

And to @julesjacobsen 's point, supplying clinically accurate HPO IDs for a given patient is extremely powerful in Exomiser. For example, adding the HP term for mode of inheritance changed a sample's result from unranked to the second overall hit.

your-highness commented 5 years ago

Dear @williakd17 ,

Thanks for the thorough response on the "full circle" and the tips regarding the Ashkenazi Jewish population frequencies and the genePanelFilter.

As said, supplying a list of known causative variants would be a good feature. Alternatively, the scoring of ClinVar-supported variants could be boosted.

Can you please tell me how I can set the inheritanceModes: <filter> and genePanelFilter: for exomiser-web? I cannot see how to specify an analysis.yml.

Best,

julesjacobsen commented 5 years ago

The 'Filter for genes' is the genePanelFilter. The web interface doesn't allow you to alter the default inheritance mode frequency filters.

your-highness commented 5 years ago

@julesjacobsen Is there a way to alter the default values programmatically, e.g. provide a default analysis.yml?

julesjacobsen commented 5 years ago

I'm not sure why you're so focused on the web interface - use the cli. Or perform the analysis using recessive MOI and hope there is another candidate with which there might be a compound heterozygous inheritance.

your-highness commented 5 years ago

Dear @julesjacobsen.

Sorry for being so stubborn: our Charité clinicians like to use the web interface with phenotype / disease prioritization and investigate the results. There is a standardized SOP that fixes many parameters for prioritization (e.g. MOI frequencies, population frequencies used), and I would like to adapt the parameters to their standard. I could certainly hard-code them in Java, but a configuration file seems to be a better solution.

P.S.: This is all off topic now (sorry @williakd17 ).

Thanks

DGMichael commented 5 years ago

Currently, we use the CLI and the Python library PyYAML to generate a configuration .yml for each case, which is then run through the Exomiser CLI. It works quite well for inserting HPO terms, gene lists and variant frequency filters.

Drew
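A minimal sketch of that per-case approach might look like the following. The key names `inheritanceModes`, `frequencyFilter` and `genePanelFilter` appear in this thread; `hpoIds`, `steps`, `maxFrequency`, `geneSymbols` and `hiPhivePrioritiser` are recalled from Exomiser example analysis files and should be treated as assumptions to check against the documentation for your version. The result is deliberately incomplete (a real analysis file needs further settings), and the file names are hypothetical.

```python
def build_analysis(vcf_path, hpo_ids, gene_symbols, ad_max_freq=0.1, max_freq=1.0):
    """Build a partial per-case analysis configuration as a plain dict."""
    return {
        "analysis": {
            "vcf": vcf_path,
            "hpoIds": list(hpo_ids),
            # Per-mode maximum frequency in percent (0.1 is the AD default
            # quoted in this thread).
            "inheritanceModes": {"AUTOSOMAL_DOMINANT": ad_max_freq},
            "steps": [
                {"genePanelFilter": {"geneSymbols": list(gene_symbols)}},
                {"frequencyFilter": {"maxFrequency": max_freq}},
                {"hiPhivePrioritiser": {}},
            ],
        }
    }


def write_analysis(config, path):
    """Write the configuration to a YAML file using PyYAML."""
    import yaml  # imported here so building the dict does not require PyYAML

    with open(path, "w", encoding="utf-8") as handle:
        yaml.safe_dump(config, handle, sort_keys=False)


# HPO terms and gene symbols taken from the listing earlier in this thread.
case_config = build_analysis(
    "case01.vcf",
    hpo_ids=["HP:0003002", "HP:0000006"],
    gene_symbols=["CHEK2", "BRCA1"],
)
```

Calling `write_analysis(case_config, "case01-analysis.yml")` would then produce one file per case to pass to the CLI, which keeps SOP-mandated thresholds in one place instead of hard-coding them.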

In reply to Johannes Helmuth's comment of Dec 20, 2018, 9:49 AM.

julesjacobsen commented 5 years ago

Closing this as I think the issue is addressed by issue #152