williakd17 closed this issue 5 years ago.
Was it a problem with the phenotype score not being high enough? Is this still an issue?
Apparently it was an issue with the ref and alt in the VCF for these regions. The variant is a dupC; however, the ref is a G (the base adjacent to the start of the repeat) and the alt is GC. Technically it should be Ref C, Alt CC. I read an entry you posted discussing how this was a Jannovar issue (https://github.com/exomiser/Exomiser/issues/207), but I am unsure whether that will directly help resolve my issue. Interestingly, if the frequencyFilter was disabled (not ideal or practical), the variant was ranked appropriately. If the ref and alt were manually changed to the technically correct ones above, the gene was also ranked appropriately (Exomiser: 0.9822565 Variant: 1.0 Phenotype: 0.78100365 Genomic Position: g.29825024_29825025insC).
Dear @julesjacobsen and @williakd17,
Sorry for hijacking the thread, but I assume my problem is the same one.
I have a very similar problem when analyzing CHEK2 variants with a local installation of exomiser-web from the v11 release with the 1807_phenotype data:
Known causative CHEK2 variants are not shown on the results page unless I increase the MAF threshold to 100% and allow non-pathogenic variants, as shown below.
For example, a heterozygous frameshift variant in CHEK2 chr22:g.29091856AG>A [0/1]
22 29091856 . AG A 7293.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.044e+00;ClippingRankSum=0.00;DP=380;ExcessHet=3.0103;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=59.90;MQRankSum=0.546;QD=19.19;ReadPosRankSum=-2.950e-01;SOR=0.673 GT:AD:DP:GQ:PL 0/1:169,211:380:99:7331,0,5631
is filtered out when the MAF threshold is set to 1%, although all reported frequencies are below 1%.
I am confused that the Variant Score is 0. The Variants.tsv file says the variant is filtered out by "inheritance", but I am not sure this makes sense. Moreover, CHEK2 has a number of HPO terms associated with it (similar to BRCA1, which is not filtered out):
$ rgrep -e CHEK2 -e BRCA1 1807_phenotype
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:672 BRCA1 Nausea and vomiting HP:0002017
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:672 BRCA1 Constipation HP:0002019
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:672 BRCA1 Abnormality of the abdominal wall HP:0004298
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:672 BRCA1 Abdominal pain HP:0002027
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:672 BRCA1 Weight loss HP:0001824
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:672 BRCA1 Abnormality of the peritoneum HP:0002585
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:672 BRCA1 Ovarian neoplasm HP:0100615
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Neoplasm of the skin HP:0008069
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Abnormality of metabolism/homeostasis HP:0001939
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Neoplasm of the nervous system HP:0004375
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Autosomal dominant inheritance HP:0000006
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Melanoma HP:0002861
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Glioma HP:0009733
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Neoplasm of the lungs HP:0100526
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Neoplasm of the pancreas HP:0002894
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Meningioma HP:0002858
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Acute leukemia HP:0002488
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Neoplasm of the breast HP:0100013
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Breast carcinoma HP:0003002
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Stomach cancer HP:0012126
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Neoplasm of the adrenal cortex HP:0100641
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Neoplasm of the skeletal system HP:0010622
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Retinoblastoma HP:0009919
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Lymphoma HP:0002665
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Neoplasm of the colon HP:0100273
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Sarcoma HP:0100242
1807_phenotype/phenix/ALL_SOURCES_ALL_FREQUENCIES_genes_to_phenotype.txt:11200 CHEK2 Osteosarcoma HP:0002669
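A quick way to compare how well each gene is annotated, rather than reading the grep output line by line, is to count distinct HPO term IDs per gene symbol. The Python sketch below assumes the file is tab-separated with the columns gene ID, gene symbol, HPO term name and HPO term ID, as the grep output above suggests; count_hpo_terms is a hypothetical helper, not part of Exomiser.

```python
from collections.abc import Iterable


def count_hpo_terms(lines: Iterable[str], genes: set[str]) -> dict[str, int]:
    """Count distinct HPO term IDs per gene symbol.

    Assumes genes_to_phenotype-style lines with tab-separated columns:
    gene ID, gene symbol, HPO term name, HPO term ID.
    Blank lines and lines starting with '#' are skipped.
    """
    terms: dict[str, set[str]] = {gene: set() for gene in genes}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # not in the assumed four-column layout
        symbol, hpo_id = fields[1], fields[3]
        if symbol in terms and hpo_id.startswith("HP:"):
            terms[symbol].add(hpo_id)  # a set ignores repeated annotations
    return {gene: len(ids) for gene, ids in terms.items()}
```

Pass it an open handle on the file from the rgrep command, e.g. count_hpo_terms(open(path, encoding="utf-8"), {"CHEK2", "BRCA1"}). A term count only shows that a gene is annotated; it says nothing about whether a variant in it passes the filters.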
Do you have any idea how I can trace that issue?
Thanks in advance!
So to be clear, the PRRT2 variant in the positive VCF (g.29825015 Ref/Alt: G/GC, c.649dupC) falls under the following scenario:
5' GCCCCCCCCCG 3'
The g. position above does not place this dupC at the most 3' position, which is the standard for HGVS annotation. As a result, this gene is not ranked by Exomiser. However, when I manually convert the variant to g.29825024 Ref/Alt: -/C, which is the most 3' position, the gene/variant is ranked 5th. I tested 50 positive exomes and only ran into this issue with 2 of the 50; the other 48 performed incredibly well. Any insight would be incredibly helpful. Thank you!
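The scenario above is the difference between the left-aligned representation that VCF tools typically produce (bcftools norm, for example, left-aligns indels) and the HGVS 3' rule, which places a duplication at the most 3' position of the repeat. Below is a minimal Python sketch of right-shifting a simple insertion, assuming the 5' GCCCCCCCCCG 3' context above is given on the forward strand and starts at g.29825015; shift_insertion_3prime is a hypothetical helper and only handles a single anchor base followed by the inserted bases.

```python
def shift_insertion_3prime(
    seq: str, seq_start: int, pos: int, ref: str, alt: str
) -> tuple[int, str, str]:
    """Move a VCF-style insertion to its most 3' equivalent position.

    seq       -- reference sequence containing the insertion site
    seq_start -- 1-based genomic coordinate of seq[0]
    pos, ref, alt -- VCF fields; ref must be a single anchor base and
                     alt must be that base followed by the inserted bases
    Returns (pos, ref, alt) in the same anchored VCF style.
    """
    if len(ref) != 1 or len(alt) < 2 or not alt.startswith(ref):
        raise ValueError("expected a simple insertion such as G -> GC")
    i = pos - seq_start  # 0-based index of the anchor base in seq
    if not 0 <= i < len(seq) or seq[i] != ref:
        raise ValueError("anchor base does not match the reference sequence")
    inserted = alt[1:]
    # While the next reference base equals the first inserted base, the
    # insertion can slide one base right; rotate the inserted bases to match.
    while i + 1 < len(seq) and seq[i + 1] == inserted[0]:
        i += 1
        inserted = inserted[1:] + inserted[0]
    anchor = seq[i]
    return seq_start + i, anchor, anchor + inserted
```

With the PRRT2 context, shift_insertion_3prime("GCCCCCCCCCG", 29825015, 29825015, "G", "GC") returns (29825024, "C", "CC"), an insertion after g.29825024, which matches the g.29825024_29825025insC position quoted earlier in the thread.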
I think there are a couple of issues here. @your-highness The inheritance filter performs additional frequency-based filtering for each inheritance mode, and the AD threshold is set to 0.1% by default, so the variant is considered too frequent to be a cause of disease. Note also that there are conflicting ClinVar interpretations, so this should be treated with caution.
@williakd17 Sorry, I'm not completely clear here. So you're saying that the input VCF line:
16 29825015 . G GC 397.73 . Gene=PRRT2;Transcript=NM_001256443;HGVS=c.649dupC;AC=1;AF=0.500;AN=2;BaseQRankSum=-0.554;DP=78;FS=10.440;MLEAC=1;MLEAF=0.500;MQ=59.97;MQ0=0;MQRankSum=0.000;QD=5.10;ReadPosRankSum=0.539;EXON=2 GT:AD:GQ:PL:SB:ZG:ZW:WN:FP 0/1:42,23:99:435,0,958:16,23,4,19:Heterozygous:L:1:0
should actually be:
16 29825015 . C CC 397.73 . Gene=PRRT2;Transcript=NM_001256443;HGVS=c.649dupC;AC=1;AF=0.500;AN=2;BaseQRankSum=-0.554;DP=78;FS=10.440;MLEAC=1;MLEAF=0.500;MQ=59.97;MQ0=0;MQRankSum=0.000;QD=5.10;ReadPosRankSum=0.539;EXON=2 GT:AD:GQ:PL:SB:ZG:ZW:WN:FP 0/1:42,23:99:435,0,958:16,23,4,19:Heterozygous:L:1:0
Exomiser doesn't do any further alignment or calling apart from splitting multi-allelic sites and trimming the alleles. Exomiser uses VCF trimming conventions and 1-based genomic coordinates because that is the format of the input file and of the other variant interpretation data. If you need the repeats to be called differently, you'll need to fix that before passing the VCF to Exomiser.
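To see why trimming alone does not change where an insertion sits in a repeat, here is a minimal Python sketch of splitting a multi-allelic record and trimming the bases REF and ALT share. The suffix-then-prefix order and the use of '-' for an empty allele are assumptions of this sketch, not a description of Exomiser's internals; split_and_trim is a hypothetical name.

```python
def split_and_trim(pos: int, ref: str, alts: str) -> list[tuple[int, str, str]]:
    """Split a comma-separated ALT field into one record per allele and trim
    bases shared by REF and ALT (suffix first, then prefix).

    The position moves right by one for each trimmed prefix base; an empty
    allele is written as '-'.
    """
    records = []
    for alt in alts.split(","):
        r, a, p = ref, alt, pos
        # Trim the common suffix; the position is unaffected.
        while r and a and r[-1] == a[-1]:
            r, a = r[:-1], a[:-1]
        # Trim the common prefix, advancing the position each time.
        while r and a and r[0] == a[0]:
            r, a, p = r[1:], a[1:], p + 1
        records.append((p, r or "-", a or "-"))
    return records
```

For the PRRT2 line, split_and_trim(29825015, "G", "GC") gives [(29825016, "-", "C")]: the inserted C still sits at the 5' end of the C run. Trimming changes the representation, not the position within the repeat.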
Otherwise, I'm glad that exomiser is performing so well. Will you be publishing these results at any stage?
@julesjacobsen That makes sense. Reading your response to @your-highness made me think of another question: is there an option in Exomiser to ignore MAF for variants interpreted as pathogenic in ClinVar or in our internal database? By default, Exomiser filters out common pathogenic variants that have a high MAF. Edit: I see you are working on this in #152. In that thread, I believe I read that variants with MAF >0.5% are likely to be downgraded in ClinVar over time. However, there are some common pathogenic variants in the population, so that's not always the case.
Thanks for your response, @julesjacobsen. We are very confident that this CHEK2 variant is causative for our cohort. I have a follow-up question regarding your comment:
the inheritance filter performs additional frequency-based filtering for different inheritance modes the AD filter is set to be 0.1% by default, so the variant is considered too frequent to be a cause of disease.
If I set the MAF to 100% in the web interface, the variant is shown, even when an inheritance model is selected. Why is it not filtered out by the inheritance filter in that case?
Regarding the inner workings of exomiser-web: where can I specify default values for the inheritance modes in exomiser-web? Does a variant have to satisfy all inheritance modes in "automatic inheritance mode" to be shown?
In response to @williakd17 and #152, I second the request for up-ranking ClinVar-listed variants (even those with conflicting pathogenicity interpretations).
See https://github.com/exomiser/Exomiser/issues/152#issuecomment-448565905 for ClinVar ranking.
@your-highness You need to add a prioritiser and some accurate patient phenotypes; otherwise there is little point in running Exomiser, as you're losing the power of the phenotype similarity matching. I suggest you use the default HiPhive option.
When using OMIM114480, this CHEK2 variant is also filtered out.
How can I specify default frequency thresholds for an inheritance mode when using exomiser-web?
Thanks in advance!
@your-highness This kind of brings everything full circle. For CHEK2 c.1100delC, you stated that changing the MAF does not get this variant to rank appropriately. This variant is filtered out by the inheritanceModes: filter, not by the frequencyFilter. It has an autosomal dominant mode of inheritance, and the default AD value of 0.1% filters it out because, for example, its highest reported gnomAD population frequency is 0.8%: if any population has a greater frequency than the value you specify in the parameters, the variant is filtered. I would recommend excluding the Ashkenazi Jewish population frequencies from your analysis, as they have very high frequencies in most cases.
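The rule described above can be sketched in a few lines of Python: a variant passes for a mode of inheritance only if its highest frequency among the considered populations does not exceed that mode's threshold. This mirrors the description in this comment rather than Exomiser's code; the AUTOSOMAL_DOMINANT key is modelled on the inheritanceModes: section, the 0.1% default comes from the thread, and the population names are placeholders.

```python
from collections.abc import Mapping

# Only the AD default is stated in this thread (0.1%); values are percentages.
DEFAULT_MAX_FREQ_PCT: Mapping[str, float] = {"AUTOSOMAL_DOMINANT": 0.1}


def passes_moi_frequency(
    pop_freqs_pct: Mapping[str, float],
    moi: str,
    max_freq_pct: Mapping[str, float] = DEFAULT_MAX_FREQ_PCT,
    excluded: frozenset[str] = frozenset(),
) -> bool:
    """Return True if no considered population frequency (in percent)
    exceeds the threshold for the given mode of inheritance.

    Populations named in `excluded` are ignored; a variant with no
    frequency data is treated as 0% in this sketch.
    """
    considered = [f for pop, f in pop_freqs_pct.items() if pop not in excluded]
    return max(considered, default=0.0) <= max_freq_pct[moi]
```

With the 0.8% maximum quoted above, passes_moi_frequency({"POP_A": 0.8}, "AUTOSOMAL_DOMINANT") is False, while passing excluded=frozenset({"POP_A"}) leaves the decision to the remaining populations.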
The problem with relaxing a lot of parameters is that the limiting frequency may be the Caucasian one for one patient/variant but the South Asian one for another. Making the parameters less stringent will significantly reduce the performance of Exomiser. The idea here is to use ClinVar and, ideally, a user-supplied input file listing all known Pathogenic, Likely Pathogenic, etc. variants that would otherwise be filtered out for being common. This would hopefully allow the stringent parameters to remain in place while bypassing the filtering for the specific variants listed in the file. Another thing I would try, to significantly improve your Exomiser performance, is to supply a list of genes in the genePanelFilter: section so that only the genes relevant to your clinical WES analysis are ranked. By default, many genes that are not currently clinically relevant for WES are included, which decreases gene ranking performance.
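The whitelist idea could look roughly like the Python sketch below. Nothing in this thread indicates that Exomiser has such an option; the file layout (chrom, pos, ref, alt per line) and the names make_key, load_whitelist and filter_with_whitelist are all hypothetical. Chromosome names are normalised so that chr16 and 16, both of which appear in this thread, match.

```python
from collections.abc import Iterable, Iterator
from typing import NamedTuple


class VariantKey(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def make_key(chrom: str, pos: int, ref: str, alt: str) -> VariantKey:
    """Build a lookup key, dropping a leading 'chr' so 'chr16' == '16'."""
    if chrom.lower().startswith("chr"):
        chrom = chrom[3:]
    return VariantKey(chrom, pos, ref.upper(), alt.upper())


def load_whitelist(lines: Iterable[str]) -> set[VariantKey]:
    """Read 'chrom pos ref alt' lines (whitespace-separated, '#' = comment)."""
    keys: set[VariantKey] = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, pos, ref, alt = line.split()[:4]
        keys.add(make_key(chrom, int(pos), ref, alt))
    return keys


def filter_with_whitelist(
    variants: Iterable[tuple[VariantKey, float]],
    max_freq_pct: float,
    whitelist: set[VariantKey],
) -> Iterator[VariantKey]:
    """Keep variants at or below the frequency cut-off (percent), and keep
    whitelisted variants whatever their frequency."""
    for key, freq_pct in variants:
        if key in whitelist or freq_pct <= max_freq_pct:
            yield key
```

Because the match is on exact ref/alt, whitelist entries must use the same representation (trimming, left- or right-shifting) as the variants being filtered, which is the same pitfall as the PRRT2 dupC above.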
And to @julesjacobsen's point, supplying clinically accurate HPO IDs for a given patient is extremely powerful in Exomiser. For example, adding the HPO term for the mode of inheritance changed one sample's result from unranked to the second overall hit.
Dear @williakd17 ,
Thanks for the thorough "full circle" response and the tips regarding the Ashkenazi Jewish population frequencies and the genePanelFilter.
As I said, supplying a list of known causative variants would be a good feature; alternatively, the scoring of ClinVar-supported variants could be boosted.
Can you please tell me how I can set the inheritanceModes: and genePanelFilter: settings for exomiser-web? I cannot see how to specify an analysis.yml.
Best,
The 'Filter for genes' is the genePanelFilter. The web interface doesn't allow you to alter the default inheritance mode frequency filters.
@julesjacobsen Is there a way to alter the default values programmatically, e.g. by providing a default analysis.yml?
I'm not sure why you're so focused on the web interface; use the CLI. Or perform the analysis using a recessive MOI and hope there is another candidate variant that could form a compound heterozygote with it.
Dear @julesjacobsen,
Sorry for being so stubborn: our Charité clinicians like to use the web interface with phenotype/disease prioritization and investigate the results there. A standardized SOP fixes many of the prioritization parameters (e.g. MOI frequencies, which population frequencies are used), and I would like to adapt the parameters to their standard. Of course, I could hard-code them in Java, but a configuration file seems a better solution.
P.S.: This is all off-topic now (sorry, @williakd17).
Thanks
Currently, we use the CLI and the Python library PyYAML to generate a configuration .yml for each case, which is then run through the Exomiser CLI. It works quite well for inserting HPO terms, gene lists and variant frequency filters.
Drew
In reply to Johannes Helmuth's comment of Dec 20, 2018, 9:49 AM (https://github.com/exomiser/Exomiser/issues/300#issuecomment-449023060).
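The per-case approach Drew describes (generating one .yml per case and running it through the CLI) might look roughly like this partial Python sketch. The key names are modelled on the Exomiser v11 example analysis file and on the settings mentioned in this thread, but they should be checked against the template shipped with your release, and a real file needs the template's remaining sections; build_analysis and write_analysis are hypothetical helpers. JSON is written so the sketch needs no third-party package, since YAML parsers generally accept JSON; PyYAML's yaml.safe_dump could be used instead.

```python
import json
from pathlib import Path


def build_analysis(
    vcf: str,
    hpo_ids: list[str],
    gene_symbols: list[str],
    ad_max_freq_pct: float = 0.1,
    max_freq_pct: float = 1.0,
    genome_assembly: str = "hg19",
) -> dict:
    """Assemble one case's analysis settings as a plain dict.

    Key names are modelled on the Exomiser v11 example analysis file and
    should be checked against the template for your release.
    """
    return {
        "analysis": {
            "genomeAssembly": genome_assembly,
            "vcf": vcf,
            "hpoIds": hpo_ids,
            # Per-MOI frequency cut-offs in percent; only AD is set here.
            "inheritanceModes": {"AUTOSOMAL_DOMINANT": ad_max_freq_pct},
            "steps": [
                {"genePanelFilter": {"geneSymbols": gene_symbols}},
                {"frequencyFilter": {"maxFrequency": max_freq_pct}},
                {"inheritanceFilter": {}},
                {"hiPhivePrioritiser": {}},
            ],
        }
    }


def write_analysis(config: dict, path: Path) -> None:
    """Write the config as JSON, which YAML parsers generally accept."""
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
```

Keeping the SOP values (MOI thresholds, gene lists) as function arguments means one script can enforce the same defaults for every case.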
Closing this, as I think the issue is addressed by issue #152.
I'm unsure where the original post went (I must have accidentally edited or removed it), but the variant in question is the following:
PRRT2 | NM_001256442.1 | c.649dupC | 397.73 | 0.270479% | Autosomal Dominant
This variant is in an 8-C repeat with an adjacent G. The original VCF line is as follows:
chr16 29825015 . G GC 397.73 . Gene=PRRT2;Transcript=NM_001256443;HGVS=c.649dupC;AC=1;AF=0.500;AN=2;BaseQRankSum=-0.554;DP=78;FS=10.440;MLEAC=1;MLEAF=0.500;MQ=59.97;MQ0=0;MQRankSum=0.000;QD=5.10;ReadPosRankSum=0.539;EXON=2 GT:AD:GQ:PL:SB:ZG:ZW:WN:FP 0/1:42,23:99:435,0,958:16,23,4,19:Heterozygous:L:1:0