exomiser / Exomiser

A Tool to Annotate and Prioritize Exome Variants
https://exomiser.readthedocs.io
GNU Affero General Public License v3.0

False positive hits to Orphanet copy-number diseases. #557

Open: julesjacobsen opened this issue 5 months ago

julesjacobsen commented 5 months ago

Exomiser <= 14.0.0 gives false-positive hits for small sequence variants located in genes associated with contiguous-gene deletion disorders, mainly from Orphanet, e.g. Williams syndrome (ORPHA:904 / OMIM:194050).

The confusion arises from the disease-gene associations and from the way Exomiser treats disease type 'C' (copy-number) the same as type 'D' (disease). When disease.type = 'C', the 'gene' needs to be treated as one large contiguous region covering all the associated genes, and matched only to large deletions covering roughly 80-90% or more of this region.
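The proposed rule for type 'C' diseases can be sketched as follows. This is a minimal illustration, not Exomiser's implementation: it assumes all genes of the disease lie on one chromosome, collapses them into a single span from the first gene start to the last gene end, and uses a hypothetical `MIN_REGION_COVERAGE` of 0.8 (the issue leaves the exact threshold open). The coordinates in the usage lines are toy values, not real gene positions.

```python
from dataclasses import dataclass

# Hypothetical threshold: the issue suggests ~80-90%+ coverage; 0.8 is assumed.
MIN_REGION_COVERAGE = 0.8


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive

    def length(self) -> int:
        return self.end - self.start


def contiguous_region(genes: list[Interval]) -> Interval:
    """Collapse all genes of a type-'C' disease into one span running from
    the first gene start to the last gene end (assumes one chromosome)."""
    chroms = {g.chrom for g in genes}
    if len(chroms) != 1:
        raise ValueError(f"genes span several chromosomes: {sorted(chroms)}")
    return Interval(chroms.pop(), min(g.start for g in genes), max(g.end for g in genes))


def region_coverage(deletion: Interval, region: Interval) -> float:
    """Fraction of `region` overlapped by `deletion` (0.0 if no overlap)."""
    if deletion.chrom != region.chrom:
        return 0.0
    overlap = min(deletion.end, region.end) - max(deletion.start, region.start)
    return max(overlap, 0) / region.length()


def matches_copy_number_disease(deletion: Interval, genes: list[Interval],
                                min_coverage: float = MIN_REGION_COVERAGE) -> bool:
    """Match only deletions covering at least `min_coverage` of the region."""
    return region_coverage(deletion, contiguous_region(genes)) >= min_coverage


# Toy coordinates: three genes collapse into chr7:1000-11000 (10 kb).
genes = [Interval("chr7", 1_000, 2_000), Interval("chr7", 5_000, 6_000),
         Interval("chr7", 9_000, 11_000)]
print(matches_copy_number_disease(Interval("chr7", 1_500, 1_501), genes))  # False
print(matches_copy_number_disease(Interval("chr7", 500, 10_000), genes))   # True
```

A small sequence variant inside any one gene covers only a tiny fraction of the collapsed region, so it can never reach the threshold; only a deletion spanning most of the region does.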

DISEASE_ID  DISEASENAME        OMIM_GENE_ID  GENE_ID  SYMBOL    TYPE  INHERITANCE
ORPHA:904   Williams syndrome  OMIM:186590   6804     STX1A     C     D
ORPHA:904   Williams syndrome  OMIM:605842   26608    TBL2      C     D
ORPHA:904   Williams syndrome  OMIM:608899   84163    GTF2IRD2  C     D
ORPHA:904   Williams syndrome  OMIM:610039   155382   VPS37D    C     D
ORPHA:904   Williams syndrome  OMIM:604839   8468     FKBP6     C     D
ORPHA:904   Williams syndrome  OMIM:600404   5982     RFC2      D     D
ORPHA:904   Williams syndrome  OMIM:608512   653361   NCF1      C     D
ORPHA:904   Williams syndrome  OMIM:612547   135886   TMEM270   C     D
ORPHA:904   Williams syndrome  OMIM:605678   51085    MLXIPL    D     D
ORPHA:904   Williams syndrome  OMIM:603431   7458     EIF4H     C     D
ORPHA:904   Williams syndrome  OMIM:618202   84277    DNAJC30   D     D
ORPHA:904   Williams syndrome  OMIM:605846   9275     BCL7B     D     D
ORPHA:904   Williams syndrome  OMIM:612546   155368   METTL27   C     D
ORPHA:904   Williams syndrome  OMIM:615733   114049   BUD23     C     D
ORPHA:904   Williams syndrome  OMIM:605681   9031     BAZ1B     C     D
ORPHA:904   Williams syndrome  OMIM:603432   7461     CLIP2     C     D
ORPHA:904   Williams syndrome  OMIM:130160   2006     ELN       C     D
ORPHA:904   Williams syndrome  OMIM:601679   2969     GTF2I     C     D
ORPHA:904   Williams syndrome  OMIM:604318   9569     GTF2IRD1  C     D
ORPHA:904   Williams syndrome  OMIM:601329   3984     LIMK1     C     D
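To see the effect of treating 'C' like 'D', rows in the format above can be loaded into a gene-to-disease lookup for small variants. This is a minimal sketch, assuming whitespace-separated columns in the order shown; `treat_c_as_d` is a hypothetical switch standing in for the behaviour described for Exomiser <= 14.0.0, not an actual Exomiser option.

```python
from collections import defaultdict
from typing import NamedTuple


class Association(NamedTuple):
    disease_id: str
    disease_name: str
    omim_gene_id: str
    gene_id: int
    symbol: str
    disease_type: str  # 'D' = disease, 'C' = copy-number
    inheritance: str


def parse_rows(text: str) -> list[Association]:
    """Parse rows like the table above. The disease name may contain spaces,
    so the last five columns are taken from the right-hand end."""
    rows = []
    for line in text.strip().splitlines():
        parts = line.split()
        if not parts or parts[0] == "DISEASE_ID":
            continue  # skip blank lines and the header
        omim, gene_id, symbol, dtype, inheritance = parts[-5:]
        rows.append(Association(parts[0], " ".join(parts[1:-5]), omim,
                                int(gene_id), symbol, dtype, inheritance))
    return rows


def small_variant_index(rows: list[Association], treat_c_as_d: bool) -> dict[str, set[str]]:
    """Map gene symbol -> disease IDs a small variant in that gene may hit.
    With treat_c_as_d=True every listed gene qualifies (the reported bug);
    otherwise type-'C' rows are left out, since they should only match
    large deletions of the whole region."""
    index: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        if row.disease_type == "D" or (treat_c_as_d and row.disease_type == "C"):
            index[row.symbol].add(row.disease_id)
    return dict(index)


TABLE = """
DISEASE_ID  DISEASENAME        OMIM_GENE_ID  GENE_ID  SYMBOL    TYPE  INHERITANCE
ORPHA:904   Williams syndrome  OMIM:130160   2006     ELN       C     D
ORPHA:904   Williams syndrome  OMIM:600404   5982     RFC2      D     D
ORPHA:904   Williams syndrome  OMIM:601329   3984     LIMK1     C     D
"""

rows = parse_rows(TABLE)
print(sorted(small_variant_index(rows, treat_c_as_d=True)))   # ['ELN', 'LIMK1', 'RFC2']
print(sorted(small_variant_index(rows, treat_c_as_d=False)))  # ['RFC2']
```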
pnrobinson commented 5 months ago

This would be great. Caveat: Some of the entries are set to "D" without any obvious reason, e.g. RFC2, which is not listed as a disease gene in OMIM.
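Entries like RFC2 could be flagged for curation review automatically. A minimal sketch, assuming the associations are available as (disease ID, symbol, type) tuples; `mixed_type_genes` and the `HYPOTHETICAL:1` / `GENE_A` entries are illustrative only, and a flagged gene is just a candidate to check against OMIM/Orphanet, not proof of an error.

```python
from collections import defaultdict


def mixed_type_genes(rows: list[tuple[str, str, str]]) -> dict[str, list[str]]:
    """For each disease with both 'C' and 'D' associations, return the
    symbols annotated 'D' (sorted), i.e. the entries worth double-checking."""
    by_disease: dict[str, dict[str, list[str]]] = defaultdict(lambda: defaultdict(list))
    for disease_id, symbol, dtype in rows:
        by_disease[disease_id][dtype].append(symbol)
    return {
        disease_id: sorted(types["D"])
        for disease_id, types in by_disease.items()
        if types["C"] and types["D"]
    }


rows = [
    ("ORPHA:904", "ELN", "C"),
    ("ORPHA:904", "RFC2", "D"),
    ("ORPHA:904", "BCL7B", "D"),
    ("ORPHA:904", "LIMK1", "C"),
    ("HYPOTHETICAL:1", "GENE_A", "D"),  # single-type disease: not flagged
]
print(mixed_type_genes(rows))  # {'ORPHA:904': ['BCL7B', 'RFC2']}
```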

damiansm commented 5 months ago

Yes, we spotted that as well and are going to investigate where it is coming from, presumably the Orphanet XML. If it is genuine, what do you think it means: that LoF of BCL7B alone would cause the syndrome, or at least a good part of the symptoms?

(Sent by email in reply to Peter Robinson's comment of Sat, Apr 13, 2024 at 9:20 AM.)

julesjacobsen commented 3 months ago

Related: https://github.com/monarch-initiative/phenol/issues/445