exomiser / Exomiser

A Tool to Annotate and Prioritize Exome Variants
https://exomiser.readthedocs.io
GNU Affero General Public License v3.0

Diagnostic-grade Exomiser #159

Open damiansm opened 7 years ago

damiansm commented 7 years ago

Diagnostic-grade Exomiser with a manually curated source of reliable disease-gene associations, i.e. like what we do for Genomics England with PanelApp. The trouble is that no-one has this knowledgebase yet, and I'm guessing it will be a while before ClinGen really gets there. This may be another use case where we want to allow people to use their own data as an option. An Exomiser that took in the PanelApp disease-gene associations would be really powerful for GeL. I could of course hack it for them, but I suspect other groups would be keen too.

julesjacobsen commented 5 years ago

This is related to/partly duplicates #265

damiansm commented 5 years ago

Clinical-grade disease-gene associations: 3928 genes in OMIM are currently associated with disease, though fewer than 3500 of these are single-gene, classical Mendelian.

  1. work with the PanelApp team to get an OMIM disease ID for every curated gene - resurrect what I did before and see if they made progress with OpenTargets
  2. do a similar level of curation on the remaining disease-gene associations in OMIM and Orphanet

See Slack commentary in curation channel

julesjacobsen commented 1 year ago

@damiansm There are ClinGen and The GenCC, which could meet these criteria, assuming we used the 'definitive' associations.

This would require a database change to store the data, and some means of passing a user option from the command through to the application so that it uses only these annotations.

damiansm commented 1 year ago

@julesjacobsen Yes - this recent work makes this task a lot easier. It could be as easy as trimming the disease2gene table we load into Exomiser based on knowledge from GenCC. It looks like it is MONDO-based, but the original OMIM ID is also stored: https://search.thegencc.org/genes/HGNC:13666

julesjacobsen commented 1 year ago

Good to know they have the original OMIM ID in there, as that will be really useful. The next question is which classification to use when submitters conflict. In this case Ambry and TGMI say Definitive, Invitae go for Strong (although they never use Definitive: https://search.thegencc.org/submitters/GENCC_000106) and Orphanet only ever use Supportive (https://search.thegencc.org/submitters/GENCC_000110), so I guess going with a Strong or Definitive assertion should be safe if trying to summarise.

For example:

https://search.thegencc.org/genes/HGNC:17416

| gene_curie | gene_symbol | disease_curie | disease_title | disease_original_curie | disease_original_title | classification_curie | classification_title | moi_curie | moi_title | submitter_curie | submitter_title |
|---|---|---|---|---|---|---|---|---|---|---|---|
| HGNC:17416 | ADGRV1 | MONDO:0011558 | Usher syndrome type 2C | OMIM:605472 | OMIM:605472 | GENCC:100002 | Strong | HP:0000007 | Autosomal recessive | GENCC:000111 | PanelApp Australia |
| HGNC:17416 | ADGRV1 | MONDO:0016484 | Usher syndrome type 2 | Orphanet:231178 | Orphanet:231178 | GENCC:100009 | Supportive | HP:0000007 | Autosomal recessive | GENCC:000110 | Orphanet |
| HGNC:17416 | ADGRV1 | MONDO:0016484 | Usher syndrome type 2 | MONDO:0016484 | Usher syndrome type 2 | GENCC:100001 | Definitive | HP:0000007 | Autosomal recessive | GENCC:000102 | ClinGen |
| HGNC:17416 | ADGRV1 | MONDO:0011443 | febrile seizures, familial, 4 | OMIM:604352 | OMIM:604352 | GENCC:100004 | Limited | HP:0000006 | Autosomal dominant | GENCC:000101 | Ambry Genetics |
| HGNC:17416 | ADGRV1 | MONDO:0019497 | nonsyndromic genetic hearing loss | MONDO:0019497 | nonsyndromic genetic hearing loss | GENCC:100005 | Disputed Evidence | HP:0000007 | Autosomal recessive | GENCC:000102 | ClinGen |

The only problem is that ClinGen only curate to MONDO, and these IDs don't always align with OMIM (at least in this case they do align with Orphanet). So the Orphanet and ClinGen submissions could be merged as Definitive evidence, but there is still the question of if and where to use MONDO identifiers.
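One way to summarise conflicting submissions is to keep the strongest classification per gene and MONDO disease. The following is a minimal Python sketch of that idea using the ADGRV1 rows above. The `RANK` ordering and the function names are assumptions for illustration, not part of GenCC or Exomiser.

```python
from collections import defaultdict

# Assumed ranking, strongest first. Only the titles seen in this thread are listed.
RANK = ["Definitive", "Strong", "Supportive", "Limited", "Disputed Evidence"]

# (gene_curie, disease_curie, classification_title, submitter_title)
SUBMISSIONS = [
    ("HGNC:17416", "MONDO:0011558", "Strong", "PanelApp Australia"),
    ("HGNC:17416", "MONDO:0016484", "Supportive", "Orphanet"),
    ("HGNC:17416", "MONDO:0016484", "Definitive", "ClinGen"),
    ("HGNC:17416", "MONDO:0011443", "Limited", "Ambry Genetics"),
    ("HGNC:17416", "MONDO:0019497", "Disputed Evidence", "ClinGen"),
]


def strongest_per_association(submissions):
    """Return {(gene, mondo): strongest classification} across all submitters."""
    grouped = defaultdict(list)
    for gene, mondo, classification, _submitter in submissions:
        grouped[(gene, mondo)].append(classification)
    # The lowest index in RANK is the strongest assertion.
    return {key: min(titles, key=RANK.index) for key, titles in grouped.items()}


def reliable(summary, accepted=("Definitive", "Strong")):
    """Keep only associations whose summarised classification is accepted."""
    return {key: title for key, title in summary.items() if title in accepted}


summary = strongest_per_association(SUBMISSIONS)
kept = reliable(summary)
```

Under this rule the Orphanet (Supportive) and ClinGen (Definitive) submissions for MONDO:0016484 collapse to a single Definitive association, while the Limited and Disputed Evidence associations are dropped.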

damiansm commented 1 year ago

I guess GenCC has potentially conflicting classifications for the same OMIM or Orphanet disease ID whereas ClinGen will have a consistent annotation?

One option is to just take all Definitive annotations and not worry about conflicts.

julesjacobsen commented 1 year ago

The issue is that Orphanet only assert Supportive, while ClinGen assert the full range and only use MONDO identifiers. PanelApp (GEL & Aus) mostly use OMIM identifiers for original submissions, but The GenCC map everything to MONDO. OMIM still haven't submitted any assertions.

Our current disease table looks like this:

| DISEASE_ID | OMIM_GENE_ID | DISEASENAME | GENE_ID | TYPE | INHERITANCE |
|---|---|---|---|---|---|
| OMIM:604352 | OMIM:602851 | ?Febrile seizures, familial, 4 | 84059 | ? | D |
| OMIM:605472 | OMIM:602851 | Usher syndrome, type 2C | 84059 | D | R |
| ORPHA:231178 | OMIM:602851 | Usher syndrome type 2 | 84059 | D | R |
| ORPHA:36387 | OMIM:602851 | Generalized epilepsy with febrile seizures-plus | 84059 | D | D |
Altering it to look like this would be a lot more useful (the OMIM_GENE_ID column is unused in the application):

| DISEASE_ID | MONDO_ID | DISEASENAME | GENE_ID | TYPE | INHERITANCE | GENCC_VALIDITY |
|---|---|---|---|---|---|---|
| OMIM:604352 | MONDO:0011443 | Febrile seizures, familial, 4 | 84059 | ? | D | Limited |
| OMIM:605472 | MONDO:0011558 | Usher syndrome, type 2C | 84059 | D | R | Strong |
| ORPHA:231178 | MONDO:0016484 | Usher syndrome type 2 | 84059 | D | R | Definitive (Supportive in Orphanet) |
| ORPHA:36387 | MONDO:0018214 | Generalized epilepsy with febrile seizures-plus | 84059 | D | D | (our data come from Orphanet but there is no assertion for this D2G yet, probably safe to assume Supportive) |

This would involve some kind of four-way merge of OMIM, HPOA, Orphanet and GenCC annotations 😱
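A rough Python sketch of just the GenCC part of that merge, under stated assumptions: the `HGNC_TO_ENTREZ` lookup, the `RANK` ordering and the `annotate` helper are hypothetical, and rewriting the `Orphanet:` prefix to `ORPHA:` is assumed to be the only ID normalisation needed. Rows with no GenCC submission (ORPHA:36387 here) are left without a MONDO ID or validity, since filling those would need a separate MONDO cross-reference.

```python
# Assumed ranking, strongest first. Only the titles seen in this thread are listed.
RANK = ["Definitive", "Strong", "Supportive", "Limited", "Disputed Evidence"]

# Hypothetical lookup: GenCC uses HGNC IDs, the disease table uses Entrez gene IDs.
HGNC_TO_ENTREZ = {"HGNC:17416": 84059}

# (DISEASE_ID, DISEASENAME, GENE_ID, TYPE, INHERITANCE) from the current table
DISEASE_ROWS = [
    ("OMIM:604352", "?Febrile seizures, familial, 4", 84059, "?", "D"),
    ("OMIM:605472", "Usher syndrome, type 2C", 84059, "D", "R"),
    ("ORPHA:231178", "Usher syndrome type 2", 84059, "D", "R"),
    ("ORPHA:36387", "Generalized epilepsy with febrile seizures-plus", 84059, "D", "D"),
]

# (gene_curie, disease_curie, disease_original_curie, classification_title)
GENCC_ROWS = [
    ("HGNC:17416", "MONDO:0011558", "OMIM:605472", "Strong"),
    ("HGNC:17416", "MONDO:0016484", "Orphanet:231178", "Supportive"),
    ("HGNC:17416", "MONDO:0016484", "MONDO:0016484", "Definitive"),
    ("HGNC:17416", "MONDO:0011443", "OMIM:604352", "Limited"),
    ("HGNC:17416", "MONDO:0019497", "MONDO:0019497", "Disputed Evidence"),
]


def normalise(curie):
    """Rewrite GenCC's 'Orphanet:' prefix to the 'ORPHA:' prefix of the disease table."""
    return curie.replace("Orphanet:", "ORPHA:", 1)


def annotate(disease_rows, gencc_rows):
    """Add MONDO_ID and GENCC_VALIDITY columns to each disease row."""
    original_to_mondo = {}
    strongest = {}
    for gene_curie, mondo, original, title in gencc_rows:
        gene_id = HGNC_TO_ENTREZ[gene_curie]
        original_to_mondo[normalise(original)] = mondo
        key = (gene_id, mondo)
        # Keep the strongest classification seen for this gene and MONDO disease.
        if key not in strongest or RANK.index(title) < RANK.index(strongest[key]):
            strongest[key] = title
    annotated = []
    for disease_id, name, gene_id, d_type, inheritance in disease_rows:
        mondo = original_to_mondo.get(disease_id)  # None if GenCC has no submission
        validity = strongest.get((gene_id, mondo))
        annotated.append((disease_id, mondo, name, gene_id, d_type, inheritance, validity))
    return annotated


annotated = annotate(DISEASE_ROWS, GENCC_ROWS)
```

Joining through MONDO is what lets the ORPHA:231178 row pick up ClinGen's Definitive assertion even though ClinGen never submitted against the Orphanet ID.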

damiansm commented 1 year ago

Let's first decide whether we are going to use the data, and for what. If we just want a definitive-only setting, then we can simply flag those associations in the disease table. If we want to display the tag for all rows, or use it for weighting etc., then we need all the data.
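The definitive-only option could be sketched in Python as a simple switch over the associations before they are used. This assumes the classification has already been stored per row. The `gencc_validity` field and the `definitive_only` parameter are hypothetical names, not existing Exomiser options.

```python
# Disease-gene associations with a stored GenCC classification (None if unasserted).
ASSOCIATIONS = [
    {"disease_id": "OMIM:604352", "gene_id": 84059, "gencc_validity": "Limited"},
    {"disease_id": "OMIM:605472", "gene_id": 84059, "gencc_validity": "Strong"},
    {"disease_id": "ORPHA:231178", "gene_id": 84059, "gencc_validity": "Definitive"},
    {"disease_id": "ORPHA:36387", "gene_id": 84059, "gencc_validity": None},
]


def select_associations(associations, definitive_only=False):
    """Return all associations, or only the Definitive ones when the switch is on."""
    if not definitive_only:
        return list(associations)
    return [a for a in associations if a["gencc_validity"] == "Definitive"]


default_set = select_associations(ASSOCIATIONS)
diagnostic_set = select_associations(ASSOCIATIONS, definitive_only=True)
```

Storing the full classification rather than a boolean flag keeps the second option open: the same column could later be displayed for all rows or used for weighting.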

cmungall commented 1 year ago

cc @putmantime, we have also been looking at comparing g2d associations, let's coordinate efforts
