geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

Write a script to capture the status of Reactome entities in PRO #144

Closed: nataled closed this issue 2 years ago

nataled commented 2 years ago

Need to write a script that will create a status report.
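A minimal sketch of such a report generator follows. It assumes the Reactome-to-PRO mappings are already loaded as (Reactome ID, category, PRO IDs) records. The record layout, the category labels and the functions `status_report` and `format_report` are hypothetical illustrations, not code from pathways2GO.

```python
from collections import defaultdict

# Hypothetical record: (reactome_id, category, pro_ids), where pro_ids lists the
# PRO terms that the Reactome entity maps to (empty if it has no PRO mapping).
MappingRecord = tuple[str, str, list[str]]


def status_report(mappings: list[MappingRecord]) -> dict[str, tuple[int, int]]:
    """Count distinct Reactome and distinct PRO entities per category,
    keeping only Reactome entities that map to at least one PRO term."""
    reactome_ids: dict[str, set[str]] = defaultdict(set)
    pro_ids: dict[str, set[str]] = defaultdict(set)
    for reactome_id, category, pros in mappings:
        if not pros:
            continue  # not part of the Reactome/PRO intersection
        reactome_ids[category].add(reactome_id)
        pro_ids[category].update(pros)
    return {cat: (len(reactome_ids[cat]), len(pro_ids[cat])) for cat in reactome_ids}


def format_report(counts: dict[str, tuple[int, int]]) -> str:
    """Render the counts as a two-column plain-text table."""
    lines = [f"{'':20}{'Reactome':>10}{'PRO':>10}"]
    for cat, (n_reactome, n_pro) in sorted(counts.items()):
        lines.append(f"{cat + ':':20}{n_reactome:>10}{n_pro:>10}")
    return "\n".join(lines)
```

Counting distinct IDs with sets means that several Reactome entities mapping to one PRO term add one to the PRO column but several to the Reactome column. This matches the PRO column being smaller than the Reactome column in the table.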

nataled commented 2 years ago

Current status:

Total in-scope EWAS:    26522

                     Reactome          PRO
Total intersection:     14670        11221
In-scope EWAS:          14498        11065
  Canonical:             8826         7042
  Isoforms:               198          172
  Subseq:                3629         2429
  PTM:                   1845         1422
Variants:                   3            3
Complex:                  145          138
Set:                       24           15

In-scope Reactome EWAS accounted for in PRO: 55% (14498/26522)

In each row, the Reactome column gives the number of Reactome entities (release 79) that map to PRO, and the PRO column gives the number of PRO entities (release 65) they map to.
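The figures are consistent with reading the table hierarchically: the four indented rows add up to "In-scope EWAS", and in-scope EWAS plus variants, complexes and sets add up to "Total intersection". That reading is an inference from the numbers. The short check below uses only the counts shown above and also recomputes the coverage figure.

```python
# Counts copied from the status table: (Reactome, PRO).
in_scope_breakdown = {
    "Canonical": (8826, 7042),
    "Isoforms": (198, 172),
    "Subseq": (3629, 2429),
    "PTM": (1845, 1422),
}
other_rows = {"Variants": (3, 3), "Complex": (145, 138), "Set": (24, 15)}
in_scope_total = (14498, 11065)
intersection_total = (14670, 11221)
total_in_scope_ewas = 26522


def column_sum(rows: dict[str, tuple[int, int]], col: int) -> int:
    """Sum one column (0 = Reactome, 1 = PRO) over a group of rows."""
    return sum(pair[col] for pair in rows.values())


for col in (0, 1):
    # The four EWAS sub-rows add up to the "In-scope EWAS" row...
    assert column_sum(in_scope_breakdown, col) == in_scope_total[col]
    # ...and in-scope EWAS plus variants, complexes and sets give the intersection.
    assert in_scope_total[col] + column_sum(other_rows, col) == intersection_total[col]

coverage = in_scope_total[0] / total_in_scope_ewas
print(f"{coverage:.1%}")  # prints 54.7%, reported above as 55%
```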

Considered out of scope: variants of any type (insertions, deletions, replacements), non-proteins (genes, various RNAs), modifying proteins when 'modified' by target (e.g., if there is an EWAS “ABC1 modified by SUMO1”, the complementary EWAS "SUMO1 'modified' by ABC1" is ignored)
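The scope rules above can be expressed as a single predicate. This is a sketch under assumptions: the `Entity` fields, the `is_in_scope` function and the `UBL_MODIFIERS` set (an illustrative, non-exhaustive list of ubiquitin-like modifier gene symbols) are hypothetical. They are not Reactome's data model or PRO's actual tooling.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative, non-exhaustive set of ubiquitin-like modifier gene symbols.
UBL_MODIFIERS = {"UBB", "UBC", "SUMO1", "SUMO2", "SUMO3", "NEDD8", "ISG15"}


@dataclass
class Entity:
    name: str
    molecule_type: str                      # e.g. "protein", "gene", "RNA" (hypothetical)
    is_variant: bool = False                # insertion, deletion or replacement
    primary_protein: Optional[str] = None   # the 'main' protein of the EWAS
    attached_protein: Optional[str] = None  # protein covalently attached to it, if any


def is_in_scope(e: Entity) -> bool:
    """Apply the out-of-scope rules: variants, non-proteins, and the
    modifier-centric view of a modifier/target conjugate."""
    if e.is_variant:
        return False  # variants of any type
    if e.molecule_type != "protein":
        return False  # genes, various RNAs, other non-proteins
    # Modifier-centric view, e.g. "SUMO1 'modified' by ABC1".
    if e.attached_protein is not None and e.primary_protein in UBL_MODIFIERS:
        return False
    return True
```

Because this rule keys only on the modifier list, it would also drop modifier-on-modifier conjugates such as SUMO chains. A production script would probably need a stricter test, for example requiring that the complementary target-centric EWAS exists.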

ukemi commented 2 years ago

Thanks @nataled! Ping @deustp01

ukemi commented 2 years ago

@nataled Is there a typo? (e.g., if there is an EWAS “ABC1 modified by SUMO1”, the complementary EWAS "SUMO1 'modified' by ABC1" is ignored) should be (e.g., if there is an EWAS “ABC1 modified by SUMO1”, the complementary EWAS "SUMO1 'modifies' ABC1" is ignored)

Not sure I understand.

nataled commented 2 years ago

@ukemi In these cases, Reactome has two entries that represent the same entity from opposite 'perspectives'. These always involve proteins that are modified by other proteins, that is, by ubiquitin-like modifiers. So if some hypothetical protein ABC1 has SUMO1 attached at a lysine residue, that entity will have a complementary EWAS in Reactome that treats SUMO1 as what I'll call the 'main' or 'primary' protein and asserts that SUMO1 has ABC1 attached. Thus there are two perspectives on the same entity.
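Such complementary pairs could be found mechanically by looking for EWAS whose (primary, attached) proteins appear in both orders. The sketch below assumes each EWAS has been reduced to a hypothetical `(ewas_id, primary_protein, attached_protein)` tuple. The `find_complementary_pairs` function and the illustrative `UBL_MODIFIERS` set are assumptions, not part of Reactome or PRO. Attachment sites are ignored, so the output is a list of candidates for manual review rather than confirmed duplicates.

```python
from collections import defaultdict

# Illustrative, non-exhaustive set of ubiquitin-like modifier gene symbols.
UBL_MODIFIERS = {"UBB", "UBC", "SUMO1", "SUMO2", "SUMO3", "NEDD8", "ISG15"}


def find_complementary_pairs(ewas: list[tuple[str, str, str]]) -> list[tuple[str, str]]:
    """Return (target_centric_id, modifier_centric_id) candidate pairs that
    describe the same conjugate from opposite perspectives."""
    # Index EWAS IDs by their (primary protein, attached protein) perspective.
    by_view: dict[tuple[str, str], list[str]] = defaultdict(list)
    for ewas_id, primary, attached in ewas:
        by_view[(primary, attached)].append(ewas_id)

    pairs = []
    for (primary, attached), target_ids in by_view.items():
        # Start only from the target-centric side (modifier attached to a target).
        if attached not in UBL_MODIFIERS or primary in UBL_MODIFIERS:
            continue
        # The mirror view has the two proteins swapped.
        for modifier_id in by_view.get((attached, primary), []):
            for target_id in target_ids:
                pairs.append((target_id, modifier_id))
    return pairs
```

Starting from the target-centric side reports each pair once, with the modifier-centric EWAS in the second position as the candidate for removal.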

In general, I find most of these not only redundant but also imprecisely described in Reactome. @deustp01 has suggested that the redundant (modifier-centric) entries be deleted from Reactome, a recommendation I fully agree with given the issues I've found. If it is decided to keep these in Reactome, I will still ignore them in PRO, but I'll report the problematic cases back to Reactome under a separate ticket.

ukemi commented 2 years ago

Got it! Thanks!

deustp01 commented 2 years ago

The basic problem is that we got carried away. When a ubiquitin molecule is covalently attached to a molecule of protein ABC1, that can be annotated as a ubiquitin-modified amino acid side chain of ABC1. It can equally well be annotated as an ABC1-modified amino acid side chain of ubiquitin. The first annotation makes sense when talking about ubiquitination (or SUMOylation, etc.) and the second one doesn't, so our plan is to find and remove all annotations of the second kind, and to improve our curation process to prevent this. A list of problems from Darren will be good, both as a check and as a reminder to get the job done soon!