Closed nataled closed 2 years ago
Current status:
Total in-scope EWAS: 26522

| | Reactome | PRO |
|---|---|---|
| Total intersection | 14670 | 11221 |
| In-scope EWAS | 14498 | 11065 |
| Canonical | 8826 | 7042 |
| Isoforms | 198 | 172 |
| Subseq | 3629 | 2429 |
| PTM | 1845 | 1422 |
| Variants | 3 | 3 |
| Complex | 145 | 138 |
| Set | 24 | 15 |

Reactome (in-scope EWAS) accounted for: 55% (14498/26522)
Each row shows the number of Reactome entities (release 79) that map to the indicated number of PRO entities (release 65).
Considered out of scope: variants of any type (insertions, deletions, replacements), non-proteins (genes, various RNAs), and modifier proteins that are described as 'modified' by their target (e.g., if there is an EWAS "ABC1 modified by SUMO1", the complementary EWAS "SUMO1 'modified' by ABC1" is ignored).
Thanks @nataled ping @deustp01
@nataled Is there a typo? (e.g., if there is an EWAS “ABC1 modified by SUMO1”, the complementary EWAS "SUMO1 'modified' by ABC1" is ignored) should be (e.g., if there is an EWAS “ABC1 modified by SUMO1”, the complementary EWAS "SUMO1 'modifies' ABC1" is ignored)
Not sure I understand.
@ukemi Reactome has two entries that represent the same entity, but from opposite 'perspectives'. These always involve proteins that are modified by proteins; that is, by ubiquitin-like modifiers. So if there is some hypothetical protein ABC1, and that protein has SUMO1 attached at some lysine residue, that entity will have a complementary EWAS in Reactome that takes SUMO1 as what I'll call the 'main' or 'primary' protein, and will assert that the SUMO1 has an attached ABC1. Thus, two perspectives of the same entity.
In general, I find most of these not only redundant, but imprecisely described in Reactome. @deustp01 has suggested that the redundant (modifier-centric) entries be deleted from Reactome, which (given the issues I've found) is a recommendation I fully agree with. If it is decided to keep these in Reactome, I will still ignore them in PRO, but I'll report the problematic cases back to Reactome under a separate ticket.
Got it! Thanks!
The basic problem is that we got carried away. When a ubiquitin molecule is covalently attached to a molecule of protein ABC1, that can be annotated as a ubiquitin-modified amino acid side chain of ABC1. It can equally well be annotated as an ABC1-modified amino acid side chain of ubiquitin. The first annotation makes sense when talking about ubiquitination (or SUMOylation, etc.) and the second one doesn't, so our plan is to find and remove all annotations of the second kind, and to improve our curation process to prevent this. A list of problems from Darren will be good, both as a check and as a reminder to get the job done soon!
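As a rough illustration of how the "second kind" of annotation might be found, here is a minimal sketch. It assumes each modified EWAS can be reduced to a (primary protein, attached modifier) pair; the `ModifiedEwas` type, the `UBL_MODIFIERS` list, and the `modifier_centric` function are hypothetical names for this example and do not reflect the actual Reactome data model.

```python
from typing import NamedTuple

# Hypothetical, incomplete list of ubiquitin-like modifier names (an assumption).
UBL_MODIFIERS = {"SUMO1", "SUMO2", "SUMO3", "UBB", "UBC", "NEDD8"}


class ModifiedEwas(NamedTuple):
    primary: str   # protein treated as the 'main' entity
    modifier: str  # protein asserted to be attached to it


def modifier_centric(entries):
    """Return entries that take a modifier as the primary protein
    and whose target-centric mirror image is also present."""
    seen = set(entries)
    return [
        e for e in entries
        if e.primary in UBL_MODIFIERS
        # Skip modifier-on-modifier cases (e.g. chains); they are ambiguous.
        and e.modifier not in UBL_MODIFIERS
        and ModifiedEwas(e.modifier, e.primary) in seen
    ]


entries = [
    ModifiedEwas("ABC1", "SUMO1"),  # target-centric: keep
    ModifiedEwas("SUMO1", "ABC1"),  # modifier-centric: candidate for removal
    ModifiedEwas("XYZ2", "UBB"),    # no mirror entry: nothing to flag
]
candidates = modifier_centric(entries)
```

With the toy entries above, only the "SUMO1 'modified' by ABC1" record is flagged. A real check would need to compare the modified residues as well, not just the protein names.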
Need to write a script that will create a status report.
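A minimal sketch of what such a script could look like, assuming the per-category counts are already available as (Reactome, PRO) pairs. The numbers are copied from the status table at the top of this issue; the function and variable names are hypothetical, and how the counts are actually obtained from the mapping is not covered here.

```python
# Counts copied from the status table in this issue, as (Reactome, PRO).
IN_SCOPE = {
    "Canonical": (8826, 7042),
    "Isoforms": (198, 172),
    "Subseq": (3629, 2429),
    "PTM": (1845, 1422),
}
OTHER = {
    "Variants": (3, 3),
    "Complex": (145, 138),
    "Set": (24, 15),
}
TOTAL_IN_SCOPE_EWAS = 26522


def status_report(in_scope, other, total_in_scope):
    """Build a plain-text status report from per-category counts."""
    reactome_in = sum(r for r, _ in in_scope.values())
    pro_in = sum(p for _, p in in_scope.values())
    reactome_all = reactome_in + sum(r for r, _ in other.values())
    pro_all = pro_in + sum(p for _, p in other.values())

    rows = [
        ("Total intersection", reactome_all, pro_all),
        ("In-scope EWAS", reactome_in, pro_in),
    ]
    rows += [(name, r, p) for name, (r, p) in {**in_scope, **other}.items()]

    lines = [
        f"Total in-scope EWAS: {total_in_scope}",
        f"{'':<20}{'Reactome':>10}{'PRO':>8}",
    ]
    lines += [f"{name:<20}{r:>10}{p:>8}" for name, r, p in rows]
    pct = 100 * reactome_in / total_in_scope
    lines.append(
        f"Reactome (in-scope EWAS) accounted for: "
        f"{pct:.0f}% ({reactome_in}/{total_in_scope})"
    )
    return "\n".join(lines)


report = status_report(IN_SCOPE, OTHER, TOTAL_IN_SCOPE_EWAS)
print(report)
```

Deriving the two summary rows from the category counts, rather than storing them, doubles as a consistency check: with the counts above the sums reproduce the 14670/11221 and 14498/11065 totals reported in the table.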