Closed rwst closed 1 year ago
This is the script to produce the list: grep -C1 'DE\s\+\(Deleted\|Transferred\)' enzyme.dat |grep '^ID' |sed 's/^ID\s\+//g'
Using this I find 1297 lines referring to obsolete EC numbers, all in def or synonym statements. The 269 unique numbers referred to are:
1.10.2.2, 1.10.9.1, 1.10.99.1, 1.10.99.2, 1.10.99.3, 1.1.1.128, 1.1.1.158, 1.1.1.161, 1.1.1.246, 1.1.1.63, 1.12.99.2, 1.13.11.13, 1.13.11.32, 1.13.11.44, 1.13.12.11, 1.13.12.12, 1.13.12.14, 1.1.3.3, 1.1.4.1, 1.14.11.14, 1.14.11.19, 1.14.11.22, 1.14.11.23, 1.14.12.4, 1.14.12.5, 1.14.13.100, 1.14.13.104, 1.14.13.108, 1.14.13.109, 1.14.13.11, 1.14.13.117, 1.14.13.118, 1.14.13.12, 1.14.13.13, 1.14.13.132, 1.14.13.137, 1.14.13.138, 1.14.13.139, 1.14.13.140, 1.14.13.141, 1.14.13.142, 1.14.13.143, 1.14.13.144, 1.14.13.145, 1.14.13.15, 1.14.13.17, 1.14.13.21, 1.14.13.26, 1.14.13.28, 1.14.13.3, 1.14.13.30, 1.14.13.36, 1.14.13.37, 1.14.13.41, 1.14.13.42, 1.14.13.47, 1.14.13.48, 1.14.13.52, 1.14.13.53, 1.14.13.55, 1.14.13.56, 1.14.13.57, 1.14.13.60, 1.14.13.67, 1.14.13.68, 1.14.13.70, 1.14.13.71, 1.14.13.72, 1.14.13.73, 1.14.13.74, 1.14.13.75, 1.14.13.76, 1.14.13.77, 1.14.13.79, 1.14.13.80, 1.14.13.85, 1.14.13.86, 1.14.13.87, 1.14.13.88, 1.14.13.89, 1.14.13.90, 1.14.13.91, 1.14.13.93, 1.14.13.94, 1.14.13.95, 1.14.13.96, 1.14.13.97, 1.14.13.98, 1.14.13.99, 1.14.15.2, 1.1.4.2, 1.14.21.1, 1.14.21.2, 1.14.21.3, 1.14.21.4, 1.14.21.5, 1.14.21.6, 1.14.99.10, 1.14.99.27, 1.14.99.28, 1.14.99.3, 1.14.99.30, 1.14.99.31, 1.14.99.32, 1.14.99.33, 1.14.99.36, 1.14.99.9, 1.1.5.6, 1.17.1.2, 1.17.99.1, 1.17.99.5, 1.1.99.10, 1.1.99.23, 1.1.99.8, 1.2.1.1, 1.2.1.2, 1.2.1.40, 1.2.1.43, 1.2.1.66, 1.2.2.2, 1.2.2.3, 1.2.3.11, 1.2.7.2, 1.2.99.2, 1.2.99.3, 1.2.99.4, 1.2.99.5, 1.3.1.23, 1.3.1.26, 1.3.1.30, 1.3.1.35, 1.3.1.4, 1.3.1.52, 1.3.1.59, 1.3.1.63, 1.3.3.1, 1.3.3.9, 1.3.99.1, 1.3.99.10, 1.3.99.13, 1.3.99.21, 1.3.99.22, 1.3.99.3, 1.3.99.7, 1.4.99.1, 1.4.99.3, 1.5.1.12, 1.5.1.29, 1.5.1.35, 1.5.3.11, 1.5.99.11, 1.5.99.8, 1.5.99.9, 1.6.6.9, 1.6.99.5, 1.7.3.4, 1.7.99.8, 1.8.99.1, 1.8.99.3, 1.97.1.10, 1.97.1.11, 1.97.1.3, 1.97.1.8, 1.9.99.1, 2.1.1.124, 2.1.1.125, 2.1.1.126, 2.1.1.138, 2.1.1.149, 2.1.1.29, 2.1.1.31, 2.1.1.32, 2.1.1.36, 2.1.1.48, 2.1.1.51, 2.1.1.52, 2.1.1.66, 2.3.1.104, 2.3.1.119, 
2.3.1.128, 2.3.1.154, 2.3.1.70, 2.3.1.88, 2.3.1.96, 2.4.1.119, 2.4.1.130, 2.4.1.157, 2.4.1.163, 2.4.1.45, 2.4.1.57, 2.4.1.95, 2.4.2.23, 2.4.99.11, 2.5.1.11, 2.6.1.68, 2.7.1.69, 2.7.7.21, 2.7.7.25, 2.7.7.54, 2.7.7.55, 2.7.8.25, 2.8.3.7, 3.1.1.21, 3.1.2.15, 3.1.2.26, 3.1.27.1, 3.1.27.2, 3.1.27.4, 3.1.27.5, 3.1.27.6, 3.1.27.9, 3.1.3.13, 3.2.1.110, 3.3.2.5, 3.4.13.3, 3.4.21.87, 3.5.1.27, 3.5.3.19, 3.5.4.14, 3.5.99.3, 3.5.99.4, 3.6.1.19, 3.6.1.30, 3.6.1.47, 3.6.1.48, 3.6.3.10, 3.6.3.14, 3.6.3.15, 3.6.3.16, 3.6.3.17, 3.6.3.25, 3.6.3.31, 3.6.3.47, 3.6.3.48, 3.6.3.49, 3.6.3.50, 3.6.3.51, 3.6.3.52, 3.6.4.1, 3.6.4.11, 3.6.4.2, 3.6.4.3, 3.6.4.4, 3.6.4.5, 3.6.4.8, 3.6.4.9, 4.1.1.41, 4.1.1.70, 4.1.2.30, 4.1.2.37, 4.2.1.4, 4.2.1.52, 4.2.1.58, 4.2.1.60, 4.2.1.61, 4.2.1.89, 4.2.3.14, 4.3.1.11, 5.2.1.3, 5.3.3.15, 5.4.1.2, 5.4.2.1, 5.99.1.2, 6.1.1.25, 6.2.1.29, 6.3.2.15, 6.3.2.21, 6.3.2.27, 6.3.2.28
I also find 236 unique obsolete EC numbers in 251 xref statements:
1.10.2.2, 1.10.9.1, 1.10.99.2, 1.10.99.3, 1.1.1.128, 1.1.1.158, 1.1.1.161, 1.1.1.246, 1.13.11.13, 1.13.12.12, 1.1.3.3, 1.1.4.1, 1.14.11.14, 1.14.11.19, 1.14.11.22, 1.14.11.23, 1.14.12.4, 1.14.12.5, 1.14.13.100, 1.14.13.104, 1.14.13.108, 1.14.13.109, 1.14.13.11, 1.14.13.112, 1.14.13.117, 1.14.13.118, 1.14.13.12, 1.14.13.13, 1.14.13.132, 1.14.13.134, 1.14.13.136, 1.14.13.137, 1.14.13.138, 1.14.13.139, 1.14.13.140, 1.14.13.141, 1.14.13.142, 1.14.13.143, 1.14.13.144, 1.14.13.145, 1.14.13.15, 1.14.13.157, 1.14.13.17, 1.14.13.173, 1.14.13.192, 1.14.13.193, 1.14.13.21, 1.14.13.26, 1.14.13.28, 1.14.13.36, 1.14.13.37, 1.14.13.41, 1.14.13.47, 1.14.13.48, 1.14.13.49, 1.14.13.52, 1.14.13.53, 1.14.13.55, 1.14.13.56, 1.14.13.57, 1.14.13.60, 1.14.13.67, 1.14.13.68, 1.14.13.70, 1.14.13.71, 1.14.13.72, 1.14.13.73, 1.14.13.74, 1.14.13.75, 1.14.13.76, 1.14.13.77, 1.14.13.78, 1.14.13.80, 1.14.13.85, 1.14.13.86, 1.14.13.87, 1.14.13.88, 1.14.13.89, 1.14.13.90, 1.14.13.91, 1.14.13.93, 1.14.13.94, 1.14.13.95, 1.14.13.96, 1.14.13.97, 1.14.13.98, 1.14.13.99, 1.14.15.2, 1.1.4.2, 1.14.20.2, 1.14.21.1, 1.14.21.2, 1.14.21.3, 1.14.21.4, 1.14.21.5, 1.14.21.6, 1.14.99.10, 1.14.99.27, 1.14.99.28, 1.14.99.3, 1.14.99.30, 1.14.99.31, 1.14.99.32, 1.14.99.33, 1.14.99.36, 1.14.99.9, 1.1.5.6, 1.17.1.2, 1.17.99.1, 1.17.99.5, 1.1.99.10, 1.1.99.8, 1.2.1.2, 1.2.1.40, 1.2.1.43, 1.2.1.66, 1.2.2.3, 1.2.3.11, 1.2.7.2, 1.2.99.2, 1.2.99.4, 1.2.99.5, 1.3.1.30, 1.3.1.35, 1.3.1.4, 1.3.1.52, 1.3.1.59, 1.3.1.63, 1.3.3.1, 1.3.3.9, 1.3.99.1, 1.3.99.10, 1.3.99.13, 1.3.99.21, 1.3.99.22, 1.3.99.3, 1.3.99.7, 1.4.99.1, 1.4.99.3, 1.5.1.12, 1.5.3.11, 1.5.99.11, 1.5.99.8, 1.5.99.9, 1.6.6.9, 1.6.99.5, 1.7.3.4, 1.7.99.8, 1.8.99.1, 1.8.99.3, 1.97.1.10, 1.97.1.11, 1.97.1.3, 1.97.1.8, 1.9.99.1, 2.1.1.124, 2.1.1.125, 2.1.1.149, 2.1.1.51, 2.1.1.52, 2.1.1.66, 2.3.1.104, 2.3.1.119, 2.3.1.128, 2.3.1.154, 2.3.1.70, 2.3.1.88, 2.3.1.96, 2.4.1.119, 2.4.1.130, 2.4.1.157, 2.4.1.163, 2.4.1.45, 2.4.1.57, 2.4.1.95, 2.4.2.23, 2.4.99.11, 2.5.1.77, 
2.6.1.68, 2.7.1.69, 2.7.7.21, 2.7.7.25, 2.7.7.54, 2.7.7.55, 2.7.8.25, 2.8.3.7, 3.1.2.15, 3.1.2.26, 3.1.27.1, 3.1.27.2, 3.1.27.4, 3.1.27.5, 3.1.27.6, 3.1.27.9, 3.1.3.13, 3.3.2.5, 3.4.13.3, 3.4.21.87, 3.5.1.27, 3.5.3.19, 3.5.4.14, 3.5.99.3, 3.5.99.4, 3.6.1.19, 3.6.1.30, 3.6.3.10, 3.6.3.16, 3.6.3.17, 3.6.3.25, 3.6.3.47, 3.6.3.49, 3.6.3.50, 3.6.3.51, 3.6.3.52, 3.6.4.3, 3.6.4.4, 3.6.4.5, 4.1.1.41, 4.1.1.70, 4.1.2.30, 4.1.2.37, 4.2.1.4, 4.2.1.58, 4.2.1.60, 4.2.1.61, 4.2.1.89, 4.2.3.14, 4.3.1.11, 5.3.3.15, 5.4.1.2, 5.4.2.1, 5.99.1.2, 5.99.1.3, 6.1.1.25, 6.3.2.27, 6.3.2.28
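For anyone who wants to reproduce counts like the ones above (lines and unique numbers per tag), here is a rough sketch. It assumes the obsolete EC numbers are already loaded into a Python set and that go.obo is read line by line; `tally_obsolete_refs` is a hypothetical helper name, not something that exists in the GO tooling.

```python
import re
from collections import defaultdict

# Matches references written as "EC:a.b.c.d". MetaCyc ids such as
# "MetaCyc:1.14.13.55-RXN" are not matched because they lack the "EC:" prefix.
EC_REF = re.compile(r"\bEC:(\d+\.\d+\.\d+\.\d+)")


def tally_obsolete_refs(obo_lines, obsolete):
    """Return (unique obsolete ECs per tag, number of affected lines per tag)."""
    by_tag = defaultdict(set)
    line_counts = defaultdict(int)
    for line in obo_lines:
        tag, sep, value = line.partition(":")
        if not sep:
            continue  # stanza headers such as "[Term]" and blank lines
        hits = {ec for ec in EC_REF.findall(value) if ec in obsolete}
        if hits:
            by_tag[tag].update(hits)
            line_counts[tag] += 1
    return dict(by_tag), dict(line_counts)
```

Calling it with the lines of go.obo and the set of obsolete numbers gives one entry per tag (def, synonym, xref), which is how the two lists above are split.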
One way to fix some of the xrefs would be to take the current EC number from Rhea, which is apparently not done, e.g.:
id: GO:0047087
name: protopine 6-monooxygenase activity
def: "Catalysis of the reaction: H(+) + NADPH + O(2) + protopine = 6-hydroxyprotopine + H(2)O + NADP(+)." [EC:1.14.13.55, RHEA:22644]
synonym: "protopine 6-hydroxylase activity" EXACT []
synonym: "protopine,NADPH:oxygen oxidoreductase (6-hydroxylating)" EXACT [EC:1.14.13.55]
xref: EC:1.14.13.55
xref: KEGG_REACTION:R04699
xref: MetaCyc:1.14.13.55-RXN
xref: RHEA:22644
..but https://www.rhea-db.org/reaction?id=22644 has EC 1.14.14.98.
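A rough sketch of that comparison, assuming a Rhea-to-EC lookup is available as a plain dict. The `rhea_to_ec` mapping and the function name are hypothetical; the single entry in the usage note is just the value quoted above from the Rhea page.

```python
import re

EC_XREF = re.compile(r"^xref: EC:(\S+)", re.M)
RHEA_XREF = re.compile(r"^xref: RHEA:(\d+)", re.M)


def ec_disagreements(stanza, rhea_to_ec):
    """List (go_ec, rhea_id, rhea_ecs) where a stanza's EC xref is not
    among the EC numbers Rhea gives for the stanza's RHEA xref."""
    go_ecs = EC_XREF.findall(stanza)
    mismatches = []
    for rhea_id in RHEA_XREF.findall(stanza):
        rhea_ecs = rhea_to_ec.get(rhea_id, set())
        if not rhea_ecs:
            continue  # Rhea has no EC for this reaction, nothing to compare
        for ec in go_ecs:
            if ec not in rhea_ecs:
                mismatches.append((ec, rhea_id, sorted(rhea_ecs)))
    return mismatches
```

With the stanza above and `{"22644": {"1.14.14.98"}}` as the lookup, this reports EC:1.14.13.55 as disagreeing with Rhea. It only flags candidates; someone still has to confirm the replacement.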
Not every EC is referred to by Rhea. But, this has probably accumulated over time. It could be nice to have a monthly report of EC changes like this so that we aren't hit with thousands to fix at once! I can start on the xref usage (and hopefully hit the def ref while I'm there).
Note that these include both Transferred and Removed entries, so some of them have changed rather than disappeared. Also, if you just tell me which xref lines to remove or change, I can automate that and open a pull request. There is no need to do this manually.
If any are split, I need to look at the split. And eventually we DO plan to cull RHEA for ECs, but this is not implemented yet. Part of the reason I am doing this manually is that I also check whether there is a RHEA entry and, if so, add it.
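To separate the entries that can be replaced mechanically from the ones that need a human look, something like the sketch below might help. It assumes the DE lines in enzyme.dat read either `Deleted entry.` or `Transferred entry: <new number(s)>.`, which is what the grep pattern at the top of this thread matches; `classify_de` is a hypothetical name.

```python
import re

EC_NUM = re.compile(r"\d+\.\d+\.\d+\.\d+")


def classify_de(de_line):
    """Classify one DE line as 'deleted', 'transferred', 'split' or 'active'
    and return the replacement EC numbers it names (if any)."""
    text = de_line[2:].strip() if de_line.startswith("DE") else de_line.strip()
    if text.startswith("Deleted"):
        return ("deleted", [])
    if text.startswith("Transferred"):
        targets = EC_NUM.findall(text)
        # More than one target means the old entry was split: manual review.
        kind = "split" if len(targets) > 1 else "transferred"
        return (kind, targets)
    return ("active", [])
```

Only the `transferred` case with a single target is a candidate for automatic replacement; `split` and `deleted` entries would go on a list for manual review.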
Started; checked out 50 for rhea; replaced about 25 today.
I just noticed that I picked up many of the deleted ECs from obsoleted GO entries, sorry. The actual number of deleted ECs that are not in obsoleted entries is only a few dozen.
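To avoid picking up obsoleted GO entries in the first place, the scan could be restricted to live terms. A minimal sketch, assuming go.obo stanzas are separated by blank lines and obsolete terms carry an `is_obsolete: true` line; `live_term_stanzas` is a hypothetical helper.

```python
def live_term_stanzas(obo_text):
    """Yield the lines of each [Term] stanza that is not marked obsolete."""
    for block in obo_text.split("\n\n"):
        lines = block.strip().splitlines()
        if not lines or lines[0] != "[Term]":
            continue  # file header, [Typedef] stanzas, empty blocks
        if "is_obsolete: true" in lines:
            continue
        yield lines
```

Any EC search would then run over the yielded stanzas only, so references inside obsoleted terms no longer inflate the counts.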
no problem. working it through.
up to 90 replaced thus far.
I have done all of the low-hanging (simple replacement) ones, including adding RHEA ids for terms that didn't have one to begin with. Now working on more complicated mappings.
Thanks for your work!
Added a few more fixes
I was informed today that apparently we have fixed all terms with obsoleted ECs (based on a file from Expasy). I'll close this ticket. If we find others, I would prefer a new ticket to be made.
I still find many of those ECs in the ontology.
I don't remember who told me these were all fixed, so what I want is the CURRENT list, not the old one. I don't do scripts. @balhoff, can you do something like this?
Note that if the corrections were to be done computationally, attention would need to be paid to the parentage of the GO term. If one of the parents is one of our grouping terms that maps to a 3-digit EC, the parent would also need to be changed.
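The parentage concern can be turned into a simple flag: if the first three digits of the EC change, the grouping parent probably has to change too. A small sketch with hypothetical helper names, written as "x.y.z.-" for the 3-digit class:

```python
def ec_class(ec, depth=3):
    """Return the EC class at the given depth, e.g. '1.3.7.9' -> '1.3.7.-'."""
    parts = ec.split(".")
    return ".".join(parts[:depth] + ["-"] * (4 - depth))


def parent_needs_review(old_ec, new_ec):
    """True when a transfer moves the enzyme to a different 3-digit class."""
    return ec_class(old_ec) != ec_class(new_ec)
```

A transfer that stays within the same 3-digit class can keep its parent; one that crosses classes should be queued for a parent change rather than edited as a plain xref swap.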
I had a first look at this. If I'm not mistaken, a further 34 EC entries have been obsoleted since the start of this ticket. Of these, the following 19 appear in a recent go.obo:
[ ] 1.2.2.4 deleted entry; rhea and Metacyc obsoleted also;
replace MetaCyc:RXN-21452; replace EC:1.2.5.3; RHEA:48880
[ ] 1.3.7.9 transferred to 1.1.7.1; RHEA good; parent is now 1.1.7.-; parent fine
[ ] 1.6.99.3 deleted entry; NOT in the ontology.
[ ] 1.9.3.1 only maps to obsolete terms; currently replaced with EC:7.1.1.9 used for cytochrome c oxidase
[ ] 1.11.1.15 transferred to 1.11.1.24 mapping to RHEA:62620; current term has RHEA:10008; both specify cysteine
hold for further analysis
[ ] 1.14.99.19 transferred to 1.14.19.77, RHEA:22956 (same mapped to GO term)
[ ] 1.14.99.37 transferred to 1.14.14.176; RHEA also transferred
[ ] 1.16.1.3 GO term obsoleted
[ ] 1.16.1.5 GO term obsoleted
[ ] 2.1.1.43 transferred to 2.1.1.354 (two GO terms have this mapping): [histone H3]-lysine(4) N-trimethyltransferase. RHEA:60260 maps to this EC, but the Rhea reaction is: L-lysyl4-[histone H3] + 3 S-adenosyl-L-methionine = 3 H+ + N6,N6,N6-trimethyl-L-lysyl4-[histone H3] + 3 S-adenosyl-L-homocysteine; 6 GO terms use histone H3; no RHEA found for any of these GO terms; none added.
[ ] 3.1.3.31 EC deleted; One GO term; has RHEA; fits reaction; removing ec only.
[ ] 3.1.12.2 transferred to 3.6.1.72, mapping to RHEA:52140; just change ec rhea ok
[ ] 3.1.27.3 transferred to 4.6.1.24; no RHEA mapping; changed parent to ribonuclease activity
[ ] 3.1.27.10 transferred to 4.6.1.23; changed parent to ribonuclease activity
[ ] 3.2.1.44 transferred to 3.2.1.211; updated MetaCyc RXN-20949; no RHEA; added to refs supplied by MetaCyc
[ ] 3.6.3.11 Cannot find in ontology; deleted entry in ec; nothing to do here
[ ] 3.8.1.1 deleted in ec; GO:0047651; has rhea; removed ec
[ ] 4.1.2.41 transferred to EC:4.1.2.61
[ ] 5.1.3.12 deleted in EC; deleted in RHEA; has Reactome. GO:0050379
Thanks @rwst. What we need is a monthly check on the status of our xrefs, especially EC and RHEA, so that we aren't faced with a lot at once. Of course, it's dependent upon how often the public targets for the xrefs get changed.
I'll play with the new set later today in my mechanic's waiting room.
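For the monthly check, one simple approach would be to keep last month's list of obsolete EC numbers and report only what is new. A minimal sketch, assuming both snapshots are available as collections of EC strings; the function names are hypothetical.

```python
def ec_sort_key(ec):
    """Sort EC numbers numerically, so 1.2.2.4 comes before 1.11.1.15."""
    return tuple(int(part) for part in ec.split("."))


def new_obsoletions(previous, current):
    """Return EC numbers that are obsolete now but were not in the last snapshot."""
    return sorted(set(current) - set(previous), key=ec_sort_key)
```

The result is the short list that would go into a monthly ticket, instead of the full backlog each time.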
Out of date - created new tickets from a new query:
There were 2 more deleted EC in the dbxrefs: https://github.com/geneontology/go-ontology/issues/24923 and https://github.com/geneontology/go-ontology/issues/24922 There are about 50 transferred EC: https://github.com/geneontology/go-ontology/issues/24927
Obsoletion in enzyme.dat has the form
or
Travis should grep the OBO file for usage of "EC:x.y.z.w" with the numbers coming from a list.
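A possible shape for such a CI check, as a sketch only. It assumes the two obsoletion forms are DE lines beginning with "Deleted entry" or "Transferred entry" following the ID line (which is what the grep pattern at the top of this thread matches), and that the script is given the paths to enzyme.dat and go.obo. All function names are hypothetical.

```python
import re
import sys

EC_REF = re.compile(r"\bEC:(\d+\.\d+\.\d+\.\d+)")


def obsolete_ids(enzyme_lines):
    """Collect the IDs whose DE line starts with Deleted or Transferred."""
    current, found = None, set()
    for line in enzyme_lines:
        if line.startswith("ID"):
            current = line[2:].strip()
        elif line.startswith("DE") and current:
            if line[2:].strip().startswith(("Deleted", "Transferred")):
                found.add(current)
    return found


def check_obo(obo_lines, obsolete):
    """Return (line number, EC, line text) for every obsolete EC reference."""
    problems = []
    for lineno, line in enumerate(obo_lines, start=1):
        for ec in EC_REF.findall(line):
            if ec in obsolete:
                problems.append((lineno, ec, line.rstrip("\n")))
    return problems


def main(enzyme_path, obo_path):
    with open(enzyme_path, encoding="utf-8") as fh:
        obsolete = obsolete_ids(fh)
    with open(obo_path, encoding="utf-8") as fh:
        problems = check_obo(fh, obsolete)
    for lineno, ec, text in problems:
        print(f"{obo_path}:{lineno}: obsolete EC:{ec}: {text}")
    return 1 if problems else 0  # non-zero exit makes the CI job fail


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Matching the full `EC:a.b.c.d` token rather than a plain substring avoids false hits such as 1.14.13.1 matching inside 1.14.13.11.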