geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Adding missing enzymes from EC to GO #13432

Open hdrabkin opened 7 years ago

hdrabkin commented 7 years ago

Ron Caspi (ron.caspi@sri.com) has done an analysis and finds that 2188 EC terms are missing from the EC2GO mapping file. This number includes some classes and subclasses, e.g. EC-5.99; 2111 are complete numbers describing specific enzymes. I have attached two Excel files with the data: one with classes, the other with only enzymes (4-digit EC numbers). Complete definitions of the EC numbers can be found at enzyme-database.org, e.g. http://www.enzyme-database.org/query.php?ec=2.8.1.12

That website also has the definitions of the classes and subclasses. You can find them manually under the "Enzymes by Class" tab, or by using URLs in the format http://www.enzyme-database.org/cinfo.php?c=X&sc=Y&ssc=Z (c = class, sc = subclass, ssc = sub-subclass). For example: http://www.enzyme-database.org/cinfo.php?c=4&sc=5

This will be a large project.

Attached: EC numbers not in GO with classes.xlsx; EC numbers not in GO no classes.xlsx
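The class/subclass URL format described above can be sketched as a small helper. This is only an illustration: the function name `class_info_url` is hypothetical, and the only thing taken from this ticket is the `cinfo.php?c=X&sc=Y&ssc=Z` pattern; how the site handles other parameter combinations is not checked here.

```python
from typing import Optional
from urllib.parse import urlencode

# Base URL taken from the format quoted in this ticket.
CINFO_BASE = "http://www.enzyme-database.org/cinfo.php"


def class_info_url(c: int, sc: Optional[int] = None, ssc: Optional[int] = None) -> str:
    """Build a class / subclass / sub-subclass URL (hypothetical helper).

    c = class, sc = subclass, ssc = sub-subclass, as described above.
    """
    if ssc is not None and sc is None:
        # A sub-subclass only makes sense below a subclass.
        raise ValueError("ssc requires sc")
    params = {"c": c}
    if sc is not None:
        params["sc"] = sc
    if ssc is not None:
        params["ssc"] = ssc
    return f"{CINFO_BASE}?{urlencode(params)}"


# Reproduces the example URL given above for class 4, subclass 5.
print(class_info_url(4, 5))
```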

hdrabkin commented 7 years ago

I recently contacted Ron; I’m starting to look at what might help me get some of the missing EC numbers into GO. I fetched files from the FTP site (enzyme.get, enzyme.dat, enzuser.txt, and enzclass.txt); enzyme.dat appears to have most of the needed info except for references. Chris might have made a script that used KEGG.

pgaudet commented 4 years ago

Are we still planning to do this?

hdrabkin commented 4 years ago

This is ongoing, as some are added during other tickets. And once we are all set with Rhea, we plan to fetch the ECs for reactions from Rhea.

pgaudet commented 3 years ago

If I understand correctly, this is covered by other tickets. Can we close?

hdrabkin commented 3 years ago

Not really covered in other tickets. This is a different approach. I haven't been adding new terms from this list directly. It's an old list now. What would be useful is to get a recent list of 4-digit ECs that are NOT in GO. But to use them to add terms, I would need a Rhea reaction to use as a definition reference, and/or a PMID.
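A fresh list of 4-digit ECs not in GO could be produced along these lines. This is a rough sketch, not an existing GO script: it assumes the EC2GO file uses the usual external2go line layout (`EC:x.x.x.x > GO:term name ; GO:id`, with `!` comment lines), and that a list of current EC numbers is already available from some other source. The function names and the sample lines are illustrative only.

```python
import re
from typing import Iterable, List, Set

# Complete ("4-digit") EC numbers only; classes such as 5.99 do not match.
COMPLETE_EC = re.compile(r"^\d+\.\d+\.\d+\.\d+$")


def ecs_in_ec2go(ec2go_text: str) -> Set[str]:
    """Collect the EC numbers on the left-hand side of each mapping line."""
    mapped = set()
    for line in ec2go_text.splitlines():
        line = line.strip()
        if not line or line.startswith("!"):  # skip blanks and comment lines
            continue
        left = line.split(">", 1)[0].strip()
        if left.startswith("EC:"):
            mapped.add(left[len("EC:"):])
    return mapped


def missing_complete_ecs(all_ecs: Iterable[str], ec2go_text: str) -> List[str]:
    """Return complete EC numbers absent from the mapping, in numeric order."""
    mapped = ecs_in_ec2go(ec2go_text)
    missing = {ec for ec in all_ecs if COMPLETE_EC.match(ec) and ec not in mapped}
    return sorted(missing, key=lambda ec: tuple(int(p) for p in ec.split(".")))


if __name__ == "__main__":
    # Illustrative input only, not real file contents.
    sample = "! example header\nEC:1.1.1.1 > GO:alcohol dehydrogenase (NAD+) activity ; GO:0004022\n"
    print(missing_complete_ecs(["1.1.1.1", "2.8.1.12", "5.99"], sample))
```

Classes and subclasses are filtered out on purpose here, since the request is for complete enzyme numbers; the list still would not carry the Rhea or PMID reference needed to write a definition.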

hdrabkin commented 3 years ago

Do we want to add a reaction unless a specific request for it comes in?

deustp01 commented 3 years ago

Classic answer - no; wait for term requests and then create as needed. But it's probably time to re-visit this in co-ordination with Rhea and, here, Ron Caspi. I get the impression that Rhea's preferred practice may be to take on a domain of chemistry and handle it exhaustively (but Alan Bridge would need to speak to this) and if that fits with MetaCyc curation plans that are fairly concrete, then maybe the requested bulk term creation could make sense? That would also make it much easier to ensure that all the new terms are consistent and consistently grouped - easier than for piecemeal term creation? @ukemi ?

hdrabkin commented 3 years ago

I can just imagine the amount of time it would take to go down a list of 4-digit ECs not in GO. It could be done in bulk computationally if each EC already had a single Rhea reaction OR at least a PMID.
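The "single Rhea OR at least a PMID" condition could be applied as a simple triage step before any bulk term creation. The sketch below assumes the Rhea IDs and PMIDs per EC have already been gathered somewhere; the `EcCandidate` structure, the function names, and the identifiers in the example are all hypothetical placeholders, not real data.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class EcCandidate:
    """Hypothetical record for one EC number missing from GO."""
    ec: str
    rhea_ids: List[str] = field(default_factory=list)
    pmids: List[str] = field(default_factory=list)


def definition_xref(c: EcCandidate) -> Optional[str]:
    """Pick a definition reference: exactly one Rhea reaction, else a PMID."""
    if len(c.rhea_ids) == 1:
        return f"RHEA:{c.rhea_ids[0]}"
    if c.pmids:
        return f"PMID:{c.pmids[0]}"
    return None  # nothing usable; needs a curator


def triage(cands: List[EcCandidate]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Split candidates into (ec, xref) pairs ready for bulk handling and ECs for manual review."""
    ready, manual = [], []
    for c in cands:
        xref = definition_xref(c)
        if xref is None:
            manual.append(c.ec)
        else:
            ready.append((c.ec, xref))
    return ready, manual


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    ready, manual = triage([
        EcCandidate("0.0.0.1", rhea_ids=["00001"]),
        EcCandidate("0.0.0.2", rhea_ids=["00002", "00003"]),
    ])
    print(ready, manual)
```

An EC with several Rhea reactions and no PMID falls into the manual pile here, which matches the point that bulk creation only works where the reference is unambiguous.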

ukemi commented 3 years ago

My two cents is that we need to at least coordinate with Rhea based on our recent strategies to generate an exhaustive 1:1 mapping.

pgaudet commented 3 years ago

We'll revisit this when the first Rhea load is done.

danielhhaft commented 3 years ago

Greetings. NCBI's prokaryotic genome pipeline group has been adding GO terms to annotation rules used by the PGAP and RefSeq pipelines, and fairly soon will let those annotation rules attach GO terms to a substantial fraction of the ~200,000,000 different prokaryotic proteins in our collection.

We are interested in obtaining the latest EC2GO mapping to assist us in this. If the list is less up to date than it could be, we would like to join the set of voices asking for a more complete EC2GO.

pgaudet commented 3 years ago

Hi @danielhhaft

The most recent EC2GO and RHEA to GO mappings can be found at this address: http://current.geneontology.org/ontology/external2go/index.html

They are updated approximately monthly, at each GO release.

Thanks, Pascale

ValWood commented 1 month ago

@pgaudet is this out of date?