geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Decide on fate of "unmanaged" external2go files #16989

Open kltm opened 5 years ago

kltm commented 5 years ago

The following external2go files have been deposited into SVN, but currently have no known upstream file or manager:

We would like to either take over the management of the file (in which case it could possibly be migrated into the ontology), defer management of the file to an upstream source outside of SVN, or abandon unused/unloved mapping files.

kltm commented 5 years ago

Noting some previous work at https://github.com/geneontology/go-annotation/issues/2056 from @vanaukenk :

The QA/QC group will be reviewing the external2go mappings that exist here:

http://viewvc.geneontology.org/viewvc/GO-SVN/trunk/external2go/

to determine:

Are the mappings still being used for annotation?

Who is responsible for maintaining the mappings?

Who is responsible for applying the mappings?

cmungall commented 5 years ago

[not to keep moving stuff around but I'd say this falls within the remit of the ontology group]

@thomaspd will reach out to NCBI about COG. We need to find out if they maintain mappings. If not, I think we have no option but to take on the maintenance, as IMO these are important.

You can assign unipathway to me. I have already 'rescued' unipathway into github: https://github.com/geneontology/unipathway and will be coming up with a strategy for all the xrefs

Some of these may be subsumed into interpro2go; Paul can tell us. However, even if they are subsumed, I bet there are still many communities using things like tigrfams directly, so this may need to be a joint outreach effort.

kltm commented 5 years ago

@ukemi I'm not sure what project this would fall into.

pgaudet commented 5 years ago

Added to tomorrow's managers call.

cmungall commented 5 years ago

What was the result of the discussion?

I suggest that for now we rescue these by adding them directly to GitHub and copying them out as part of the release. We can then address issues of ongoing maintenance separately.

pgaudet commented 5 years ago

It's been on the managers agenda for a few weeks - hopefully tomorrow we can get to it.

Pascale

cmungall commented 5 years ago

metadata for each of these:

==> cog2go <==
!version: $Revision: 1.4 $
!date: $Date: 2010/01/27 15:37:22 $
!Mapping of COG to GO terms.
!Michael Ashburner & Jane Lomax
!Uses: http://www.ncbi.nlm.nih.gov/cgi-bin/COG/palox?fun=all
!Uses: http://www.ncbi.nlm.nih.gov/cgi-bin/COG/palox?sys=all
!Downloaded 20031103
!
COG:Information storage and processsing > GO:.
COG:J Translation, ribosomal structure and biogenesis > GO:translation ; GO:0043037

==> egad2go <==
!version: $Revision: 1.13 $
!date: $Date: 2010/01/27 15:37:22 $
!Mapping of TIGR EGAD to GO.
!Michael Ashburner, Cambridge & Michelle Gwynn Giglio, TIGR.
!The EGAD database is no longer in use at TIGR.  No further developement or 
!updates will be performed on EGAD.
! note added 2009-06-22: ID prefix changed to JCVI_EGAD to match GO.xrf_abbs entry
!
JCVI_EGAD:cell division > GO:cytokinesis ; GO:0000910
JCVI_EGAD:DNA synthesis/replication > GO:DNA replication ; GO:0006260

==> genprotec2go <==
!version: $Revision: 1.9 $
!date: $Date: 2010/01/27 15:37:22 $
!Mapping of GenProtEC to GO terms.
!Michael Ashburner & Heather Butler, Cambridge.
!Uses:Functional classification scheme for E.coli - Monica Riley and Gretta Serres, Sept 26 2000.
!See http://genprotec.mbl.edu.
!
GenProtEC:1 Metabolism > GO:metabolism ; GO:0008152
GenProtEC:1.1 carbon utilization (feed to mainstream) > GO:carbon utilization ; GO:0015976
GenProtEC:1.1.1 carbon compounds > GO:.

==> mips2go <==
!version: $Revision: 1.12 $
!date: $Date: 2010/01/27 15:37:22 $
!
!Mapping of MIPS Functional Catalogue to GO.
!Michael Ashburner, Cambridge; updated 20-21 Aug 2002, Midori Harris, EBI; major revision 10-01-2006, Jane Lomax, EBI.
!Uses:ftp://ftpmips.gsf.de/catalogue/funcat-2.0_scheme
!Reference: Nucleic Acids Res. 2004 Oct 14;32(18):5539-45. 
!Version: Functional Classification Catalogue, Version 2.0, 18.03.2004
!We have not corrected inconsistences in Funcat, e.g. British vs. US spelling; amino acid vs amino-acid.
!

==> multifun2go <==
!From MultiFun site 2003-09-29
!typos etc in MultiFun corrected
!Created by Michael Ashburner & Jane Lomax September 29 2003, updated by Jane Lomax December 19 2005
!version 1.3
!
MultiFun:1 Metabolism > GO:metabolism ; GO:0008152
MultiFun:1.1 Carbon compound utilization > GO:.
MultiFun:1.1.1 Carbohydrates/Carbon compounds > GO:carbohydrate catabolism ; GO:0016052
MultiFun:1.1.1.1 D-allose catabolism > GO:D-allose catabolism ; GO:0019316
MultiFun:1.1.1.2 2,5-ketogluconate metabolism > GO:ketogluconate metabolism ; GO:0019522

==> rfam2go <==
Rfam:RF00001 5S_rRNA > GO:structural constituent of ribosome ; GO:0003735
Rfam:RF00001 5S_rRNA > GO:ribosome ; GO:0005840
Rfam:RF00002 5_8S_rRNA > GO:structural constituent of ribosome ; GO:0003735
Rfam:RF00002 5_8S_rRNA > GO:ribosome ; GO:0005840
Rfam:RF00003 U1 > GO:mRNA 5'-splice site recognition ; GO:0000395
Rfam:RF00003 U1 > GO:U1 snRNP ; GO:0005685
Rfam:RF00003 U1 > GO:pre-mRNA 5'-splice site binding ; GO:0030627
Rfam:RF00004 U2 > GO:mRNA branch site recognition ; GO:0000348
Rfam:RF00004 U2 > GO:U2 snRNP ; GO:0005686
Rfam:RF00004 U2 > GO:pre-mRNA branch point binding ; GO:0045131

==> tigr2go <==
!version: $Revision: 1.12 $
!date: $Date: 2010/01/27 15:37:22 $
!
!Mapping of TIGR roles to GO.
!Michael Ashburner, Cambridge, Leonore Reiser, TAIR & Michelle Gwynn Giglio, TIGR.
!Uses:TIGR Role list from Michelle Gwinn, TIGR, September 2000.
!
TIGR_role:11010 70  Amino acid biosynthesis Aromatic amino acid family > GO:aromatic amino acid family biosynthesis ; GO:0009073
TIGR_role:11020 71  Amino acid biosynthesis Aspartate family > GO:aspartate family amino acid biosynthesis ; GO:0009067
TIGR_role:11030 73  Amino acid biosynthesis Glutamate family > GO:glutamine family amino acid biosynthesis ; GO:0009084

==> tigrfams2go <==
!version: $Revision: 1.23 $ 
!date: 2008/08/11
!GO associations for TIGRFAMs
!by JCVI annotation and HMM teams
!
!Note that mappings included to some TIGRFAM accessions include both eukaryotic specific terms and prokaryotic specific terms, care must be taken to choose the correct term for your particular organism in these cases
!
JCVI_TIGRFAMS:TIGR00001 ribosomal protein L35 > GO:translation ; GO:0006412
JCVI_TIGRFAMS:TIGR00001 ribosomal protein L35 > GO:cytosolic large ribosomal subunit ; GO:0022625
JCVI_TIGRFAMS:TIGR00001 ribosomal protein L35 > GO:organellar large ribosomal subunit ; GO:0000315

==> um-bbd_reactionid2go <==
! version: $Revision: 1.263 $
! date: $Date: 2012/06/01 00:25:54 $
!
! Generated from file ontology/editors/gene_ontology_write.obo,
! CVS revision:  1.3145; date:  31:05:2012 22:04
!
! Mapping of Gene Ontology terms to UM-BBD reaction IDs
! UM-BBD (The University of Minnesota Biocatalysis/Biodegradation Database): http://umbbd.msi.umn.edu/
! Last update at Thu May 31 17:22:53 2012 by the script /users/cjm/cvs/go-moose/bin/daily_from_obo.pl
!

==> unipathway2go <==
!version date: 2016/04/23 05:35:12
!description: Mapping of UniPathway pathway identifiers to GO terms.
!external resource: http://www.unipathway.org; http://www.unipathway.org/download/unipathway/public/unipathway2go.tsv
!citation: Morgat A, Coissac E, Coudert E, Axelsen KB, Keller G, Bairoch A, Bridge A, Bougueleret L, Xenarios I, Viari A.(2012) Nucleic Acids Res. D761-9.  PMID: 22102589.
!contact: goa@ebi.ac.uk; anne.morgat@isb-sib.ch
!
UniPathway:UPA00001 biological process > GO:biological_process ; GO:0008150
UniPathway:UPA00002 2-deoxy-D-ribose 1-phosphate degradation > GO:deoxyribose phosphate catabolic process ; GO:0046386
UniPathway:UPA00003 uridine metabolism > GO:uridine metabolic process ; GO:0046108
UniPathway:UPA00004 thymidine metabolism > GO:thymidine metabolic process ; GO:0046104
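
The excerpts above share a simple line layout: `!` starts a header comment, and each mapping reads `<external entry> > GO:<term name> ; <GO ID>`, with `GO:.` marking an entry that has no GO term. A minimal parsing sketch under that reading follows. The names `Mapping` and `parse_external2go_line` are hypothetical, and the external side is kept as one string because some files (for example egad2go) have labels but no separate identifier.

```python
from typing import NamedTuple, Optional


class Mapping(NamedTuple):
    external: str            # everything left of " > ", e.g. "UniPathway:UPA00003 uridine metabolism"
    go_label: Optional[str]  # GO term name, or None when the line maps to "GO:."
    go_id: Optional[str]     # e.g. "GO:0046108", or None when unmapped


def parse_external2go_line(line: str) -> Optional[Mapping]:
    """Parse one external2go line; return None for comments, blanks, and malformed lines."""
    line = line.strip()
    if not line or line.startswith("!"):
        return None  # header comment or empty line
    external, sep, go_part = line.partition(" > GO:")
    if not sep:
        return None  # no mapping separator found
    if go_part == ".":
        return Mapping(external, None, None)  # explicitly unmapped entry
    go_label, sep, go_id = go_part.rpartition(" ; ")
    if not sep:
        return None  # GO side lacks the "name ; ID" shape
    return Mapping(external, go_label, go_id)
```

Splitting on the last ` ; ` rather than the first is a deliberate choice here, in case a term name itself contains a semicolon.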

pgaudet commented 5 years ago

@keseler Is MultiFun still being used ? The most recent reference I can find about it is https://ecocyc.org/EcoCycUserGuide.shtml

Thanks, Pascale

keseler commented 5 years ago

As you know, MultiFun itself is no longer updated; Monica Riley died a few years ago, and Gretta Serres is not at MBL any more. If you'd like me to, I could get in touch with her to see if she knows anything. As far as usage goes, I still assign the occasional MultiFun term within EcoCyc, because we display them in an easy-to-understand hierarchy (unlike the way we display GO annotations), but we definitely don't use the mapping to generate GO terms. Also, the MultiFun ontology within EcoCyc isn't even the last version. I do not know of any other groups who are currently using MultiFun. I do have a vague memory of someone using it, but don't remember who that might have been. Wait, now that I think of it, the Pasteur/Genoscope people still use an ontology that resembles or may be based on MultiFun. E.g. in the most recent GenBank file for B. subtilis, AL009126, you'll see /function="16.9: Replicate" for DnaA.

cmungall commented 5 years ago

Given this, I recommend slurping the xrefs directly into the editors version of the ontology. This would entail no additional maintenance for the ontology editors, but it would ensure that after any merges (likely rare) the xref migrates to the correct term.
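
One way to picture that step: group the external identifiers by GO ID and render each as an OBO `xref:` tag that could sit in the matching term stanza. This is only a sketch, not the procedure actually used. The function name `xrefs_by_go_id` is hypothetical, and it assumes the external identifier is the first whitespace-delimited token on the left side, which holds for files like multifun2go but not for every external2go file.

```python
from collections import defaultdict


def xrefs_by_go_id(lines):
    """Group external IDs by GO ID and render them as OBO-style xref tag lines."""
    grouped = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip header comments and blanks
        external, sep, go_part = line.partition(" > GO:")
        if not sep:
            continue  # not a mapping line
        _label, sep, go_id = go_part.rpartition(" ; ")
        if not sep:
            continue  # "GO:." means no GO term to attach an xref to
        # Assumption: the ID is the first token, e.g. "MultiFun:1.1.1.1".
        external_id = external.split(" ", 1)[0]
        grouped[go_id].add(external_id)
    return {
        go_id: [f"xref: {ext}" for ext in sorted(ids)]
        for go_id, ids in grouped.items()
    }
```

Because the xref would then live on the term itself, a later term merge carries it along without anyone editing a separate mapping file.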

pgaudet commented 5 years ago

@cmungall Who will do this ? ie can we do this automatically ? (if not I can do it).

Thanks, Pascale

cmungall commented 5 years ago

I'll do it


suzialeksander commented 5 years ago

@cmungall @vanaukenk @thomaspd We just received this through the Helpdesk. If you think this will help with the cog2go mapping, or need anything else from this user, I asked him to keep an eye on this ticket or we have his contact info in the Helpdesk emails.

Dear Contributors to GO

I have made a table mapping from COG entries to GO terms, which could be a useful addition in your http://current.geneontology.org/ontology/external2go/ collection. It was made through a combination of the idmapping files from UniProtKB

It is available here: http://mibi.galaxy.bio.ku.dk/russel/mappings/cog2go

Use it if you like.

Best regards, Jakob Russel

cmungall commented 5 years ago

It would be good to spot check, it may just be the same as the one we have
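
A spot check along these lines could compare the (external ID, GO ID) pairs in the two files. The sketch below assumes both files use the external2go line layout and that the external ID is the first token; neither assumption has been verified for the contributed file, and `load_pairs` and `compare_mappings` are hypothetical names.

```python
def load_pairs(lines):
    """Collect (external_id, go_id) pairs from external2go-style lines."""
    pairs = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip header comments and blanks
        external, sep, go_part = line.partition(" > GO:")
        _label, sep2, go_id = go_part.rpartition(" ; ")
        if not sep or not sep2:
            continue  # unmapped ("GO:.") or malformed line
        pairs.add((external.split(" ", 1)[0], go_id))
    return pairs


def compare_mappings(old_pairs, new_pairs):
    """Report which pairs are shared and which appear in only one file."""
    return {
        "shared": old_pairs & new_pairs,
        "only_old": old_pairs - new_pairs,
        "only_new": new_pairs - old_pairs,
    }
```

If `only_old` and `only_new` are both empty, the two files carry the same mappings; a large `only_new` set suggests the contributed file covers different entries.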


pgaudet commented 5 years ago

@cmungall Who could do the spot check ? Please assign someone.

cmungall commented 5 years ago

out this week sorry

pgaudet commented 5 years ago

emailed um-bbd

suzialeksander commented 5 years ago

@cmungall We've found https://github.com/cmungall/goimport-external2go/blob/master/cog2go but it's 7 years old. Should we link this file/repo to the GO site's broken COG2GO link (https://github.com/geneontology/helpdesk/issues/203)?

cmungall commented 5 years ago

That repo is from an early attempt to migrate our SVN to GitHub, it's not relevant here, and the content is the same as in svn.

COG was updated in 2013 but it looks like the 26 function groups remain the same: ftp://ftp.ncbi.nih.gov/pub/COG/COG2014/data

Let's just preserve these as xrefs maintained in go-edit

cmungall commented 5 years ago

OK, there are two sets of mappings for COG. The existing ones we have are for the 26 COG functional groups. The one from Jakob is to COG names. This is much more interesting and useful. We should incorporate these and credit Jakob.

pgaudet commented 5 years ago

For reference: Jakob's mappings are: http://mibi.galaxy.bio.ku.dk/russel/mappings/cog2go

pgaudet commented 5 years ago

@cmungall What would be the plan? To insert this in the ontology? Or just have the file copied on the GO website somewhere? (Should we discuss this on an ontology call?) Who would maintain the file?

cmungall commented 5 years ago

As COG is fairly stable I would just store it in the edit file


pgaudet commented 5 years ago

@cmungall have you been able to download the full file ? I downloaded a file that seems partial (because the last line, COG:COG0787, has no mapping). The file is >600K lines long; with lots of duplicates - when I remove duplicates I get 35,000 lines.

That seems like a lot ?? We should have a look at the mappings before integrating it.
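
The two checks described above (dropping duplicate lines and spotting entries with no mapping, like the trailing COG:COG0787) can be sketched as follows. It assumes the downloaded file uses the external2go line layout, which has not been confirmed for this file; `dedupe_and_check` is a hypothetical name.

```python
def dedupe_and_check(lines):
    """Return (unique mapping lines in first-seen order, lines lacking a GO mapping)."""
    seen = set()
    unique = []
    incomplete = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip header comments and blanks
        if line in seen:
            continue  # exact duplicate of an earlier line
        seen.add(line)
        if " > GO:" in line:
            unique.append(line)
        else:
            incomplete.append(line)  # e.g. a truncated final line
    return unique, incomplete
```

A non-empty `incomplete` list is a hint that the download was cut short or that some entries were never mapped.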

Pascale

Russel88 commented 5 years ago

Dear all,

I am the contributor of the cog2go mapping file. @pgaudet I forgot to remove duplicates. I have updated the file now and it's much smaller. I also included a md5 checksum.
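
For anyone downloading the file, a checksum comparison is a quick way to rule out a partial download. A minimal sketch using Python's standard `hashlib`; the function names are hypothetical, and the expected value would be whatever checksum is published alongside the file.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_hex):
    """True when the file's MD5 equals the published checksum (case-insensitive)."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

MD5 is adequate here for detecting truncation or corruption, though it is not a safeguard against deliberate tampering.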

Cheers, Jakob

pgaudet commented 5 years ago

@happy-lorna @alexsign Are pfam2go, pirsf2go, prints2go, prodom2go, prosite2go also included in InterPro2GO ? or should we assume that some people consume those separately ?

Thanks, Pascale

alexsign commented 4 years ago

@pgaudet if these are the files GOA is producing, then yes, pfam2go, pirsf2go, prints2go, prodom2go and prosite2go are part of interpro2go

ValWood commented 1 year ago

pinging for jamboree

pgaudet commented 1 year ago

Thanks!

We should discuss whether these are needed:

I think others are OK

ValWood commented 1 year ago

Presumably nobody maintains these? And InterPro2GO will cover most possible mappings (there may be a little bit of specificity lost for sub-families, but these should be picked up by PAINT, and maybe ARBA eventually).

cmungall commented 1 year ago

These are transitive mappings, so they are not maintained directly. Provided the source mappings to InterPro are maintained, the transitive mapping is easily obtained by a join, so I think we should continue to provide them as a convenience.
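
To illustrate what such a join looks like, here is a minimal sketch. It assumes two inputs: pairs linking a member-database entry to an InterPro entry, and a dictionary from InterPro entry to GO IDs. The function name and the identifiers in the usage example are made-up placeholders, not real accessions or the actual pipeline.

```python
from collections import defaultdict


def transitive_mapping(member_to_interpro, interpro_to_go):
    """Join member->InterPro pairs with InterPro->GO to get member->GO."""
    result = defaultdict(set)
    for member_id, interpro_id in member_to_interpro:
        # An InterPro entry with no GO terms contributes nothing.
        for go_id in interpro_to_go.get(interpro_id, ()):
            result[member_id].add(go_id)
    return {member_id: sorted(go_ids) for member_id, go_ids in result.items()}


# Placeholder identifiers for illustration only.
member_to_interpro = [("Pfam:PF_A", "InterPro:IPR_X"), ("Pfam:PF_B", "InterPro:IPR_Y")]
interpro_to_go = {"InterPro:IPR_X": ["GO:0000002", "GO:0000001"]}
pfam2go = transitive_mapping(member_to_interpro, interpro_to_go)
```

Since the derived file is a pure function of its two inputs, it can be regenerated at each release instead of being edited by hand.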