geneontology / go-site

A collection of metadata, tools, and files associated with the Gene Ontology public web presence.
http://geneontology.org
BSD 3-Clause "New" or "Revised" License

ORCID in assigned_by field? #428

Open tberardini opened 7 years ago

tberardini commented 7 years ago

What do you think about allowing ORCIDs in the Assigned_by field?

http://geneontology.org/page/go-annotation-file-gaf-format-21

kltm commented 7 years ago

Off the top of my head, technically, any identifier should be fine in there; practically, we've been using convention and "namespaces" as id-labels in that field and a switch would do things like make AmiGO less usable until mapping could be brought in and threaded through. @cmungall

kltm commented 7 years ago

So the idea here would be to change the semantics of c15 from the "database" that made the annotation to the person who did it? For example, if a person moves between orgs, how would their affiliation be tracked through time for remapping? For our new annotations, the Noctua/GO-CAM models cover this, but in some cases there will be loss of information when converting back to legacy GAFs. I think the way the wind is blowing now is generally to stop trying to jimmy ever more information into the GAFs and try and get people to have acceptable peelings of information from the OWL models.

tberardini commented 7 years ago

I was thinking more for cases where there are non-database curators submitting GO annotations. For example, community submissions through PomBase or TAIR. I'm not suggesting restricting c15 values to ONLY ORCIDs but allowing ORCIDs or the current database information to be used here.
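To make the "ORCIDs or the current database information" idea concrete, here is a minimal sketch of how a parser might accept either kind of value in column 15. It is not how any GO tool actually validates GAFs. The `ORCID:` prefix convention, the `KNOWN_GROUPS` set, and all function names are assumptions made up for illustration; only the ORCID check-digit rule (ISO 7064 MOD 11-2) comes from ORCID's own documentation.

```python
import re

# Assumption: an ORCID may appear bare, with an "ORCID:" prefix, or as a URL.
ORCID_RE = re.compile(
    r"^(?:ORCID:|orcid:|https://orcid\.org/)?(\d{4}-\d{4}-\d{4}-\d{3}[\dX])$"
)

# Hypothetical stand-in for the registry of accepted group names.
KNOWN_GROUPS = {"TAIR", "PomBase"}


def orcid_checksum_ok(orcid):
    """Check the final character of an ORCID using ISO 7064 MOD 11-2."""
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:15]:
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[15] == expected


def classify_assigned_by(value):
    """Return 'orcid', 'group', or 'invalid' for an assigned_by value."""
    match = ORCID_RE.match(value)
    if match:
        return "orcid" if orcid_checksum_ok(match.group(1)) else "invalid"
    return "group" if value in KNOWN_GROUPS else "invalid"


def assigned_by_of(gaf_line):
    """Extract column 15 (1-based) from a tab-separated GAF line."""
    fields = gaf_line.rstrip("\n").split("\t")
    return fields[14]
```

A consumer such as AmiGO would still need a mapping from ORCIDs to display labels, which this sketch does not attempt.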

pgaudet commented 7 years ago

@tberardini : that's a great idea to encourage and recognize users' contributions to GO.

tonysawfordebi commented 7 years ago

Is now a good time to (re-)propose the idea of a "Group:Project" structure for the assigned_by column...?

kltm commented 7 years ago

The problem I see is that we are spending time tweaking a legacy format, with all of the advertising and parsing issues that will come with it.

Envision a future where all annotations live as GO-CAMs and GAFs are created on the fly and to spec from the triple store. People who want to support their flavor of parrot can have complete control over the output--column 22 can be exactly what you want!

I think we are pretty much at this point now and that resources might be better spent on trying to sunset GAF by working on the new pipeline and enabling bespoke downloads.

Bat signal for @cmungall .

tonysawfordebi commented 7 years ago

I have to confess that my comment was a bit tongue-in-cheek...

kltm commented 7 years ago

@tonysawfordebi Of course--not that I read it otherwise. I'm just a worrywart in general about the transition to the post-GAF world and carry my little soapbox around with me.

cmungall commented 7 years ago

I agree, this is changing the semantics of an existing column. The solution is not to use GAF.

thomaspd commented 7 years ago

+1. Yet another reason for transitioning to Noctua, where this can be captured.

tberardini commented 7 years ago

> The solution is not to use GAF

What file format could be used to submit annotations for integration into the GO database/triple store that would allow ORCIDs in the assigned by field? We publicly acknowledge the community members who submit GO annotations to TAIR on the TAIR site but that's all hidden when we send the data to GO (converted to TAIR assigned_by). I don't think that Noctua will be the only platform for generating GO annotations in the future and perhaps there should be room for other annotation platforms and a mechanism for integration of annotations from these other platforms into the GO.

kltm commented 7 years ago

I agree that Noctua the application is not going to be what people use to create community annotation in all cases, although which cases should be covered is TBD.

The question at hand is a file format for communication between the "community" and GO. While not a perfect example, David OS submits SynGO work directly as OWL models that we then incorporate directly. These methods mesh perfectly with the new data model and pipeline, and also give them complete control over all aspects of how the annotation metadata is handled.

What's great about the GAFs is that they are dead simple and easy to generate. Maybe we should look at a tool that can convert GAFs, plus additional metadata, into submit-able OWL files? I think what I (and some others) want to avoid is further extending a data model that is insufficient and introducing more parsing errors, version issues, and downstream effects into a format that we're actively trying to bring to a close.

cmungall commented 7 years ago

I'm betting a lot of these external submissions are managed in Excel files; forcing people to use OWL may be too much. But GPAD should be easy (and fully captures provenance for simple annotations).
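As a rough illustration of the spreadsheet-to-GPAD route, the sketch below turns one row of a CSV export into a 12-column, tab-separated line. The column order follows my reading of GPAD 1.1 and should be checked against the spec before use. The CSV header names, the `contributor=` property key, and the sample identifiers are all hypothetical placeholders, not an agreed convention.

```python
import csv
import io

# Assumed GPAD 1.1 layout (verify against the spec): DB, DB_Object_ID,
# Qualifier, GO_ID, Reference, Evidence (ECO), With/From, Interacting taxon,
# Date, Assigned_by, Annotation_Extension, Annotation_Properties.


def csv_row_to_gpad(row):
    """Build one GPAD-style line from a dict read out of a spreadsheet export."""
    # Hypothetical property key: carry the submitter's ORCID as provenance
    # while Assigned_by keeps the group name.
    props = "contributor=" + row["orcid"] if row.get("orcid") else ""
    fields = [
        row["db"],
        row["object_id"],
        row["qualifier"],
        row["go_id"],
        row["reference"],
        row["eco_id"],
        "",  # With/From left empty in this sketch
        "",  # Interacting taxon left empty in this sketch
        row["date"],
        row["assigned_by"],
        "",  # Annotation extension left empty in this sketch
        props,
    ]
    return "\t".join(fields)


# Placeholder data standing in for a curator's spreadsheet export.
SAMPLE_CSV = (
    "db,object_id,qualifier,go_id,reference,eco_id,date,assigned_by,orcid\n"
    "TAIR,locus:0000000,involved_in,GO:0008150,PMID:0000000,ECO:0000315,"
    "20170915,TAIR,https://orcid.org/0000-0002-1825-0097\n"
)

gpad_lines = [csv_row_to_gpad(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
```

Here the group stays in Assigned_by and the person is recorded separately, so the semantics of the existing column do not change.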

tberardini commented 7 years ago

> I'm betting a lot of these external submissions are managed in excel files, forcing people to use OWL may be too much.

:) Maybe a wee bit too much.