Open krchristie opened 7 years ago
@cmungall any thoughts on coordinating this?
I was trying to annotate a human isoform that is conserved across many mammals, including mouse and hamster, and for which UniProt has an ID, specifically Q9Y5P4-2. However, the only autocomplete option I get is for the base ID (Q9Y5P4). I tried using a PR: prefix, but that isn't allowed either. Protein2GO lets me enter this human UniProt isoform ID, so it seems like there should be a way to do it here.
relevant model: gomodel:593423e000001008
We get human entities from
https://github.com/geneontology/go-site/blob/master/metadata/datasets/goa.yaml#L459-L472
Currently the procedure is to request the set of IDs you'd like to annotate from the authority for that species; in this file it's goa@ebi.ac.uk -- @tonysawfordebi
I brought up this issue at the meeting and tried to start a discussion about whether we should have a common SOP across GO for what is and is not included, and for PRO vs UniProtKB IDs. We decided to set up a working group for this.
As a last resort, you should be able to paste in a CURIE (prefixed ID) like UniProtKB:Q9Y5P4-2 to inject it directly into the model. @kltm can confirm.
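For anyone unsure what a pasted CURIE breaks down into, here is a minimal Python sketch. It is not how Noctua handles the input; the function names `parse_curie` and `split_isoform` are hypothetical, and the `<accession>-<number>` isoform pattern is an assumption based on the Q9Y5P4-2 example in this ticket.

```python
import re

# Assumption: an isoform ID is a base accession followed by "-<number>",
# as in the Q9Y5P4-2 example above.
ISOFORM_RE = re.compile(r"^(?P<base>[A-Z0-9]+)-(?P<num>\d+)$")


def parse_curie(curie):
    """Split a CURIE such as 'UniProtKB:Q9Y5P4-2' into (prefix, local_id)."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a prefixed ID: {curie!r}")
    return prefix, local_id


def split_isoform(local_id):
    """Return (base_accession, isoform_number); the number is None for a base ID."""
    match = ISOFORM_RE.match(local_id)
    if match:
        return match.group("base"), int(match.group("num"))
    return local_id, None


prefix, local_id = parse_curie("UniProtKB:Q9Y5P4-2")
base, isoform = split_isoform(local_id)
print(prefix, base, isoform)
```

The base accession recovered this way is the ID that autocomplete currently offers, which could be one way to look up a label for the isoform.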
Direct ID insertion for GPs is currently only allowed for the annoton entry sub-panel (not the individual one or any of the others). This was a negotiated balance to work with UniProt.
That's pretty unintuitive that it's only allowed in one sub-panel, and also which prefix works...
Also, it would be nice if the graph view would convert the UniProt isoform ID into some more human friendly name.
The exception is known to be inconsistent; it was negotiated as a temporary hack. Because the system does not know about the entity that was entered, no label can be provided.
As I said, "as a last resort". Really you should be using IDs sanctioned by the GPI provider.
Maybe I'll just do this in Protein2GO, where it is allowed to use the UniProt isoform ID. I am accumulating too many Noctua models with issues that require I remember to check them later...
@krchristie Can you check whether the isoform identifiers are in the GPI file referenced above? If they are there, there must be an issue loading them into Noctua. If they are not there, we should try to figure out why they are missing. Since they are not modified forms, it seems to me that they should be in that file.
I examined two GPI files from GOA, goa_human.gpi.gz and goa_human.isoform.gpi.gz. Neither file appears to contain human isoforms (the Q9Y5P4-1, -2, etc. of a UniProt ID such as Q9Y5P4). Thus these are not available in Noctua.
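To make that check repeatable, here is a minimal Python sketch that scans a gzipped GPI file for isoform IDs of a given base accession. It assumes a tab-separated layout with the object ID in the second column and `!` marking header lines; verify this against the GPI version of the file you download. The function names are hypothetical.

```python
import gzip


def find_isoform_ids(lines, base_accession):
    """Return object IDs of the form <base_accession>-<number> found in GPI lines."""
    prefix = base_accession + "-"
    hits = []
    for line in lines:
        # Skip header/comment lines (assumed to start with "!") and blank lines.
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 2:
            continue
        object_id = cols[1]  # assumption: object ID is the second column
        if object_id.startswith(prefix) and object_id[len(prefix):].isdigit():
            hits.append(object_id)
    return hits


def find_isoform_ids_in_file(path, base_accession):
    """Open a gzipped GPI file as text and scan it for isoform IDs."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return find_isoform_ids(handle, base_accession)
```

For example, `find_isoform_ids_in_file("goa_human.isoform.gpi.gz", "Q9Y5P4")` would return an empty list if the file has no isoform entries for that accession in the assumed column.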
As for allele names: I don't think allele IDs are in the GPI (I just grepped for a few examples), which makes sense since alleles would not be something we would use as a direct annotation object. Various files at http://www.informatics.jax.org/downloads/reports/index.html (e.g., MGI_PhenotypicAllele.rpt) have data on allele names, etc., if one wanted to use these in some sort of display.
Correct, alleles would not go in the gpi. Do you want to start a new ticket about requirements for allele selection in the with field?
Is a new ticket for allele IDs really warranted? I started this ticket with two kinds of IDs that MGI curators would find useful for translating info that is displayed in the with field, specifically with respect to improving human readability of the Annotation Preview:
With the title changed to be generic to adding new IDs, it seemed appropriate to add these additional human IDs I would find useful here too, but perhaps there's a better place for the IDs needed for splice isoforms and other proteoforms.
After speaking with @thomaspd, it seems the priority for being able to annotate the whole protein universe is now higher. We should discuss priorities on the go-managers call, @pgaudet.
Note that this is coming up in multiple contexts: https://github.com/geneontology/go-shapes/issues/148
MGI curators were delighted to see the Annotation Preview at our weekly meeting yesterday, and very impressed with how fast it is!
There are a couple of additional sets of IDs that would be really helpful to us, particularly for interpreting the with field info. They would help in the Annotation Preview, and even more if they could be translated in the menus for selecting evidence: not just for autocomplete (https://github.com/geneontology/noctua/issues/390) but also in the box for cloning evidence (https://github.com/geneontology/noctua/issues/445), since I often find it difficult to make sure I choose the right one when I have multiple different IDs of the same type.
@hdrabkin might know where to find appropriate places to load this info from.