Add additional IDs (additional ontologies?) into Minerva

krchristie commented 7 years ago

MGI curators were delighted to see the Annotation Preview at our weekly meeting yesterday, and very impressed with how fast it is!

There are a couple of additional sets of IDs that would be really helpful to us, particularly for interpreting the with field info. They would help in the Annotation Preview, and even more if they could be translated in the menu for selecting evidence: not just in autocomplete (https://github.com/geneontology/noctua/issues/390) but also in the box for cloning evidence (https://github.com/geneontology/noctua/issues/445). I often find it difficult to make sure I choose the right one when there are multiple different IDs of the same type.

UniProt IDs for mouse genes: For example, in PMID-25130899-KRC Tmem163 (gomodel:5900dc7400000968), I see that human UniProt IDs get translated into the human gene name and species, but the mouse ones do not. Presumably the human ones are available to Noctua because they are used as the annotation objects for human genes. Since being able to read the with field info is so useful for confirming that it was entered correctly, it would be really helpful to have the mouse UniProt IDs present as well, because we use the UniProt protein IDs when we make annotations with IPI evidence.
MGI allele IDs: For example, in PMID-26023097-KRC Ift27 & Hspb11 (gomodel:591f58ab00000021), two different alleles, in this case each for a different gene, are referred to in the with field.

@hdrabkin might know where to find appropriate places to load this info from.

kltm commented 7 years ago

@cmungall any thoughts on coordinating this?

krchristie commented 7 years ago

I was trying to annotate a human isoform that is conserved across many mammals, including mouse and hamster, and for which UniProt has an ID, specifically Q9Y5P4-2. However, the only autocomplete option I get is for the base ID (Q9Y5P4). I tried using a PR: prefix, but that isn't allowed either. I am allowed to enter this human UniProt isoform ID in Protein2GO, so it seems like there should be a way to do it here.

relevant model: gomodel:593423e000001008

cmungall commented 7 years ago

We get human entities from

https://github.com/geneontology/go-site/blob/master/metadata/datasets/goa.yaml#L459-L472

Currently the procedure is to request the set of IDs you'd like to annotate from the authority for that species; in this file, that's goa@ebi.ac.uk (@tonysawfordebi).

I brought up this issue at the meeting and tried to start a discussion about whether we should have a common SOP across GO for what is and is not included (PRO vs UniProtKB IDs); we decided to set up a working group for this.

As a last resort, you should be able to paste in a CURIE (prefixed ID) such as UniProtKB:Q9Y5P4-2 to inject it directly into the model; @kltm can confirm.
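To make the CURIE structure concrete: a CURIE is a prefix, a colon, and a local ID, and a UniProtKB isoform ID is the canonical accession plus a "-N" suffix. The sketch below is illustrative only and is not Noctua or Minerva code; `parse_curie` and `ParsedCurie` are hypothetical names, and the "BASE-N" isoform pattern is an assumption applied only to the UniProtKB prefix.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern for UniProtKB isoform accessions: BASE-N, e.g. Q9Y5P4-2.
ISOFORM_SUFFIX = re.compile(r"^(?P<base>[A-Z0-9]+)-(?P<isoform>\d+)$")


class ParsedCurie(NamedTuple):
    prefix: str                # e.g. "UniProtKB"
    local_id: str              # e.g. "Q9Y5P4-2"
    base_accession: str        # e.g. "Q9Y5P4" (same as local_id if not an isoform)
    isoform: Optional[int]     # e.g. 2, or None for a canonical entry


def parse_curie(curie: str) -> ParsedCurie:
    """Hypothetical helper: split a CURIE at its first colon and detect isoforms."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix == "UniProtKB":
        m = ISOFORM_SUFFIX.match(local_id)
        if m:
            return ParsedCurie(prefix, local_id, m.group("base"), int(m.group("isoform")))
    return ParsedCurie(prefix, local_id, local_id, None)
```

For example, `parse_curie("UniProtKB:Q9Y5P4-2")` yields base accession `Q9Y5P4` and isoform `2`, which is why an autocomplete that only knows the base ID offers Q9Y5P4 and nothing more specific.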

kltm commented 7 years ago

Direct ID insertion for GPs is currently only allowed for the annoton entry sub-panel (not the individual one or any of the others). This was a negotiated balance to work with UniProt.

krchristie commented 7 years ago

It's pretty unintuitive that it's only allowed in one sub-panel, and it's also unintuitive which prefix works...

Also, it would be nice if the graph view converted the UniProt isoform ID into a more human-friendly name.

kltm commented 7 years ago

The exception is known to be inconsistent; it was negotiated as a temporary hack. Because the system does not know about the entity that was entered, no label can be provided.
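The labeling behavior described here can be sketched as a lookup built from the loaded GPI rows, with unknown IDs falling back to the raw CURIE. This is a minimal sketch assuming the GPI 1.2 column layout (tab-separated; column 1 DB, column 2 object ID, column 3 symbol, column 7 taxon); `build_label_index` and `display_label` are hypothetical names, not the actual Minerva implementation.

```python
from typing import Dict, Iterable


def build_label_index(gpi_lines: Iterable[str]) -> Dict[str, str]:
    """Map 'DB:ObjectID' to 'symbol (taxon)', assuming GPI 1.2 columns."""
    index: Dict[str, str] = {}
    for line in gpi_lines:
        # Skip blank lines and "!" header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 7:
            continue  # malformed row: not enough columns to reach the taxon
        curie = f"{cols[0]}:{cols[1]}"
        index[curie] = f"{cols[2]} ({cols[6]})"
    return index


def display_label(curie: str, index: Dict[str, str]) -> str:
    # An ID that was never loaded (e.g. a pasted-in isoform CURIE) has no
    # entry, so the raw CURIE is shown instead of a label.
    return index.get(curie, curie)
```

Under these assumptions, a canonical accession present in the GPI gets a "symbol (taxon)" label, while a directly injected isoform such as UniProtKB:Q9Y5P4-2 is displayed as the bare CURIE.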

cmungall commented 7 years ago

As I said, "as a last resort". Really you should be using IDs sanctioned by the GPI provider.

krchristie commented 7 years ago

Maybe I'll just do this in Protein2GO, where it is allowed to use the UniProt isoform ID. I am accumulating too many Noctua models with issues that require I remember to check them later...

ukemi commented 7 years ago

@krchristie Can you check whether the isoform identifiers are in the GPI file referenced above? If they are there, there must be an issue loading them into Noctua. If they are not, we should try to figure out why they are missing. Since they are not modified forms, it seems to me that they should be in that file.

hdrabkin commented 7 years ago

I examined two GPI files from GOA, goa_human.gpi.gz and goa_human.isoform.gpi.gz. Neither file appears to contain human isoforms (e.g. Q9Y5P4-1, Q9Y5P4-2, etc. for the UniProt ID Q9Y5P4), so these are not available in Noctua.
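A check like this can be repeated with a short script that lists every object ID carrying an isoform suffix. This is a sketch assuming GPI 1.2 (tab-separated, "!" header lines, column 1 DB, column 2 object ID) and the "-N" isoform suffix; the function names are hypothetical, and a GPI 2.0 file, which puts a full CURIE in column 1, would need different column handling.

```python
import gzip
import re
from typing import Iterable, Iterator, List, Tuple

# Assumed isoform accession shape: canonical accession plus "-N", e.g. Q9Y5P4-2.
ISOFORM_ID = re.compile(r"^[A-Z0-9]+-\d+$")


def gpi_object_ids(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (DB, object ID) pairs from GPI 1.2 data lines."""
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank and header/comment lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 2:
            yield cols[0], cols[1]


def find_isoform_ids(lines: Iterable[str]) -> List[str]:
    """Return 'DB:ID' for every object ID that looks like an isoform."""
    return [f"{db}:{oid}" for db, oid in gpi_object_ids(lines) if ISOFORM_ID.match(oid)]


def scan_gpi_file(path: str) -> List[str]:
    """Scan a gzipped GPI file, e.g. a locally downloaded goa_human.gpi.gz."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return find_isoform_ids(fh)
```

An empty result from `scan_gpi_file` on a downloaded file would be consistent with the observation above that no isoforms are present.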

As for allele names: I don't think allele IDs are in the GPI (I just grepped for a few examples), which makes sense, since alleles would not be something we would use as a direct annotation object. There are various report files at http://www.informatics.jax.org/downloads/reports/index.html; e.g., MGI_PhenotypicAllele.rpt would have data on allele names, etc., if one wanted to use these in some sort of display.

cmungall commented 7 years ago

Correct, alleles would not go in the gpi. Do you want to start a new ticket about requirements for allele selection in the with field?

krchristie commented 7 years ago

Is a new ticket for allele IDs really warranted? I started this ticket with two kinds of IDs that MGI curators would find useful for translating info that is displayed in the with field, specifically with respect to improving human readability of the Annotation Preview:

UniProt IDs for mouse genes (e.g. for IPI with ____)
MGI allele IDs (e.g. for IGI with _____)

The original title was "Additional IDs that would be useful for Annotation Preview & Evidence selection", which Seth later changed to "Add additional IDs (additional ontologies?) into Minerva".

With the title changed to be generic about adding new IDs, it seemed appropriate to add these additional human IDs I would find useful here too, but perhaps there's a better place for IDs needed for splice isoforms and other proteoforms.

cmungall commented 5 years ago

From speaking with @thomaspd, it seems the priority of being able to annotate the whole protein universe is now higher; we should discuss priorities on the go-managers call. @pgaudet

ukemi commented 5 years ago

Note that this is coming up in multiple contexts: https://github.com/geneontology/go-shapes/issues/148

geneontology / noctua #444