geneontology / noctua-form-legacy

Simple annoton editor workbench for Noctua.
BSD 3-Clause "New" or "Revised" License

Allow annotation to gene product IDs not in neo #90

Open cmungall opened 5 years ago

cmungall commented 5 years ago

Context: https://docs.google.com/document/d/1RVlRNic37R3EQZfiNjn4R7Q6R3v_XSIg1DT7uQUwW-s/edit

Curators in the multi-organism group need to be able to annotate to species that may not be in neo

The GE allows pasting in of arbitrary IDs - we should make the form have similar functionality

tmushayahama commented 5 years ago

@cmungall @thomaspd can you write up the requirements? Some time ago we briefly discussed that a user would enter an ID in the GP field and, if it is not in the autocomplete, Noctua Form would make an exception and accept it as is. However, I would have no way of knowing whether the ID is correct, which is very error-prone. Any thoughts?

thomaspd commented 5 years ago

We want this to be restricted to UniProt identifiers, so the user needs to paste a valid UniProt ID, e.g. P12345. One way to do it would be this:

We could use the UniProt website API to do the request: https://www.uniprot.org/uniprot/P12345.txt

If the service doesn't return a text entry (e.g. try the above URL with P123456 instead), the identifier is not valid and Noctua Form should pop up an error: "UniProt identifier P123456 not found". If the identifier is found, text will be returned; parse the entry to get the lines starting with GN or OS, and show those lines in a popup with a confirm button and a cancel button:

UniProt identifier P12345
GENE: (text from GN line)
ORGANISM: (text from OS line)
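A minimal sketch of the check described above, in Python for illustration only (it is not code from Noctua Form). It assumes the entry text has already been fetched, with `None` standing in for "no text entry returned". The function names are hypothetical. The syntax pre-check uses the accession pattern published in the UniProt help pages, which is an addition to the proposal rather than part of it.

```python
import re
from typing import Optional

# Accession pattern as published in the UniProt help pages.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def looks_like_accession(identifier: str) -> bool:
    """Cheap syntax check that can run before any request is made."""
    return ACCESSION_RE.fullmatch(identifier) is not None


def parse_flat_file(entry_text: str) -> dict:
    """Collect the text of the GN (gene name) and OS (organism) lines."""
    fields = {"GN": [], "OS": []}
    for line in entry_text.splitlines():
        code = line[:2]
        # Flat-file lines are a two-letter code, three spaces, then the data.
        if code in fields and line[2:5] == "   ":
            fields[code].append(line[5:].strip())
    return {"gene": " ".join(fields["GN"]), "organism": " ".join(fields["OS"])}


def popup_text(identifier: str, entry_text: Optional[str]) -> str:
    """Build the popup message; entry_text is None when nothing was returned."""
    if not looks_like_accession(identifier) or not entry_text:
        return f"UniProt identifier {identifier} not found"
    parsed = parse_flat_file(entry_text)
    return (
        f"UniProt identifier {identifier}\n"
        f"GENE: {parsed['gene']}\n"
        f"ORGANISM: {parsed['organism']}"
    )


# Hand-written, made-up sample in the flat-file layout (not fetched output).
SAMPLE_ENTRY = "AC   P12345;\nGN   Name=ABC1;\nOS   Homo sapiens (Human).\n"

print(popup_text("P12345", SAMPLE_ENTRY))
print(popup_text("P123456", None))
```

The pre-check rejects P123456 without a round trip, since it does not match the accession pattern. An identifier that is well-formed but unknown would still need the service lookup to be caught.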

cmungall commented 5 years ago

Sorry, I didn't see @tmushayahama's request. Let's hold off until Thursday's software call. We don't want different parts of the stack calling different services.

My suggestion was just to allow pasting of IDs unchecked to bring parity with the GE. If we want to prioritize having this work properly (which is massively important for anyone outside MOD/human) then let's discuss the approach and implement universally:

  1. Ingest in neo (PR ready to be tested: https://github.com/geneontology/neo/pull/35)
  2. Use uniprot web services
  3. Use mygene web services

Note that if we go with services, we should use proper services, not the 1980s https://www.uniprot.org/uniprot/P12345.txt swissprot format! 😄

I think any one of these should be quick to implement, but each will have a few implications. For 1, the size of neo increases. For 2 or 3, labels will be missing in the RDF store, with implications for downstream components that use it. In fact, this will require lookups to be implemented at various other points in the stack to stop unlabeled IDs showing up.
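To illustrate the label problem with options 2 and 3, here is a rough Python sketch of the kind of fallback each downstream component would need. All names are hypothetical: `store_labels` stands in for labels already in the RDF store, and `fallback_lookup` for whichever external service is chosen. The identifiers and labels are made up.

```python
from typing import Callable, Dict, Optional


def display_label(
    identifier: str,
    store_labels: Dict[str, str],
    fallback_lookup: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Return a label for the ID, falling back to the bare ID as a last resort."""
    label = store_labels.get(identifier)
    if label is None and fallback_lookup is not None:
        # The ID is not labeled in the store, so ask the external service.
        label = fallback_lookup(identifier)
    return label if label else identifier


# Made-up data: one ID is labeled in the store, one only via the service.
store = {"EX:0001": "gene one"}
service = {"EX:0002": "gene two"}.get

print(display_label("EX:0001", store, service))
print(display_label("EX:0002", store, service))
print(display_label("EX:0003", store, service))
```

Without the fallback, the second ID would be shown unlabeled, which is the situation the comment warns about for every component that reads from the store.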

cc @lpalbou @kltm @deepakunni3 @vanaukenk @balhoff

tmushayahama commented 5 years ago

@cmungall @vanaukenk @thomaspd @lpalbou it was decided that this would not happen, right? All the GPs should be in neo? Any updates or changes? However, thinking about the workflow, there should be a mechanism for adding or requesting a new GP and annotating to it without having to wait a long time or breaking the curator's annotation workflow (i.e. if a GP is missing in NF, they cannot save or continue).