GMOD / Apollo

Genome annotation editor with a Java Server backend and a Javascript client that runs in a web browser as a JBrowse plugin.
http://genomearchitect.readthedocs.io/

Product Description controlled vocabulary #2488

Closed obsidian83 closed 4 years ago

obsidian83 commented 4 years ago

As with the GO and ECO ontologies, allow product descriptions to be loaded, i.e. allow users to select terms from PDB or UniProt to annotate the product descriptions.

nathandunn commented 4 years ago

Just now saw it... that makes sense. I had a few other minor 2.6.1 fixes, so that should go in rather easily.

nathandunn commented 4 years ago

@obsidian83 I'm going to need a bit more info.

Currently, Gene Product Info uses prior entries. However, I can expand that to use either a pre-loaded list, similar to what we do for suggested names, or an API end-point, similar to what we do for gene ontology and evidence ontology lookups.

If you have the end-points I can see what I can do.
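
A minimal sketch of the pre-loaded list option, assuming the list fits in memory and that matching is a case-insensitive prefix search. The function names `build_index` and `suggest` are hypothetical and are not part of Apollo; the sample terms are illustrative only.

```python
from bisect import bisect_left


def build_index(terms):
    """Return (lowercased key, original term) pairs sorted by key."""
    return sorted((t.lower(), t) for t in set(terms))


def suggest(index, typed, limit=10):
    """Return up to `limit` terms whose lowercased form starts with `typed`."""
    prefix = typed.strip().lower()
    if not prefix:
        return []
    # Binary search for the first key that is >= the prefix.
    start = bisect_left(index, (prefix,))
    matches = []
    for key, term in index[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches.append(term)
        if len(matches) == limit:
            break
    return matches


# Illustrative terms only; a real list would come from the curators.
PRELOADED = ["alpha-amylase", "hypothetical protein, conserved", "hypothetical protein"]
INDEX = build_index(PRELOADED)
print(suggest(INDEX, "hyp"))
```

An API end-point would replace the in-memory index with a remote call per keystroke, which is why the number of services involved matters.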

nathandunn commented 4 years ago

Another two possibilities would be using:

but searching for gene in each case.

I don't see it here: http://api.geneontology.org/api

nathandunn commented 4 years ago

Also depends on the number and kind of species you'll be looking at. Pinging 10 services won't make sense, though I'm sure we could provide a few options.

nathandunn commented 4 years ago

Also, which species do we need and what type of IDs?

nathandunn commented 4 years ago

If it's a large number of species, you may want to pre-load them and restrict them by species type.
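
One way to picture pre-loading with a species-type restriction, as a sketch only: the group names, the `(group, description)` row format, and the function names below are assumptions for illustration, not an existing Apollo structure.

```python
from collections import defaultdict


def load_by_group(rows):
    """Group (species_group, description) rows into sorted, de-duplicated lists."""
    grouped = defaultdict(set)
    for group, description in rows:
        grouped[group].add(description)
    return {group: sorted(terms) for group, terms in grouped.items()}


def suggest_for_group(terms_by_group, group, typed):
    """Return the group's descriptions that contain the typed text, ignoring case."""
    needle = typed.strip().lower()
    if not needle:
        return []
    return [t for t in terms_by_group.get(group, []) if needle in t.lower()]


# Hypothetical rows; group names and terms are placeholders.
ROWS = [
    ("fungi", "alpha-amylase"),
    ("fungi", "hypothetical protein, conserved"),
    ("vectors", "hypothetical protein, conserved"),
]
TERMS = load_by_group(ROWS)
print(suggest_for_group(TERMS, "fungi", "amylase"))
```

Restricting by group keeps each suggestion list short even if the combined list across all species is large.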

cmungall commented 4 years ago

@obsidian83 can I clarify what you need, and also get the broader context?

As I understand it, you want to annotate gene products using GO within Apollo. Your gene products will be represented either by UniProtKB IDs or PDB IDs. To do this, you need the gene product/protein description.

Is it just the product description you need? By description, do you mean the free-text description supplied by UniProt? For example, on https://www.uniprot.org/uniprot/Q15465

you would like to get back the text: "The C-terminal part of the sonic hedgehog protein precursor displays an autoproteolysis and a cholesterol transferase activity (By similarity). Both activities result in the cleavage of the full-length protein into two parts (ShhN and ShhC) followed by the covalent attachment of a cholesterol moiety to the C-terminal of the newly generated ShhN (By similarity). Both activities occur in the reticulum endoplasmic (By similarity). Once cleaved, ShhC is degraded in the endoplasmic reticulum (By similarity)."

Do you need other information? For example, if there has been automatic annotation done using interpro2go, do you need this too?

Sorry, I don't have the full context of the project. Is this for VEuPathDB?

obsidian83 commented 4 years ago

@cmungall This is for VEuPathDB. The idea is to give the community a list of product descriptions to choose from for annotation. This was one of the functions GeneDB/Artemis has that we wanted to emulate in Apollo. It would work like the GO term entry in Apollo, where users start typing and matching terms are listed. The list would be non-redundant and would avoid the variations caused by users' manual input.

@nathandunn Sorry, I was a little vague on the endpoints. The product description is really just a term such as "hypothetical protein, conserved" or "alpha-amylase", so we may not want as much information as an API provides; maybe just a list. I'd imagine that the community would enter UniProt IDs or PDB IDs as dbxrefs, as evidence based on orthology, to support that assignment.

We have ~400 species in VEuPathDB (vectors, protistan parasites and fungi). I could generate the list of product descriptions we currently use for curation at GeneDB, and this could be used as an initial template list.
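
As a rough illustration of how a non-redundant template list could be derived from existing curated descriptions, here is a sketch that treats entries differing only in case or whitespace as the same term. That equivalence rule and the function names are assumptions; the real curation rules may differ.

```python
import re


def normalize(description):
    """Collapse whitespace runs and trim so trivially different entries compare equal."""
    return re.sub(r"\s+", " ", description).strip()


def non_redundant(descriptions):
    """Keep the first spelling seen for each case-insensitive, whitespace-normalized term."""
    seen = {}
    for raw in descriptions:
        cleaned = normalize(raw)
        if cleaned:
            seen.setdefault(cleaned.casefold(), cleaned)
    return sorted(seen.values(), key=str.casefold)


# Made-up input showing the kind of variation manual entry produces.
CURATED = [
    "alpha-amylase",
    "Alpha-amylase ",
    "hypothetical protein,  conserved",
    "hypothetical protein, conserved",
]
print(non_redundant(CURATED))
```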

nathandunn commented 4 years ago

@obsidian83 Uploading a list is pretty easy. I would just copy what I'm doing for suggestedNames, and then you use a web service to add a list of lookup names. I think this works so long as you don't want the protein names replaced with IDs on output. If you do, we just need to add a name/value pair. Let me know either way.
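
To make the name/value distinction concrete, a small sketch: each entry pairs the text shown to the annotator with an identifier that could be written on output instead. The `ProductTerm` class, the `output_for` helper, and the identifier format are all hypothetical and do not reflect Apollo's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductTerm:
    name: str   # text shown to the annotator
    value: str  # identifier written on output (made-up format)


def output_for(terms, selected_name, replace_with_id=False):
    """Return the name as selected, or its paired value when IDs are requested."""
    if not replace_with_id:
        return selected_name
    by_name = {t.name: t.value for t in terms}
    return by_name[selected_name]  # raises KeyError if the name is not in the list


# Placeholder identifiers, for illustration only.
TERM_LIST = [
    ProductTerm("alpha-amylase", "EXAMPLE:0001"),
    ProductTerm("hypothetical protein, conserved", "EXAMPLE:0002"),
]
print(output_for(TERM_LIST, "alpha-amylase"))
print(output_for(TERM_LIST, "alpha-amylase", replace_with_id=True))
```

With a names-only list the second mode is unnecessary, which is the simpler case described above.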

nathandunn commented 4 years ago

So to clarify @obsidian83 :