geneontology / amigo

AmiGO is the public interface for the Gene Ontology.
http://amigo.geneontology.org
BSD 3-Clause "New" or "Revised" License

Create a minimal node service to provide annotation summaries for given inputs #282

Closed kltm closed 8 years ago

kltm commented 8 years ago

Create a minimal node service to provide annotation summaries for given inputs.

Initially, this node service (not perl) will take a list of gene product IDs and produce the related counts, etc. Other arguments may be species, etc. It is intended as a simple GET/POST -> JSON app. This will be used to get data counts for downstream term enrichment services, and will not encompass things like ID resolution (at this time).
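As a rough illustration of the counting step such a service might perform, here is a minimal sketch. It assumes a hypothetical in-memory list of annotation records standing in for GOlr; the function name `summarizeAnnotations`, the record shape (`bioentity`, `term`), and the sample IDs are illustrative assumptions, not the actual service API.

```javascript
// Hypothetical annotation records; a real service would fetch these from GOlr.
const annotations = [
  { bioentity: "UniProtKB:P12345", term: "GO:0008150" },
  { bioentity: "UniProtKB:P12345", term: "GO:0003674" },
  { bioentity: "UniProtKB:Q99999", term: "GO:0008150" },
];

// Count annotations per requested gene product, and count how many of the
// requested gene products are annotated to each term.
function summarizeAnnotations(geneProductIds, records) {
  const wanted = new Set(geneProductIds);
  const perGeneProduct = {};
  const termSets = {};
  for (const id of wanted) perGeneProduct[id] = 0;
  for (const { bioentity, term } of records) {
    if (!wanted.has(bioentity)) continue; // ignore gene products not in the input
    perGeneProduct[bioentity] += 1;
    if (!termSets[term]) termSets[term] = new Set();
    termSets[term].add(bioentity);
  }
  const perTerm = {};
  for (const [term, gps] of Object.entries(termSets)) perTerm[term] = gps.size;
  return { perGeneProduct, perTerm };
}

// The JSON body a GET/POST handler could return for a two-ID request.
console.log(
  JSON.stringify(
    summarizeAnnotations(["UniProtKB:P12345", "UniProtKB:Q99999"], annotations)
  )
);
```

Keeping the summary as a pure function means the same logic could sit behind either a GET or a POST handler.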

This is an extension of the discussion from the implementation of #281 (see output there), and should be initially useful in Planteome use cases.

@qubotong @elserj

kltm commented 8 years ago

I'll have more on this in the next couple of days, and will have a write-up, but I'm almost done porting over the legacy GOlr library within our legacy mega-library (https://github.com/berkeleybop/bbop-js) to the new frameworks that we now have (https://github.com/berkeleybop/bbop-manager-golr, etc.). Once done, we'll have access to all sorts of ways to contact GOlr via the API on clients and servers. Immediately after that, I'll be able to complete the beta service as discussed at the meeting last week.

Tagging on @cmungall as well.

qubotong commented 8 years ago

@kltm the term-to-gene query we talked about last week is missing one required piece of data. Two numbers are needed for the calculation. Given a term, I need to know how many genes are associated with this term in the background database. I also need to know how many genes there are in total in the database. We really appreciate your help. Thank you very much.
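For context, those two background numbers are the kind of inputs a term enrichment test consumes. The sketch below assumes a one-sided hypergeometric test; the thread does not say which statistic is actually used downstream, so treat the choice of test, the function names, and the parameter names as assumptions.

```javascript
// Log of the binomial coefficient C(n, k), summed in log space to avoid overflow.
function logChoose(n, k) {
  if (k < 0 || k > n) return -Infinity; // impossible selection, probability 0
  let s = 0;
  for (let i = 1; i <= k; i++) s += Math.log(n - k + i) - Math.log(i);
  return s;
}

// P(X >= k) under a hypergeometric model, where (all names are assumptions):
//   N = total genes in the background database
//   K = background genes annotated to the term
//   n = genes in the input list
//   k = input genes annotated to the term
function enrichmentPValue(N, K, n, k) {
  let p = 0;
  const max = Math.min(K, n);
  for (let i = k; i <= max; i++) {
    p += Math.exp(logChoose(K, i) + logChoose(N - K, n - i) - logChoose(N, n));
  }
  return Math.min(1, p); // guard against floating-point drift above 1
}
```

Both N and K come from the background, which is why the per-term count and the database total are needed in addition to the counts for the input list.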

kltm commented 8 years ago

@qubotong No problem. We'll probably have to have a little more back and forth once I get a little further along, but I finally have a base API so I can move more quickly now.

For the /term-to-gene endpoint, we can skip it for now while we work everything out, but I'm assuming that you'll eventually supply the species as well (as an NCBI taxon ID)--without that, you'll get strange counts relative to your input (which is more-or-less pinned to a species by being a gene product ID).

kltm commented 8 years ago

Okay, a little more progress here; you (@cmungall, @qubotong, @elserj) might want to take a look at https://github.com/geneontology/amigo/blob/master/scripts/gass.md .

There is obviously more work to be done (making sure the server isn't blocking, adding more to the API, like species filters, optimizing server contact, etc.), but there is something there to talk about now.

@qubotong, does the API look similar to what you want? Honestly, I removed a couple of bits as the response size was getting overwhelming, but I want to make sure the minimum is in there for moving forward. Also, if you are wanting the total number of something in the database, you likely want that by species, particular ontology, etc. We can talk about that more.

Once we've agreed that this is a place to start, and have maybe filled it out a little bit more, I was thinking we could move this into its own repo, with an eye to eventually folding some of the numeric code in (simplifying the code, speeding it up) and enriching the API. With that, there would be a base for other applications to work on visualization, reporting, etc.--the fun stuff--while GP/Planteome/etc. maintained reference analysis crunchers.

qubotong commented 8 years ago

@kltm @elserj thank you Seth, this is really good. Yes, I think the numbers you give me are exactly what I want. One question: in the example, the query doesn't include the species filter, right? I think we could include it now. I will give the user a selection and pass the corresponding species ID to the query. Do you have an existing server that could query the species information from the database, so we could get a dynamic list of species? If not, we could use static data for now.

qubotong commented 8 years ago

@elserj Could we add Jaden to this repository? I think he could help a lot with the server development.

kltm commented 8 years ago

@qubotong well, there are 8750 annotated species at present, so a list isn't really going to be useful. I'd suggest either 1) using a GOlr/AmiGO autocomplete widget to autocomplete on the species available or 2) having a more limited curated set that you use.

kltm commented 8 years ago

Species filters (of occasionally questionable use) have been added to all of the endpoints. This is documented in gass.md.
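To make the idea of a species filter concrete, here is a hedged sketch of how a count query against a Solr-backed store such as GOlr could be narrowed by taxon. The Solr parameters (`q`, `fq`, `rows`, `wt`, `facet`, `facet.field`) are standard, but the field names (`document_category`, `bioentity`, `taxon`, `annotation_class`), the base URL, and the function `buildCountQuery` are assumptions for illustration and are not taken from gass.md.

```javascript
// Build a Solr select URL that counts annotations per term for the given
// gene products, optionally restricted to one species.
// Field names and the base URL are assumptions, not the documented schema.
function buildCountQuery(baseUrl, geneProductIds, taxonId) {
  const params = new URLSearchParams();
  params.append("q", "*:*");
  params.append("wt", "json");
  params.append("rows", "0"); // only the facet counts are wanted, not documents
  params.append("fq", 'document_category:"annotation"');
  const ids = geneProductIds.map((id) => `"${id}"`).join(" OR ");
  params.append("fq", `bioentity:(${ids})`);
  if (taxonId) params.append("fq", `taxon:"${taxonId}"`); // the species filter
  params.append("facet", "true");
  params.append("facet.field", "annotation_class");
  return `${baseUrl}/select?${params.toString()}`;
}

console.log(
  buildCountQuery("http://localhost:8080/solr", ["UniProtKB:P12345"], "NCBITaxon:3702")
);
```

Leaving `taxonId` undefined simply omits the extra filter, so the same builder covers both the filtered and unfiltered cases.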

As well, for very temporary giggles, I have put a test server up at http://tomodachi.berkeleybop.org:6455 . I'm not even sure if it blocks itself or not, so use with care. As well, it will likely go away without notice, etc. This is just to give something to look at before we move on. It is pointing at an experimental GO server right now.

cmungall commented 8 years ago

The list of species of interest to Planteome is smaller.

On 20 Nov 2015, at 0:18, kltm wrote:

@qubotong well, there are 8750 annotated species at present, so a list isn't really going to be useful. I'd suggest either 1) just using a GOlr/AmiGO autocomplete widget to just autocomplete on the species available or 2) have a more limited curated set that you use.


kltm commented 8 years ago

@cmungall, even if it's smaller, there is no effective way of extracting a list of species (with label and id), short of iterating over the gene products, unless they are included as a loaded ontology.
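The "iterate over the gene products" fallback could look roughly like the sketch below. It assumes each gene product document carries a taxon ID and a taxon label; the property names `taxon` and `taxon_label`, the sample records, and the function `distinctSpecies` are hypothetical.

```javascript
// Collect the distinct species (id and label) seen across gene product records.
// The `taxon` and `taxon_label` property names are assumptions.
function distinctSpecies(geneProducts) {
  const seen = new Map();
  for (const gp of geneProducts) {
    if (!seen.has(gp.taxon)) seen.set(gp.taxon, gp.taxon_label);
  }
  return [...seen].map(([id, label]) => ({ id, label }));
}

// Hypothetical sample records.
const sampleGeneProducts = [
  { id: "GP:1", taxon: "NCBITaxon:3702", taxon_label: "Arabidopsis thaliana" },
  { id: "GP:2", taxon: "NCBITaxon:4530", taxon_label: "Oryza sativa" },
  { id: "GP:3", taxon: "NCBITaxon:3702", taxon_label: "Arabidopsis thaliana" },
];

console.log(distinctSpecies(sampleGeneProducts));
```

This touches every gene product once, which is the cost being described as not effective at full database scale.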

kltm commented 8 years ago

I've spun this out of AmiGO and into its own top-level project here: http://github.com/berkeleybop/golr-numbers-service . I'm going to remove traces from this repo and then close this item--further discussion and improvements can be worked on over there.