geneontology / amigo

AmiGO is the public interface for the Gene Ontology.
http://amigo.geneontology.org

Add annotated image documents / types #368

Open elserj opened 8 years ago

elserj commented 8 years ago

We'd like to have the ability to add annotated images that are displayable via the browser.

From discussion with @cmungall, the direction we'd like to head in is something like a "gulp golr-load-image-annotations" command, where a tab-delimited text file (format TBD) can be loaded into GOLR, similar to how GAFs are loaded.

Parts that I believe would need to be worked on include gulpfile changes to add the loading piece, actual code to load in GOLR, changes/addition to annotation pages in AmiGO to add the image type data, methods to display images (maybe carousel, maybe flat page with tables of images), probably other bits.

This is in conjunction with the image annotation work that @preecej is working on.

Also, time scale is pretty lax. Just creating the issue so that work/thoughts can get started and we can brainstorm on how this will work.

cmungall commented 8 years ago

As a first pass we can just overload golr bioentity_association documents (the bioentity would be an image rather than gene or germplasm). We can then reuse all the existing nice faceting mechanisms (e.g. when on the leaf page I can narrow my search dynamically to monocots).

I suggest using isa-partof closure, but we need more requirements analysis here. If I am interested in leaves generally, I will want to see images of subtypes of leaves. I will probably also be interested in images of parts of leaves too, but I'd get a bit uneasy if I jump too many levels of granularity.

kltm commented 8 years ago

Realistically, the code changes would be minimal if using 3rd-party storage. Most of these points could be covered with additional handler code. Adding a page base type would be similarly easy (although maybe that should be abstracted out a bit more). Also: https://github.com/geneontology/amigo/issues/341

kltm commented 8 years ago

In reference to work being done on Planteome.

kltm commented 8 years ago

(To mark the specific use case for planteome, I'd add it to the planteome amigo tracker and have it blocked with this as the upstream issue.)

kltm commented 8 years ago

I think the first step here is to see what kind of data would be loaded. There are two possibilities here: an overloaded GAF-like (or whatever) thing that would take a custom loader, or overlaying the image data on top of already-loaded annotation data (what the current demo does).

I suspect that once we have some fairly concrete data running around, or at least a format/approach, the code to get to basic usability would be pretty fast.

cmungall commented 8 years ago

We may want to experiment with overlays later, but this is simpler. The subject/bioentity is an image denoted by a URL that resolves to a jpg or similar (let's say a thumbnail). Just using the default amigo view, this would behave as any other bioentity. We'd then want to enhance the display a bit; I don't have strong opinions here: just showing the thumbnail, carousel, ...?

elserj commented 8 years ago

@cmungall and I were just talking about this a second ago. So, I think what we will have is an image URL that illustrates some term(s), so that people can see it and get a non-textual example of the term. I think GAF may be an acceptable input format; we just need a new object type of image. Maybe some client code could, if the object type is an image, do something like make a thumbnail that links out to the source URL. In other words, instead of At5g20800 in column 2 or 3 of the GAF, have the URL. In column 12, have "image" as the object type, and then figure out how to make it look good in the browser.
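
A minimal sketch of the idea in the comment above, assuming the image URL is placed in GAF column 2 and the literal string "image" in column 12 (DB Object Type). The names `ImageAnnotation`, `parse_image_annotation`, and `thumbnail_html` are hypothetical and not part of AmiGO; this only illustrates how such rows could be detected and turned into a thumbnail link.

```python
import html
from dataclasses import dataclass
from typing import Optional

# GAF columns are 1-based in the spec; these are 0-based positions.
COL_OBJECT_ID = 1     # column 2: DB Object ID (here: the image URL, by assumption)
COL_TERM_ID = 4       # column 5: ontology term ID
COL_OBJECT_TYPE = 11  # column 12: DB Object Type

IMAGE_TYPE = "image"  # assumed object-type value, as proposed in the thread


@dataclass
class ImageAnnotation:
    term_id: str
    image_url: str


def parse_image_annotation(line: str) -> Optional[ImageAnnotation]:
    """Return an ImageAnnotation if the GAF line describes an image, else None."""
    if not line.strip() or line.startswith("!"):  # blank or GAF header/comment line
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) <= COL_OBJECT_TYPE:
        return None
    if cols[COL_OBJECT_TYPE].strip().lower() != IMAGE_TYPE:
        return None
    url = cols[COL_OBJECT_ID].strip()
    if not url.startswith(("http://", "https://")):
        return None
    return ImageAnnotation(term_id=cols[COL_TERM_ID].strip(), image_url=url)


def thumbnail_html(annotation: ImageAnnotation, width: int = 120) -> str:
    """Build a thumbnail that links out to the source URL."""
    url = html.escape(annotation.image_url, quote=True)
    alt = html.escape(annotation.term_id, quote=True)
    return f'<a href="{url}"><img src="{url}" width="{width}" alt="{alt}"></a>'
```

Rows with any other object type (gene, germplasm, and so on) return `None`, so ordinary annotations would continue through the existing display path unchanged.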

kltm commented 8 years ago

The "carousel" has come up a couple of times here, and I don't quite understand--if it is a single object, what are the multiple things carouselling? Otherwise, if we are literally treating these things as bioentities, then the code to detect an image URL ID would be very easy. Not so easy would be having an ID component like that--it would likely throw a spanner into a lot of things. Preferable might be a standard bioentity document with an additional field that could act as the data overlay in a second loading step; population of the field would trigger the main effects.

austinmeier commented 7 years ago

This issue came up again in our ontology call this morning. We are discussing some very complex Plant Ontology terms related to inflorescence axes, and these definitions are accompanied by nice line-art diagrams of different types of inflorescences. The textual definitions are complicated and nuanced, but the images make them much clearer. So being able to embed an image would go a long way in clarifying the meaning of these terms. This wouldn't require multiple images/carousel, rather just a single labeled image (from the NY crew).

Image example: inflorescence_img_example.pdf

Edit: I can't for the life of me get that image embedded in this GitHub comment... (doesn't bode well for my ability to embed images in the amigo browser...)

kltm commented 7 years ago

I think the addition is simple: add a new field, something like "auxiliary_external_reference_image" that is a remotely accessible PNG, etc. When the field is populated, AmiGO embeds whatever is at that end into the page. Simple.

Now, the hard part is to load that info into the store in the first place, which means we must descend into modifying the loader and loading a new file type (because GAF does not need a new field). That, or doing a second run over the index to populate it (like we did for the geospatial setup). If you want something out the door soon, the latter would be very very easy, especially if you don't have that many images.
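
One way the "second run over the index" could look is a batch of Solr atomic updates, which set a single field on an existing document by its unique key. The sketch below only builds the JSON body; it assumes the unique key field is named `id`, that term documents are keyed by the ontology ID, and that the schema meets Solr's requirements for atomic updates. None of that is confirmed in this thread, and it is not necessarily how the geospatial load was done. `build_atomic_updates` and `to_update_body` are hypothetical helper names.

```python
import json
from typing import Dict, List

# Field name taken from the comment above.
IMAGE_FIELD = "auxiliary_external_reference_image"


def build_atomic_updates(images: Dict[str, str], id_field: str = "id") -> List[dict]:
    """Map {document ID: image URL} to Solr atomic-update documents.

    The {"set": value} modifier replaces only the named field and leaves
    the rest of the document as it is. `id_field` is an assumption.
    """
    return [
        {id_field: doc_id, IMAGE_FIELD: {"set": url}}
        for doc_id, url in sorted(images.items())
    ]


def to_update_body(updates: List[dict]) -> str:
    """Serialize the updates as the JSON array a Solr update handler accepts."""
    return json.dumps(updates, indent=2)
```

The resulting body would be sent to the index's update handler with a commit; that step is left out here because the endpoint and deployment details are not given in the thread.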

In fact, that would be fun enough for a weekend project probably.

austinmeier commented 7 years ago

When you say "new field" that would be a new field in what, exactly?

I think for a temporary solution, the example you outlined would work nicely. To get a sense of scale, I'd say we will likely start with just a single image for each term in the PO (actually it would be fewer, as we would not have images for some categorical terms). I could ask Dennis and his crew to gather the images that he would like to use, and deposit them somewhere on our repo (or elsewhere) with the PO ID to which they should be annotated.

What format would be best? We can just load all the images somewhere, and provide a delimited file with an ID and a URL to the image?

kltm commented 7 years ago

A new field in the Solr schema, as defined by the amigo metadata files.

If one were to move in this direction, and I won't have time really until after the GO meeting, I would tack towards getting all of the images into S3--I think we're talking about a few thousand here? Well, thinking about it, if you have a webserver (probably Apache) up for AmiGO anyway, you could always serve them out of the AmiGO static directory or Apache as well.

As an experimental load format, let's say a JSON list along the lines of:

[
   {
      "index": "PO:0022008",
      "overlay": {
         "auxiliary_external_reference_image": "http://my.nifty/s3/url"
      }
   }
]

We can reuse this for other overlays in the future; I think that we can probably just reuse most of what was done for the geospatial here.
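
For the delimited ID/URL file suggested earlier in the thread, a small converter could produce the JSON list above. This is a sketch under assumptions: two tab-separated columns (term ID, then image URL), optional `#` comment lines, and no header row. `tsv_to_overlays` is a hypothetical name.

```python
import csv
import io
import json
from typing import List

# Default overlay field name, taken from the JSON example above.
DEFAULT_FIELD = "auxiliary_external_reference_image"


def tsv_to_overlays(tsv_text: str, field: str = DEFAULT_FIELD) -> List[dict]:
    """Convert 'ID<TAB>URL' lines into the experimental overlay list."""
    overlays = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        # Skip blank lines and '#' comments (an assumed convention).
        if not any(cell.strip() for cell in row) or row[0].startswith("#"):
            continue
        if len(row) < 2:
            raise ValueError(f"expected ID and URL columns, got: {row!r}")
        term_id, url = row[0].strip(), row[1].strip()
        overlays.append({"index": term_id, "overlay": {field: url}})
    return overlays


if __name__ == "__main__":
    sample = "PO:0022008\thttp://my.nifty/s3/url\n"
    print(json.dumps(tsv_to_overlays(sample), indent=3))
```

Keeping the field name as a parameter means the same file format could feed other single-valued overlays later.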

austinmeier commented 7 years ago

Excellent. It will take some time to get images collected, and labeled correctly, so I'll see what can be done, then we can give it a whirl sometime in the semi-near future.

You're going to be in Corvallis this summer for the GO meeting, so we can connect at that point.

Thanks for the explanation.

kltm commented 7 years ago

Yes, we can touch base at Corvallis.

If there is a non-ontology label, or multiple images for a single term, a different overlay (or even strategy) would be necessary.

austinmeier commented 7 years ago

Yeah, and I'm sure that is in the long-term plan for the image annotation project, but if we could get just simple descriptive images embedded in the term page on the browser, it would clear up a lot of the complex plant anatomy terms rapidly.

kltm commented 7 years ago

Well, yes, let's call this a one-off. But for image annotations, we should revisit the work that has gone on with geospatial: https://github.com/geneontology/amigo/issues/341

jaiswalp commented 7 years ago

Reiterating @austinmeier's comments about displaying line diagrams, etc., to explain the anatomy terms on term detail pages. I recommend moving it up in priority.

kltm commented 7 years ago

Or, for that matter, thinking about #421: if we had a field that was essentially an overlay catch-all, a multivalued field (e.g. auxiliary_overlays) that could be loaded separately and incrementally, it could take a number of items to be rendered as needed. Each value could be like:

{
   "overlay": "auxiliary_external_reference_image",
   "index": "PO:0022008",
   "type": "image",
   "content": "http://snazzy.uri/foo/png"
}

or

{
   "overlay": "patter_description",
   "index": "PO:0022008",
   "type": "markdown",
   "content": "*** bleh"
}
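
A rough sketch of how a page could consume such a catch-all field, assuming each value of `auxiliary_overlays` is stored as a JSON string shaped like the two examples above. The renderer names and the dispatch table are hypothetical. The markdown renderer is only a placeholder that shows escaped text, since real markdown rendering would need a library.

```python
import html
import json
from typing import Callable, Dict, Iterable, List


def render_image(content: str) -> str:
    """Embed whatever is at the given URL as an image."""
    return f'<img src="{html.escape(content, quote=True)}" alt="">'


def render_markdown(content: str) -> str:
    """Placeholder: show the raw markdown, escaped, without rendering it."""
    return f"<pre>{html.escape(content)}</pre>"


# Dispatch on the overlay "type"; unknown types are skipped.
RENDERERS: Dict[str, Callable[[str], str]] = {
    "image": render_image,
    "markdown": render_markdown,
}


def render_overlays(raw_values: Iterable[str], term_id: str) -> List[str]:
    """Render every overlay value that targets term_id and has a known type."""
    fragments = []
    for raw in raw_values:
        item = json.loads(raw)
        if item.get("index") != term_id:
            continue
        renderer = RENDERERS.get(item.get("type"))
        if renderer is None:
            continue
        fragments.append(renderer(item.get("content", "")))
    return fragments
```

Because values are rendered independently, new overlay types could be added by registering another entry in the table, which fits the "loaded separately and incrementally" idea.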