geneontology / amigo

AmiGO is the public interface for the Gene Ontology.
http://amigo.geneontology.org
BSD 3-Clause "New" or "Revised" License

In the term matrix, make it more obvious that results are below #336

Closed ValWood closed 8 years ago

ValWood commented 8 years ago

Or is that just me? I didn't see them at first. I had some menus open though which pushed it right down the page.

It would be nice if the left hand menu wasn't quite so wide, and the output was adjacent......

kltm commented 8 years ago

I agree with making the results more integrated into the display, but I think I've hit a point of diminishing returns with it until I can switch the toolkit (there are a few items around for that).

kltm commented 8 years ago

@ValWood I've essentially gotten a new version of the matrix setup on a new framework. As I'm looking at the last few bits, I'm wondering exactly how important is the order selector? Are all of them useful/necessary? Only one or two? Is the whole thing not really necessary?

ValWood commented 8 years ago

It isn't critical but it's quite cool. And it works right now... (I closed the ticket about that already). Is it causing you problems? If so, the default should be to preserve the input order. I really like the "by co-annotation count" too, as this is automatically close to what I would usually want to create manually.

kltm commented 8 years ago

@ValWood, the changes just pushed are a very different interface, but I believe an objectively better one with only two real drawbacks: 1) no cell highlighting and 2) the cells redraw without whooshing around (which was a nice visual indicator). Otherwise, there is a lot more power here, including being able to get things onto one screen, zooming, and being able to tailor custom graphs for publication.

cmungall commented 8 years ago

Seth and I just discussed the new interface. It's better in multiple respects, but replacing the numeric values with colors is suboptimal as there is no meaningful absolute scale here. For example, if two highly specific terms A and B each have 5 genes, and those 5 genes are all shared, that's highly significant, but a low absolute number.

And of course the number you are most interested in is zero, which is invisible here.

We're looking at some other options
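The point about absolute counts can be illustrated with a minimal sketch. It is not how AmiGO computes or displays the matrix: the gene IDs are made up, and the Jaccard overlap is only one possible relative measure, swapped in here to show why the same raw count of 5 can mean very different things.

```python
def co_annotation(genes_a, genes_b):
    """Return (shared gene count, Jaccard overlap) for two sets of gene IDs."""
    a, b = set(genes_a), set(genes_b)
    shared = len(a & b)
    union = len(a | b)
    # Jaccard overlap: shared genes as a fraction of all genes in either set.
    jaccard = shared / union if union else 0.0
    return shared, jaccard


# Hypothetical gene IDs, for illustration only.
specific_a = {"g1", "g2", "g3", "g4", "g5"}
specific_b = {"g1", "g2", "g3", "g4", "g5"}

# Two broad terms with 500 genes each that also happen to share 5 genes.
broad_a = {f"g{i}" for i in range(1, 501)}    # g1 .. g500
broad_b = {f"g{i}" for i in range(496, 996)}  # g496 .. g995

specific_result = co_annotation(specific_a, specific_b)  # complete overlap
broad_result = co_annotation(broad_a, broad_b)           # tiny overlap
```

Both pairs would land in the same colour bucket on an absolute scale, even though one pair overlaps completely and the other barely at all.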

On 1 Apr 2016, at 17:42, kltm wrote:

@ValWood , the changes just pushed are a very different interface, but I believe an objectively better one with only two real drawbacks: 1) no cell highlighting and 2) the cells redraw without whooshing around (which was a nice visual indicator). Otherwise, there is a lot more power here, including being able to get things onto one screen, zoomin, and being able to tailor custom graphs for publication.



kltm commented 8 years ago

For later reference, I think I found a text addition section that might work. https://plot.ly/javascript/text-and-annotations/

kltm commented 8 years ago

Sorry, chatty log there--it was easier to run experiments on the dev server. 0s are now uniformly white, and I've brought back the step colors from the last version.
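As a rough sketch of the "0s are white, everything else gets a step colour" idea: the function below maps a count to a colour bucket. The palette, the number of steps, and the function name are all assumptions made for illustration; the actual colours and thresholds used on the dev server are not given in this thread.

```python
WHITE = "#ffffff"
# Hypothetical four-step palette, light to dark.
PALETTE = ("#c6dbef", "#6baed6", "#2171b5", "#08306b")


def step_color(count, max_count, palette=PALETTE):
    """Map a co-annotation count to a colour: white for 0, stepped otherwise."""
    if count < 0:
        raise ValueError("count must be non-negative")
    if count == 0:
        return WHITE  # zeros are always white, independent of the scale
    if max_count < 1:
        raise ValueError("max_count must be at least 1 for non-zero counts")
    # Split 1..max_count into len(palette) equal-width steps;
    # counts above max_count are clamped into the darkest step.
    index = min((count - 1) * len(palette) // max_count, len(palette) - 1)
    return palette[index]
```

Treating zero as a special case rather than as the bottom of the scale keeps empty intersections visually distinct from small non-zero ones.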

kltm commented 8 years ago

I thought I had found a way of getting text in there, but it did not scale well (in any sense).

ValWood commented 8 years ago

I much prefer the old version.

The zooming isn't useful.

The numbers are critical; I can't do anything with just colours. Even if the other features seem to add something, they aren't things I would want to do with this info (except maybe save the image, but for now this isn't a priority...). I'm interested in ALL the numbers, not just the zero. (Even if it were useful, the scale bar colouring doesn't seem to work, and the numbers on the bar don't correlate with the numbers in the graph.)

Sorry, a verschlimmbesserung ;(

It was really great before........

ValWood commented 8 years ago

Also can no longer filter on species.

kltm commented 8 years ago

@ValWood The new species filter (#209 and #247) depends on a different set of fields being available that have not yet been loaded on that machine. This load should occur in the next day or so, making that filter available again.

Alas. I'll revert to the old version of the graph. This will mean that a few other things will revert as well (position of results above the fold, etc.), but I understand the issue with the numbers (the one thing). Unfortunately, as long as the old graph is used, that will inhibit other feature development in this tool, but at least the matrix tool will be more usable.

ValWood commented 8 years ago

At present the significance is largely moot, because the tool can't yet be used in an exploratory fashion. Compare the cleaned-up pombe matrix for these cellular process terms:

[Image: pombe matrix 57]

with what you see for any other species. The majority of the non-zero intersects are due to annotation and mapping issues. I looked at one intersect on Friday (lipid metabolism vs protein folding) and all the annotations that I looked at were errors. I've not had time to report them yet.

Even once an intersect is cleaned up, I think the significance is largely meaningless. A process either intersects another process, or it doesn't. This is what the intersects should look like in a 'graphical view'.

[Image: visual_slim]

This shows nearly all of the intersections (except the processes in red diamonds, which intersect with lots of things). There are a few exceptions which I could not fit into this view, but they are well documented.

Sure, you will get a statistically significant result for intersection of RNA metabolism with ribosome biogenesis. Currently you will also get nonsense statistical significance with lots of processes where you should not due to 'consistent' annotation errors.

As a first stage this tool should not be used in an exploratory fashion. It should be used for annotation clean-up. Unfortunately I don't have time to check every intersection for every species as I have done for fission yeast. So the next stage is to implement 'soft checks' and get the originators to check their own annotations.
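One way a 'soft check' could look, as a sketch only: the thread does not define the term, so this assumes it means warning about (rather than rejecting) genes annotated to two processes that are not expected to intersect. The function name, data layout, and gene IDs are hypothetical.

```python
def soft_check(annotations, disjoint_pairs):
    """Report genes annotated to both terms of a pair expected not to intersect.

    annotations: dict mapping a term name to a set of gene IDs.
    disjoint_pairs: iterable of (term_a, term_b) tuples.
    Returns a list of (term_a, term_b, sorted shared genes) warnings.
    """
    warnings = []
    for term_a, term_b in disjoint_pairs:
        shared = annotations.get(term_a, set()) & annotations.get(term_b, set())
        if shared:
            # A warning only: a curator decides whether each gene is an error.
            warnings.append((term_a, term_b, sorted(shared)))
    return warnings


# Hypothetical annotations, for illustration only.
example_annotations = {
    "lipid metabolic process": {"geneA", "geneB"},
    "protein folding": {"geneB", "geneC"},
    "translation": {"geneD"},
}
example_pairs = [
    ("lipid metabolic process", "protein folding"),
    ("protein folding", "translation"),
]
example_warnings = soft_check(example_annotations, example_pairs)
```

The output is a list that could be sent back to the annotating group, which matches the idea of getting originators to check their own annotations.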

ValWood commented 8 years ago

"And of course the number you are most interested in is zero, which is invisible here."

That's not completely true. I check all the low numbers. Actually all the numbers. I want to know that the intersect makes total sense biologically.

ValWood commented 8 years ago

This is why I'm not keen on the relationship "causally upstream of"

Look at the figure above. The entire set of genes in blue circles in the "expression" module is "causally upstream of" every other process in the figure. Many other processes are causally upstream of the cell division module. Basically, if you mutate many gene products you can affect cell cycle progression in some way. However, these are NOT regulating the cell cycle in any way in a normal cell. A defect in splicing only manifests as a cell cycle problem in a mutated cell. To most biologists (based on lots of feedback) it seems wrong to annotate these genes in any way as being involved in the GO processes they affect (this is what a biologist would call upstream, and indirectly affecting).

So even though lots of splicing factors, when mutated, cause a defect in the mitotic cell cycle transitions, these transitions are not "regulated" by splicing as far as we know. They are regulated post-translationally by modification events (mainly phosphorylation and dephosphorylation) and by catabolism (in some instances there might be some transcriptional feedback, for example in the production of cyclins, e.g. at G1/S in cerevisiae, but this does not operate in fission yeast).

If however, we start to annotate all of the upstream events using this "causally upstream of" relationship, we will again obscure the interesting biology where there is real regulation taking place. It also changes the scope of what our users expect a GO annotation to represent (this is also supported by the GO annotations they make during community curation).

For an opposite example, there is evidence that a mechanism exists to up-regulate translation in response to nutrient availability, which would be a mechanism to regulate energy metabolism pathways. If this is published then we would expect to see a link between these 2 processes. This would be obscured if translation was annotated as "causally upstream" of everything it affects.

Even if you could filter this relation, I don't know why we would want to use it. Annotation is complicated enough without conflating phenotypes (indirect effects usually come from indiscriminate phenotype annotations) with real processes and their real regulatory mechanisms...

ValWood commented 8 years ago

OK, to make a point: just for the cell cycle/splicing annotation intersection, I have already opened 3 tickets.

https://github.com/geneontology/go-annotation/issues/1373
https://github.com/geneontology/go-annotation/issues/1374
https://github.com/geneontology/go-annotation/issues/1375

If I continued like this solidly for 6 months I doubt I could report all of the errors....

ValWood commented 8 years ago

It might be an idea to consider some fancier new graph once the annotation rules are built and the problematic annotations are purged. I don't think the matrix is so useful for anything else until this stage is complete. We really do need to address annotation quality in a big way....

cmungall commented 8 years ago

@ValWood agree with everything but not sure I follow your objections to causally-upstream-of. This relation and ones like it are intended precisely to be able to disambiguate mere X-affects-Y from X-regulates-Y (and to allow the case where we don't know). This can only enhance the power of your constraints (and conversely, your constraints can help us place existing annotations into the right bucket). But this is straying a bit from the original ticket and I don't want an important discussion to be lost.

Depending on the nature of the issue, maybe https://github.com/geneontology/annotation_extensions/issues (if this is just about the use of these relations in AEs). Or if it's a concern with the relations themselves or their direct use in LEGO we should discuss this in a LEGO call.

ValWood commented 8 years ago

I think it is more general, so LEGO (is there a tracker to record this?). I could migrate this discussion.

First, I might have misunderstood its use. If gene A is annotated to process Y, and gene B is annotated as "causally-upstream-of" gene A, does gene B become annotated to process Y? How does gene B look in the GAF?

kltm commented 8 years ago

Yes, good discussion, but definitely needs to be migrated. @cmungall might have a spot in mind?

ValWood commented 8 years ago

Is there a tracker for LEGO discussions?

kltm commented 8 years ago

Well, the trackers tend to be around specific issues, rather than more free-ranging discussions. If @cmungall doesn't have somewhere else in mind, and this is more LEGO-centered, you might want to try geneontology/noctua-model; a more general discussion maybe geneontology/go-annotation?

cmungall commented 8 years ago

On 4 Apr 2016, at 21:48, Val Wood wrote:

I think more generally so LEGO (Is there a tracker to record this?). I could migrate this discussion.

First I might have misunderstood its use. If gene A is annotated to process Y Gene B is annotated as "causally-upstream-of" gene A. Does gene B become annotated to process Y? How does gene B look in the GAF?

Well, the causal relations would be between the activities, not the genes, but I see what you mean.

Within the current formalization we cannot deepen to regulation-of-Y, as causally-upstream-of is weaker (it does not imply control). But @ukemi and @vanaukenk have a qualifier proposal that would allow annotation of B to Y, with an appropriate qualifier.

kltm commented 8 years ago

Shoo-shoo--different doc location.