geneontology / amigo

AmiGO is the public interface for the Gene Ontology.
http://amigo.geneontology.org
BSD 3-Clause "New" or "Revised" License

Unintuitive behavior of GO class facets when querying gene lists by term #443

Open ValWood opened 7 years ago

ValWood commented 7 years ago

I realise no AmiGO work is going on right now but I have been meaning to report this for a while.

This is an example that I think is clear.

I am looking at the gene products annotated to the GO term GO:0065009 regulation of molecular function

I want to see which terms generate these annotations, so I go to "direct annotation". Here I see lots of component annotations: "plasma membrane", "nucleus". I don't get it.... I think these are other "direct annotations" to terms that the gene products annotated to GO:0065009 are also annotated to maybe?

It's very confusing anyway. I can't figure out why they would be there.... very confusing AmiGO behaviour

kltm commented 7 years ago

@ValWood From http://amigo.geneontology.org/amigo/term/GO:0065009 , all three links go to direct and indirect annotations or gps, there is no direct annotation only option...?

ValWood commented 7 years ago

menu option

kltm commented 7 years ago

Likely bookmark of page in question: http://amigo.geneontology.org/amigo/search/bioentity?q=*:*&fq=regulates_closure:%22GO:0065009%22&fq=source:%22MGI%22&sfq=document_category:%22bioentity%22
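For readers unfamiliar with these URLs, here is a minimal sketch that splits the bookmark into its query parameters using only the Python standard library. Reading `fq` as the active filters is an interpretation based on the discussion in this thread, not a description of AmiGO internals.

```python
from urllib.parse import urlsplit, parse_qs

# The bookmark quoted above, split across lines for readability.
url = (
    "http://amigo.geneontology.org/amigo/search/bioentity"
    "?q=*:*&fq=regulates_closure:%22GO:0065009%22"
    "&fq=source:%22MGI%22&sfq=document_category:%22bioentity%22"
)

# parse_qs percent-decodes values (%22 becomes a double quote) and
# collects repeated keys such as fq into a list.
params = parse_qs(urlsplit(url).query)

for key, values in params.items():
    for value in values:
        print(f"{key}: {value}")
```

The two `fq` values show that the page is a gene product list restricted by the term's closure and by source, which is why the term facets on that page describe the gene products rather than the original term query.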

kltm commented 7 years ago

I believe your statement

I think these are other "direct annotations" to terms that the gene products annotated to GO:0065009 are also annotated to maybe?

is correct. The view is completely GP-centric, which is really an accumulation of information from multiple annotations folded into one GP doc.

I'm thinking that this is related to https://github.com/geneontology/amigo/issues/379 ?

kltm commented 7 years ago

Tagging @cmungall as he was also tagged in #379; forgetting original reason.

ValWood commented 7 years ago

Yes! I'm not sure if it is intended behavior, but if it is, I don't know what it means! I was reminded about it when @pgaudet mentioned it recently in another ticket.

kltm commented 7 years ago

Okay, do you prefer holding on to one ticket over the other? Perhaps we should dupe the older one onto this newer one?

ValWood commented 7 years ago

I have no preference, so whichever one you think is clearest.

cmungall commented 7 years ago

Do I understand:

From the term page http://amigo.geneontology.org/amigo/term/GO:0065009 you click on " to all genes and gene products annotated to regulation of molecular function." (nothing to do with direct vs indirect, I think this is a red herring?). This takes you to a page that is the gene list for that term. The URL for this is of the form of the one @kltm posted.

I think this is working as expected, but some things could make this clearer:

  1. The link text could be improved (but I'm not sure what to? Maybe simplify to "all genes/products"?)
  2. Rather than linking to a query, it could link to a more REST-like URL such as /term/GO:0065009/gps. This would have a header telling the user where they are.
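As a rough illustration of point 2, the sketch below maps the proposed REST-like path onto the existing search URL. The `/term/<id>/gps` route and the function name `gps_path_to_search_url` are hypothetical; only the target query format is taken from the links posted in this thread.

```python
import re
from urllib.parse import quote

AMIGO_BASE = "http://amigo.geneontology.org/amigo"


def gps_path_to_search_url(path: str, base: str = AMIGO_BASE) -> str:
    """Translate a hypothetical /term/<GO id>/gps path into a bioentity search URL."""
    match = re.fullmatch(r"/term/(GO:\d{7})/gps", path)
    if match is None:
        raise ValueError(f"not a gene-product list path: {path!r}")
    term_id = match.group(1)
    # Keep ':' readable and percent-encode the double quotes, as in the posted links.
    fq = quote(f'regulates_closure:"{term_id}"', safe=":")
    sfq = quote('document_category:"bioentity"', safe=":")
    return f"{base}/search/bioentity?q=*:*&fq={fq}&sfq={sfq}"


print(gps_path_to_search_url("/term/GO:0065009/gps"))
```

A real route would presumably render its own page with a header naming the term, rather than redirecting to the search page; the mapping here only shows that the two address the same gene set.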

But overall I think we will end up restructuring this so that within the term page it should be possible to clearly see things from an annotation-centric and gene-centric perspective (out of scope for this ticket).

cmungall commented 7 years ago

going back to the original intent:

I am looking at the gene products annotated to the GO term GO:0065009 regulation of molecular function

I want to see which terms generate these annotations

I don't follow, this is already visible. E.g. if I am here:

http://amigo.geneontology.org/amigo/term/GO:0065009

You can see the terms here:

image

If you would like a distinct set of all annotated terms under this GO term we can do a query for this, but it's not easy to do directly in amigo

ValWood commented 7 years ago

Not quite.

Once you have got to this page, you have a list of all gene products annotated to "regulation of molecular function"

At this point you might want to filter on how the genes are annotated to molecular function so here I go to the filter "direct annotation". On other pages I can use this to filter away certain contributing descendants (I think?)

Anyway here, even though I am looking at a molecular function term, the "direct annotation filter" lists component terms. I don't know why?

ValWood commented 7 years ago

So I guess my question is: what is the menu filter for direct annotation showing me in relation to the list of ~3000 mouse genes annotated to Molecular function "regulation of molecular function"?

Should it say "co-annotated terms"?

ValWood commented 7 years ago

I can see them a few at a time but I can't filter them. I thought the direct annotation filter was for that purpose. I see now that the menu had a different (misleading?) use of "direct"

cmungall commented 7 years ago

Anyway here, even though I am looking at a molecular function term, the "direct annotation filter" lists component terms. I don't know why?

not following. On http://amigo.geneontology.org/amigo/term/GO:0065009 I look at the facet counts in the "GO class (direct)" facet (which is what I think you mean?)

image

I don't see component terms here

cmungall commented 7 years ago

Just to minimize confusion, can you say precisely which you are doing. I don't see the string "direct annotation filter" on the term page at all.

ValWood commented 7 years ago

Hmm, look at the image on the top screenshot. I can see components. Yours is the annotations link/list I think? I was using the gene product link

cmungall commented 7 years ago

AHH got it. Ignore my Q. Will respond in a mo

ValWood commented 7 years ago

This link, sorry didn't check the link Seth posted. http://amigo.geneontology.org/amigo/search/bioentity?q=*:*&fq=regulates_closure:%22GO:0065009%22&sfq=document_category:%22bioentity%22

cmungall commented 7 years ago

OK, I was looking at the term page http://amigo.geneontology.org/amigo/term/GO:0065009

But I see that when you follow the link to get all the genes and gene products directly and indirectly annotated (i.e. the gene set) (the link you just posted) the operation of term-based facets becomes confusing.

What we are looking at here is the gene list. You can think of each gene here having an "invisible bag" of direct and indirect terms (we used to show these but took them out). So on one level the facets are operating as expected, you're just filtering based on the members of these sets. But of course it's also confusing, and you might also expect the facets to work according to your original query. You are right, the term filter effectively becomes a co-annotation filter in the context of a query like this.
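The "invisible bag" idea can be sketched with toy data. The gene names, the field names `direct` and `closure`, and the annotations below are all made up for illustration; this is not the AmiGO or Solr schema.

```python
from collections import Counter

QUERY_TERM = "regulation of molecular function"

# Each gene product document carries a bag of direct terms and a closure
# (direct plus inferred terms). All values are illustrative.
gene_docs = [
    {
        "id": "gene1",
        "direct": {"regulation of kinase activity", "plasma membrane"},
        "closure": {"regulation of kinase activity", QUERY_TERM, "plasma membrane"},
    },
    {
        "id": "gene2",
        "direct": {QUERY_TERM, "nucleus"},
        "closure": {QUERY_TERM, "nucleus"},
    },
    {
        "id": "gene3",
        "direct": {"nucleus"},
        "closure": {"nucleus"},
    },
]


def gene_list(docs, term):
    """Gene products whose closure contains the query term."""
    return [doc for doc in docs if term in doc["closure"]]


def direct_facet(docs):
    """Count every direct term across the selected gene products."""
    return Counter(term for doc in docs for term in doc["direct"])


hits = gene_list(gene_docs, QUERY_TERM)
for term, count in sorted(direct_facet(hits).items()):
    print(count, term)
```

The facet counts include "plasma membrane" and "nucleus" even though the query was a function term, because the facet is computed over everything in each selected gene's bag. That is the co-annotation behaviour described above.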

OK, we can do some cosmetic fixes in the short term, but longer term (I will explain the lag in amigo development at the meeting) we need more of a buffer between amigo and the underlying solr filters. When on a gene list for a term the facets should switch to explicit labels for things like 'co-annotation' and also a way of doing exactly what you want in this case: show direct annotations if they are a descendant of the term of interest.
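One possible shape for the longer-term behaviour, again with toy data: restrict the direct-term facet to terms that sit under the term of interest. The `DESCENDANTS` table is a hand-written stand-in for a real ontology closure lookup, and the gene documents are hypothetical.

```python
from collections import Counter

QUERY_TERM = "regulation of molecular function"

# Stand-in for an ontology lookup: the query term plus its descendants.
DESCENDANTS = {
    QUERY_TERM: {QUERY_TERM, "regulation of kinase activity"},
}

# Toy gene products that already matched the query term.
hits = [
    {"id": "gene1", "direct": {"regulation of kinase activity", "plasma membrane"}},
    {"id": "gene2", "direct": {QUERY_TERM, "nucleus"}},
]


def contributing_direct_facet(docs, term, descendants):
    """Count only direct terms that are the query term or one of its descendants."""
    allowed = descendants[term]
    return Counter(t for doc in docs for t in doc["direct"] if t in allowed)


facet = contributing_direct_facet(hits, QUERY_TERM, DESCENDANTS)
for term, count in sorted(facet.items()):
    print(count, term)
```

With this restriction the component terms drop out and the facet lists only the direct annotations that put each gene in the list, which is what the original report was looking for.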

ValWood commented 7 years ago

Sounds good. I'm glad I got this one understood at last!

ValWood commented 7 years ago

Having a clear out of my responses waiting folder, this is the ticket where Pascale mentioned it too: https://github.com/geneontology/go-ontology/issues/13685#issuecomment-312710378 It would be good to fix as I'd expect that this confuses a lot of people (I tend to use the gene product view rather than the annotation view).

cmungall commented 7 years ago

Thanks. I believe @pgaudet's use case (determining if an ontology class can be obsoleted) is easily satisfied in amigo by looking directly in the term page?

cmungall commented 6 years ago

Going back to @ValWood's Q in this ticket from Sep 2017:

So I guess my question is What is the menu filter for direct annotation showing me in relation to the list of ~3000 mouse genes annotated to Molecular function "regulation of molecular function".

Should it say "co-annotated terms"?

Yes! This is exactly what it is showing, and this is exactly what we should say!

(Minor formal quibble: the original query term will be in this list, hard to exclude it, so we have to interpret co-annotation as a reflexive relation, hopefully people won't be confused by this)

cmungall commented 6 years ago

My pull request changes this to show Direct co-annotation and Inferred co-annotation. Is it confusing to retain both? It's one of those features that are potentially useful for power-users.

Also note that the implicit GP2Term relationship here is causally upstream of or within, i.e. it follows regulates relations as well as isa-partof. We could easily change it so we have something like "inferred to also be involved with" vs "inferred to be involved with or upstream of". However I can see this as being massively confusing, and probably wouldn't make much sense to do until we have fixed a lot of the databases to be providing the correct gp2term relationship in their GAFs anyway

ValWood commented 6 years ago

We mainly use "direct annotation". Should that also be included for gene products (or will it now only be available for "annotations", not for "gene products")?

I can't comment on "direct co-annotation" and "Inferred co-annotation" because I don't really know how they are used. The only use I can see is 'gene products annotated to x are often co-annotated to y' but it seems a very impenetrable and confusing way of making this information available.

In general "direct- (x)annotation" is not really useful for any GO and product end-users because it is completely arbitrary. It is meaningless because I could make direct annotations to every parent term. For example, if I annotate to "nuclear chromatin" I could also annotate to "nucleus". Sometimes we will have both annotations, sometimes not. Do you only consider the most specific annotation here, or any direct annotation?

So, following from this, I suspect that "Inferred co-annotation" is more useful than "direct- (x)annotation".

We only use "direct annotation" administratively to see the level at which the annotation is made, in order to fix/improve annotations and make decisions about which terms can be merged, kept or culled.

cmungall commented 6 years ago

Yes, agree direct co-annotation is not useful for end-users. There are a handful of cases where it's useful for power curators. E.g. I may want to see for a given BP how many direct annotations to the root of MF there are. But let's just drop the direct one.

For the inferred, the main use case is being able to drill down, e.g. genes that are annotated to kinase cascade AND cell proliferation.

ValWood commented 6 years ago

For the inferred, the main use case is being able to drill down, e.g. genes that are annotated to kinase cascade AND cell proliferation.

It doesn't really work for this anyway.... (because of the 'directness')

Using "cell cycle" and "MAP kinase cascade" (because fission yeast do not directly annotate to cell proliferation), I don't find any results using AmiGO because none of the annotations to "MAP kinase cascade" are direct. This is not a good way to do (A+B) queries, it will always be misleading.

If you keep the 'co-annotation' one it needs to be direct + inferred to be meaningful.

There are actually 10 GPs annotated to both terms:

| Gene name | Systematic ID | Product |
|---|---|---|
| atf1 | SPBC29B5.01 | transcription factor, Atf-CREB family Atf1 |
| byr1 | SPAC1D4.13 | MAP kinase kinase Byr1 |
| byr2 | SPBC1D7.05 | MAP kinase kinase kinase Byr2 |
| mpr1 | SPBC725.02 | histidine-containing response regulator phosphotransferase Mpr1 |
| pyp1 | SPAC26F1.10c | tyrosine phosphatase Pyp1 |
| pyp2 | SPAC19D5.01 | tyrosine phosphatase Pyp2 |
| shk1 | SPBC1604.14c | PAK-related kinase Shk1 |
| spk1 | SPAC31G5.09c | MAP kinase Spk1 |
| sty1 | SPAC24B11.06c | MAP kinase Sty1 |
| wis1 | SPBC409.07c | MAP kinase kinase Wis1 |

ValWood commented 6 years ago

If you want a way to do intersects/co-annotation you need a simple query tool like this:

https://www.pombase.org/query then you can do as many intersections (+ unions and subtractions) as you please.
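The kind of query meant here reduces to plain set operations over gene lists. A minimal sketch with hypothetical gene names, assuming each list already contains direct plus inferred annotations for its term (this is not how the PomBase tool is implemented):

```python
# Hypothetical gene sets, one per term, built from direct + inferred annotations.
annotated = {
    "cell cycle": {"geneA", "geneB", "geneC"},
    "MAP kinase cascade": {"geneB", "geneC", "geneD"},
}

cell_cycle = annotated["cell cycle"]
mapk = annotated["MAP kinase cascade"]

both = cell_cycle & mapk             # intersection: annotated to A AND B
either = cell_cycle | mapk           # union: annotated to A OR B
cell_cycle_only = cell_cycle - mapk  # subtraction: A but NOT B

print(sorted(both))
print(sorted(either))
print(sorted(cell_cycle_only))
```

Because each set is built from the full closure, the intersection does not depend on which level of the ontology a curator happened to annotate to directly.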

selewis commented 6 years ago

What a lovely tool, set operations are so important to have available.

ValWood commented 6 years ago

Lots of things do similar (Intermine, EnsemblMart), but not so simply..... I use it about 100 times a day...