geneontology / panther-enrichment

One of the main uses of the GO is to perform enrichment analysis on gene sets. For example, given a set of genes that are up-regulated under certain conditions, an enrichment analysis will find which GO terms are over-represented (or under-represented) using annotations for that gene set.
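The core of such an over-representation analysis can be sketched as a one-sided hypergeometric test (equivalently, a one-sided Fisher's exact test) on four counts taken from the query and reference lists. This is a minimal illustration under that assumption, not PANTHER's actual implementation. The function names and numbers are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_upper_tail(k: int, n: int, K: int, N: int) -> float:
    """One-sided p-value P(X >= k), X ~ Hypergeometric(N, K, n).

    N: genes in the reference (background) list
    K: reference genes annotated to the GO term
    n: genes in the query list
    k: query genes annotated to the GO term
    """
    # Sum exact integer counts first, then divide once, to avoid rounding error.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return favourable / comb(N, n)


def fold_enrichment(k: int, n: int, K: int, N: int) -> float:
    """Observed hits divided by the expected count n * K / N."""
    return (k * N) / (n * K)


# Hypothetical numbers: 10 of 100 reference genes carry the term,
# and 5 of a 10-gene query list do.
print(fold_enrichment(5, 10, 10, 100))  # 5.0
print(hypergeom_upper_tail(5, 10, 10, 100))
```

Summing exact integer counts before the single division keeps small p-values accurate without any third-party libraries.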

Panther enrichment queries and issues #6

Open cmungall opened 7 years ago

cmungall commented 7 years ago

From @ValWood on February 18, 2017 16:54

Hello, I'm logging this issue here because I wanted to be able to upload screenshots easily, and I don't know where the Panther tracker is (I asked on the helpdesk recently, but I have not had a reply from Panther yet...)

I was trying to use Panther for enrichment (over-representation). It turns out the list I was using had no enrichment, but I spotted a couple of possible problems.

1. My results are summarized as "unclassified 2408". What does the 2408 refer to? I can't reconcile this number with anything related to pombe process; I could not find it explained, and the link does not retrieve anything.

[Screenshot: unclassified 2408]

2. When I viewed the full results I saw:

[Screenshot: vitamin transport]

Note the bottom entry in the list, "vitamin transport 6".

PomBase has only 5 genes annotated to "vitamin transport": SPCC576.17c, SPCC965.13, bsu1, thi9 and vht1.

This list of 6 is:

[Screenshot: list of 6]

SPCC285.04, SPBC725.10, SPCC1223.09 and SPBC2G2.01c (a pantothenate transporter) do not appear to have a role in vitamin transport (or, in some cases, even in transport).

SPAC1B3.15c has a NOT annotation to vitamin transport

vht1 appears to be the only gene product in the Panther list that is officially annotated to "vitamin transport".

Copied from original issue: geneontology/go-site#314

cmungall commented 7 years ago

From @ValWood on February 18, 2017 17:33

I should add that I don't think the 2408 has anything to do with "unannotated", because pombe has had 5396 annotated objects for a long time, not the 5140 reported. Of these, 4499 have a non-root BP annotation.

cmungall commented 7 years ago

From @ValWood on February 18, 2017 17:33

@thomaspd

cmungall commented 7 years ago

From @paolaroncaglia on February 23, 2017 8:0

@ValWood The tracker you're looking for is https://github.com/geneontology/panther-enrichment

cmungall commented 7 years ago

From @ValWood on February 23, 2017 10:13

Hi,

I can't supply the list I used, as it is a collaborator's list. However, any list would do. I just tried with this list of ALL genes annotated to vitamin metabolism (systematic IDs):

SPAC5H10.09c SPBC409.13 SPBC725.03 SPAC23H4.10c SPAP27G11.09c SPCC18B5.05c SPAC17A5.13 SPAC1002.19 SPCC4B3.18 SPBP4H10.05c SPCC1223.02 SPBC428.03c SPBC839.16 SPBC1734.03 SPCC1235.02 SPBC1711.04 SPAC222.08c SPCC4G3.16 SPBP8B7.18c SPBP8B7.29 SPBC21C3.10c SPCC1281.04 SPCC18.16c SPAC1952.08c SPAC5H10.08c SPAC6F12.05c SPBP8B7.17c SPBC23E6.06c SPAC1486.10 SPAC9E9.11 SPAC1093.02 SPBC19G7.02 SPBC12C2.07c SPAC29B12.04 SPBC2G2.08 SPBC26H8.01 SPBPB2B2.09c SPCC1450.13c SPAC8F11.09c SPAC144.04c

If I use the enrichment tool on the front page of the GO site, I get the correct results. They are not ordered by significance, which makes them difficult to interpret, but you can clearly see that 39/40 genes are annotated to vitamin metabolism (the off-by-one difference could be a version issue, so we can ignore that).

[Screenshot: GO site results]

However, if I go through to the Panther site to take advantage of other options, such as a background set, I get totally different results:

[Screenshot: Panther site results]

Only 25 gene products in this list are annotated to vitamin metabolism. It is as if the Panther site is running the enrichment on a totally different annotation set?

cmungall commented 7 years ago

From @ValWood on February 24, 2017 16:40

Further to this, I just found another note I made while trying to use it; it might be helpful:

Panther class information is very out of date. This link gives the term name "transcription, DNA-dependent": http://www.pantherdb.org/panther/category.do?categoryAcc=GO:0006351

but this term was renamed to "transcription, DNA-templated" on 2013-11-24.

cmungall commented 7 years ago

From @thomaspd on May 31, 2017 19:3

Thanks Val, I see the problem. You're using the PANTHER GO-slim rather than the full GO results, and yes, most of those annotations come from PAINT only (with some legacy annotations from older versions of PANTHER and our GO-slim). These will be removed or corrected with our release of version 13 late this year, so the annotation and GO-slim ontology issues will be fixed then. Note that you can change the reference list at any time, even from the enrichment results page: the information table at the top has a "change" button next to the reference set item. You can also change the annotation set by selecting from the drop-down menu next to that item.

cmungall commented 7 years ago

From @ValWood on June 2, 2017 10:23

Paul,

This is all very misleading. If I go to http://www.geneontology.org/, I cannot perform an enrichment from there, because a background set cannot be provided. It does not make sense to perform an enrichment without using the appropriate background for your experiment, so I need to go to Panther.
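To make the point about backgrounds concrete, here is a minimal sketch, assuming a one-sided hypergeometric test as a stand-in for whatever test a given tool uses. Both the reference size N and the annotated count K are taken relative to the chosen background, so an experiment that can only detect part of the genome changes the result. The gene sets and function names are hypothetical toy data.

```python
from math import comb


def upper_tail(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n)."""
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)


def enrichment(query: set, background: set, annotated: set) -> tuple:
    """Return (fold enrichment, p-value) of `annotated` genes in `query`,
    with every count taken relative to the chosen `background`."""
    q = query & background           # genes outside the background cannot be sampled
    N, n = len(background), len(q)
    K = len(annotated & background)  # annotated genes the experiment could detect
    k = len(q & annotated)
    if n == 0 or K == 0:
        raise ValueError("empty query or no annotated genes in the background")
    return (k * N) / (n * K), upper_tail(k, n, K, N)


# Hypothetical toy data: a 20-gene genome, 6 genes annotated to some term,
# and an experiment that can only detect genes g1..g8.
genome = {f"g{i}" for i in range(1, 21)}
detected = {f"g{i}" for i in range(1, 9)}
annotated = {f"g{i}" for i in range(1, 7)}
query = {"g1", "g2", "g3", "g7"}

print(enrichment(query, genome, annotated))    # fold 2.5, p = 295/4845 (about 0.061)
print(enrichment(query, detected, annotated))  # fold 1.0, p = 55/70 (about 0.79)
```

With the whole genome as background the term looks 2.5-fold enriched; against the genes the experiment could actually detect, it is not enriched at all.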

If I perform an enrichment or slim at Panther, I should be using the full GO annotations. I don't understand what you mean by "Panther GO slim". A GO slim is a set of terms, not a set of annotations. Do you mean that I am using the Panther inferred annotations?

The Panther GO slim should give exactly the same results as any other GO slim based on GO annotation, and should reflect the totals that are currently in the GO database, surely? Especially since the Panther site is the one recommended for GO enrichment.

I wanted to repeat the procedure, but the Panther site appears to be unavailable right now?

This all seems a little crazy to me. Anyone doing an enrichment should be presented with the best available dataset, which is the annotated dataset from the GO database. Shouldn't this always be the default?

cmungall commented 7 years ago

From @ValWood on June 2, 2017 10:26

I really don't get this at all. Why on earth would you want to do an enrichment on a slim dataset ;(

cmungall commented 7 years ago

From @ValWood on February 24, 2017 16:40: further to this, I just found another note I made while trying to use it; it might be helpful. Panther class information is very out of date. This link provides "transcription, DNA-dependent" http://www.pantherdb.org/panther/category.do?categoryAcc=GO:0006351 but this term was renamed to "transcription, DNA-templated" on 2013-11-24.

@ValWood noted this in February and it still hasn't been fixed: @thomaspd @mugitty @huaiyumi need to fix this!

ValWood commented 7 years ago

Now that Panther is working again, I could repeat the analysis.

It's all very misleading, sorry. It's very strange that you hit submit and only THEN get to change the defaults. I knew that I could change the background set here (but only because I had asked about it previously). It did not occur to me that I should also select the annotation dataset, so I didn't change this option. I would not know what "PANTHER GO-slim" means, or why I would select such an annotation set to perform an enrichment.

Anyway, I have now selected the correct (useful) manually curated dataset. To keep the results simple to evaluate, I am using ALL gene products (GPs) annotated to "mitotic chromosome segregation" in PomBase and the current GO database (177 GPs).

These are not the results I should get:

[Screenshot: chr segregation]

Here I should see 177 and 177.

ValWood commented 7 years ago

Hang on, that drop is due to a change in the ontology. I'll track that later.

The data is fine; it is now from 25th May ;)

[Screenshot: DNA replication]

However, there are 2 big issues.

1. We shouldn't have an enrichment option on the GO front page that does not allow a background set to be uploaded, because the likelihood that a user's background set is "all genes in the database" tends towards zero. A proteomics set will need a proteome-only background. An array-based set will need only the genes on the array. Gene sets are in CONSTANT flux, and I have never seen an experiment that uses "all genes in the database". When I review a paper, I insist that the authors confirm the background set they used for enrichment. If they had used Panther from the GO front page they would need to repeat the analysis, because the results are incorrect if the experimental background is not supplied.

2. It should also be much clearer up front how to select the definitive annotation set (GO!) for a species. Could the GO datasets be the default? What are the use cases for the partial annotation in the Panther slim set for an enrichment analysis? Who would use this, and what for?

If it isn't possible to make the GO datasets the primary ones here, the only way forward is to have an enrichment tool specifically for the GO database.

3. I don't understand the slim set business. When I was doing this previously, I thought it was an option to reduce the terms presented in the results to a set of slim terms. If it is an annotation set, I don't know what it means. A slim is a set of terms used to summarize a set of annotations; it is not a set of annotations itself. Surely all (or most) gene products should be in the slim set? And omitting gene products not in the slim set would not improve an enrichment, would it?
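The distinction drawn above (a slim is a set of terms, while the annotations stay the same) can be sketched as follows: each gene's full annotations are mapped up the ontology to whichever slim terms lie above them. The toy ontology, term names and genes are hypothetical, and only simple parent links (standing in for is_a/part_of) are modelled.

```python
# Hypothetical toy ontology: child term -> parent terms.
PARENTS = {
    "termA": {"slim1"},
    "termB": {"termA"},
    "termC": {"slim2"},
    "termD": {"root"},
    "slim1": {"root"},
    "slim2": {"root"},
}
SLIM = {"slim1", "slim2"}  # a slim is just a subset of ontology terms


def ancestors(term: str) -> set:
    """The term itself plus every term reachable by walking up PARENTS."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS.get(t, ()))
    return seen


def map_to_slim(annotations: dict) -> dict:
    """Map each gene's full annotations onto the slim terms above them."""
    return {
        gene: set().union(*(ancestors(t) for t in terms)) & SLIM
        for gene, terms in annotations.items()
    }


annotations = {"gene1": {"termB"}, "gene2": {"termA", "termC"}, "gene3": {"termD"}}
# gene1 -> {slim1}, gene2 -> {slim1, slim2}, gene3 -> empty set (no slim term above termD)
print(map_to_slim(annotations))
```

A gene whose annotated terms have no slim term above them (gene3 here) ends up unmapped; this is one way an "unclassified" bucket can arise, though the thread does not establish that this is what Panther's count means.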

thomaspd commented 6 years ago

1. See the new help page on the GO site; hopefully it's now clear how to upload a background list. We have also modified the PANTHER web services to accept a reference list up front. We will work with @kltm to add a second box on the GO homepage if you think that's worthwhile.

2-3. On PANTHER, we're already planning to retire the PANTHER GO-slim. This should be done by roughly the end of the year.
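Supplying the reference list up front through the web services might look like the sketch below, which only builds the request URL and does not send it. The endpoint path, the parameter names (geneInputList, organism, refInputList, refOrganism, annotDataSet, enrichmentTestType, correction), taxon 284812 for S. pombe and GO:0008150 as the full biological-process dataset are all assumptions based on our understanding of the current PANTHER REST API, not details given in this thread; check them against the PANTHER API documentation before use.

```python
from urllib.parse import urlencode

# ASSUMPTION: endpoint and parameter names as we understand the public PANTHER
# REST API; verify against the current PANTHER API documentation.
PANTHER_OVERREP = "https://pantherdb.org/services/oai/pantherdb/enrich/overrep"


def overrep_url(genes, ref_genes, taxon: str, annot_dataset: str) -> str:
    """Build (but do not send) an over-representation query that supplies
    an explicit reference list instead of the whole-genome default."""
    params = {
        "geneInputList": ",".join(genes),
        "organism": taxon,
        "refInputList": ",".join(ref_genes),
        "refOrganism": taxon,
        "annotDataSet": annot_dataset,  # a full GO aspect rather than a slim (assumed ID)
        "enrichmentTestType": "FISHER",
        "correction": "FDR",
    }
    return f"{PANTHER_OVERREP}?{urlencode(params)}"


# Hypothetical usage: taxon "284812" (S. pombe) and "GO:0008150" are assumed values.
url = overrep_url(["SPAC5H10.09c", "SPBC409.13"],
                  ["SPAC5H10.09c", "SPBC409.13", "SPBC725.03"],
                  "284812", "GO:0008150")
print(url)
```

Building the URL separately from sending it makes it easy to inspect exactly which reference list and annotation set a query uses before running it.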

kltm commented 6 years ago

@thomaspd As we will likely be moving over to a new home page sooner rather than later, it may be more efficient to just map these requirements onto that.

ValWood commented 6 years ago

I still think it's a bit weird that at the entry point you don't see that you can add a background until after you "submit".

We are getting some weirdness in the results, and @huaiyumi is looking into it. Basically, I don't get the results I expect based on what is in the GO database.

huaiyumi commented 5 years ago

If I remember correctly, I went through this example with @ValWood at the NY GO meeting. The difference in counts between PANTHER and PomBase is because PANTHER doesn't include the regulates relations. This is being dealt with in another ticket.
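How much the regulates relation matters for counts like the 177 above can be sketched on a toy graph: a gene counts toward a term if one of its annotated terms reaches that term through the allowed relations. The graph, term and gene names are hypothetical, and the traversal simply follows allowed relations transitively, ignoring GO's rules for how regulates composes with other relations.

```python
# Hypothetical toy graph: child term -> list of (relation, parent term).
EDGES = {
    "X_subtype": [("is_a", "X")],
    "X_step": [("part_of", "X")],
    "regulation_of_X": [("regulates", "X")],
    "positive_regulation_of_X": [("is_a", "regulation_of_X")],
}
ANNOTATIONS = {
    "gene1": {"X"},
    "gene2": {"X_subtype"},
    "gene3": {"X_step"},
    "gene4": {"positive_regulation_of_X"},
}


def reaches(term: str, target: str, relations: set) -> bool:
    """True if `target` is `term` or is reachable from it via the allowed relations."""
    stack, seen = [term], set()
    while stack:
        t = stack.pop()
        if t == target:
            return True
        if t in seen:
            continue
        seen.add(t)
        stack.extend(p for rel, p in EDGES.get(t, []) if rel in relations)
    return False


def count_genes(target: str, relations: set) -> int:
    """Number of genes with at least one annotation reaching `target`."""
    return sum(
        any(reaches(t, target, relations) for t in terms)
        for terms in ANNOTATIONS.values()
    )


print(count_genes("X", {"is_a", "part_of"}))               # 3
print(count_genes("X", {"is_a", "part_of", "regulates"}))  # 4
```

Leaving regulates out drops gene4, whose only annotation is to a regulation term, which is the kind of gap that produces different counts for the same gene list.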

ValWood commented 5 years ago

This might be a better ticket to keep. Apologies for any duplicates.