MaayanLab / enrichr_issues

Terms and genes seem to be missing #39

Closed by jackh726 1 year ago

jackh726 commented 2 years ago

I have a gene set of 404 mouse genes (https://gist.github.com/jackh726/3296f36b1d06245d8b741ef1a5151663).

When I run this set through Panther with "GO Biological Process", one term that pops up is "lung development" (GO:0030324), with 26/216 genes found. Similarly, running the set through ShinyGO gives lung development as a top-enriched term (23/210 genes found). However, running the same set through Enrichr gives an insignificant adjusted p-value for this term, with only 4 genes matching. Indeed, when I did a term search (https://maayanlab.cloud/Enrichr/#meta!meta=lung%20development) and downloaded the gene set, it contained only 35 genes.

To make things even more confusing, the highest fold-enrichment term in Panther (astrocyte activation involved in immune response) doesn't show up in the term search for Enrichr at all.

Looking up the ontology term does show ~230 genes for mouse (http://amigo.geneontology.org/amigo/search/bioentity?q=*:*&fq=isa_partof_closure:%22GO:0030324%22&fq=taxon_subset_closure_label:%22Mus%20musculus%22&sfq=document_category:%22bioentity%22), which is close to what Panther and ShinyGO show, and very different from what is shown with Enrichr.
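A comparison like the one described above can be scripted. The following is only a minimal sketch: the function name `compare_gene_sets` and the gene lists are hypothetical placeholders (apart from Gata6, none of the symbols are real term members), and upper-casing symbols is a crude assumption for matching mouse and human names, not a real ortholog mapping.

```python
def compare_gene_sets(reference, library):
    """Compare two collections of gene symbols, ignoring case and whitespace.

    Returns (shared, missing, fraction_missing), where `missing` holds the
    reference genes that are absent from the library set.
    """
    ref = {g.strip().upper() for g in reference if g.strip()}
    lib = {g.strip().upper() for g in library if g.strip()}
    shared = ref & lib
    missing = ref - lib
    fraction_missing = len(missing) / len(ref) if ref else 0.0
    return shared, missing, fraction_missing


# Placeholder data: stand-ins for a reference list (e.g. exported from AmiGO)
# and for the gene set downloaded from a library term search.
reference_genes = ["Gata6", "GeneA", "GeneB", "GeneC", "GeneD"]
library_genes = ["GATA6", "GENEA"]

shared, missing, fraction_missing = compare_gene_sets(reference_genes, library_genes)
print("shared:", sorted(shared))
print("missing:", sorted(missing))
print(f"{fraction_missing:.0%} of the reference genes are absent from the library set")
```

With the counts quoted in this post (35 genes in the Enrichr set against roughly 230 in AmiGO), at most 35 genes could be shared, so at least about 85% of the AmiGO list would be absent.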

AviMaayan commented 2 years ago

Hi @jackh726, this is because of the way we processed the data from GO and converted it into a gene set library. I believe we restricted it to only include human annotations, and perhaps applied some other filters. We cut the tree in a way that may exclude genes. There are different ways to do it. Regarding the Panther term that is missing, note that our Panther library is from 2016, so we need to update it. Hope this helps.
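To illustrate how the way the tree is cut can change a term's size, here is a toy sketch. It is not a description of how the Enrichr library was actually built. It assumes one possible difference: counting only genes annotated directly to a term, versus also counting genes annotated to its descendants (the is_a/part_of closure that the AmiGO query above filters on). The term hierarchy, the gene symbols, and the helper names `ancestors` and `propagate` are all invented for the example.

```python
from collections import defaultdict

# Toy hierarchy: child term -> parent terms (is_a / part_of edges).
# Term names and gene symbols are placeholders, not real GO data.
PARENTS = {
    "lung epithelium development": ["lung development"],
    "lung alveolus development": ["lung development"],
    "lung development": ["respiratory system development"],
    "respiratory system development": [],
}

# Genes annotated directly to each term.
DIRECT = {
    "lung development": {"GeneA", "GeneB"},
    "lung epithelium development": {"GeneC", "GeneD"},
    "lung alveolus development": {"GeneE"},
}


def ancestors(term, parents):
    """Return every ancestor of `term`, excluding the term itself."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(direct, parents):
    """Copy each term's direct genes up to all of its ancestors."""
    closed = defaultdict(set)
    for term, genes in direct.items():
        for target in {term} | ancestors(term, parents):
            closed[target] |= genes
    return dict(closed)


closed = propagate(DIRECT, PARENTS)
print("direct only :", len(DIRECT["lung development"]), "genes")
print("with closure:", len(closed["lung development"]), "genes")
```

In this toy case the parent term holds 2 genes when only direct annotations are counted and 5 once descendant annotations are propagated upward, so two libraries built from the same annotations can report very different sizes for the same term.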

jackh726 commented 2 years ago

I figured it would be something like that, but I couldn't work out what the filter would have been. GO:0030324 has human genes too, and many of the missing genes also have a human ortholog (e.g. Gata6). What is most shocking to me is not just that some genes are missing, but that more than 80% are missing for this term.

Regarding Panther, I should perhaps have been clearer. With Enrichr, I only looked at the "GO Biological Process 2021" library.