YuLab-SMU / clusterProfiler

A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

using compareCluster to compare gene sets from different species #722

Open lanasushko opened 2 months ago

lanasushko commented 2 months ago

Hi,

I have a question about the usage of the compareCluster function. Is it only meant to compare gene sets between different experimental conditions in the same species, or can it also be used to compare gene sets from different species?

I have a number of closely related species that I want to test for enrichment in a comparative manner. Their gene annotations are quite similar. I tried concatenating the TERM2GENE tables and extracting the unique rows, and the concatenated table did not contain significantly more genes than the table for any single species. Would it be correct to use the compareCluster function in this case?

Thanks!

guidohooiveld commented 2 months ago

It is not fully clear to me what you are trying to achieve.

The function compareCluster indeed runs a gene set analysis method (ORA or GSEA) on the different gene clusters (i.e. lists of genes) that are used as input. Depending on the type of analysis method, it will check which gene sets (i.e. 'terms') present in TERM2GENE are over-represented (ORA) or enriched (GSEA) in these clusters.
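To make the ORA part concrete, here is a minimal sketch of an over-representation test applied to each cluster in turn. It is written in plain Python purely for illustration and is not clusterProfiler's code; the function name `ora_pvalue`, the toy gene IDs, the term names, and the choice of background (universe) are all assumptions made for this example.

```python
from math import comb

def ora_pvalue(cluster, gene_set, universe):
    """Upper-tail hypergeometric p-value: P(overlap >= observed)."""
    universe = set(universe)
    cluster = set(cluster) & universe      # only genes in the background count
    gene_set = set(gene_set) & universe
    N = len(universe)                      # background size
    K = len(gene_set)                      # genes annotated to the term
    n = len(cluster)                       # genes in the cluster
    k = len(cluster & gene_set)            # observed overlap
    # Sum the hypergeometric probabilities for overlaps k, k+1, ..., min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i)
                     for i in range(k, min(K, n) + 1))
    return favourable / comb(N, n)

# Hypothetical toy data: 20 background genes, two terms, two clusters
universe = [f"g{i}" for i in range(20)]
term2gene = {
    "termA": [f"g{i}" for i in range(0, 5)],
    "termB": [f"g{i}" for i in range(10, 15)],
}
clusters = {
    "species1": ["g0", "g1", "g2", "g3", "g15"],
    "species2": ["g10", "g11", "g16", "g17", "g18"],
}

# Every cluster is tested against every term, one cluster at a time
results = {
    name: {term: ora_pvalue(genes, members, universe)
           for term, members in term2gene.items()}
    for name, genes in clusters.items()
}

for name, pvals in results.items():
    for term, p in pvals.items():
        print(f"{name}\t{term}\tp = {p:.4g}")
```

The sketch only shows that the test itself needs nothing more than a gene list, a term-to-gene mapping, and a background; it says nothing about where the mapping came from.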

compareCluster is agnostic to the content of TERM2GENE. Thus, as long as the TERM2GENE table contains biologically relevant 'mappings', it can be used. Please note that if you have 200 gene sets defined for one species and another 200 sets for the related species, then the multiple testing adjustment will correct for testing 400 gene sets, and not for the 200 tests that would be performed if 2 independent, species-specific analyses were run. Hence, FDR values will not be identical between the 2 approaches (but the p-values should be!).
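The effect of the number of tests on the adjusted values can be illustrated with a small Benjamini-Hochberg sketch. This is plain Python with made-up p-values, not clusterProfiler's implementation; `bh_adjust` is a hypothetical helper, and the example simply assumes that pooling doubles the number of gene sets entering the correction from 200 to 400.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Made-up raw p-values: one small p-value among 200 gene sets per species
species_a = [0.001] + [0.5] * 199
species_b = [0.5] * 200

# Approach 1: species-specific analysis, correcting for 200 tests
separate = bh_adjust(species_a)

# Approach 2: pooled table, correcting for 400 tests
pooled = bh_adjust(species_a + species_b)

print("raw p-value:          ", species_a[0])
print("adjusted, 200 tests:  ", separate[0])
print("adjusted, 400 tests:  ", pooled[0])
```

The raw p-value of the first gene set is the same in both runs; only the adjusted value changes, because it is scaled by the number of tests.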