YuLab-SMU / clusterProfiler

:bar_chart: A universal enrichment tool for interpreting omics data
https://yulab-smu.top/biomedical-knowledge-mining-book/

leading_edge and core_enrichment columns in clusterProfiler GSEA output #103

Open sanjanasood opened 7 years ago

sanjanasood commented 7 years ago

Hi,

I am using GSEA in clusterProfiler, which returns a gseaResult object. I am struggling a bit to interpret the two columns in the output with the headers leading_edge and core_enrichment. Could you help me understand what these columns refer to and how to interpret them?

Thanks in advance, Sanj

guidohooiveld commented 6 years ago

Please check the GSEA documentation on the Broad website....

Leading edge genes: "As described in the Gene Set Enrichment Analysis PNAS paper, the leading-edge subset in a gene set are those genes that appear in the ranked list at or before the point at which the running sum reaches its maximum deviation from zero. The leading-edge subset can be interpreted as the core that accounts for the gene set’s enrichment signal." Source.

Core enrichment genes: "Genes that contribute to the leading-edge subset within the gene set. This is the subset of genes that contributes most to the enrichment result." Source.
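
To make the two definitions above concrete, here is a minimal Python sketch of the running-sum idea on a toy ranked list. It is not clusterProfiler's code; the function names (`running_sum`, `leading_edge_subset`), the weighting exponent `p`, and the toy genes and scores are all assumptions made for illustration. It follows the classic weighted Kolmogorov-Smirnov-style statistic from the GSEA paper, and assumes the gene set does not cover the whole ranked list.

```python
def running_sum(ranked_genes, scores, gene_set, p=1.0):
    """Running enrichment score along a ranked gene list (illustrative only)."""
    in_set = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(in_set)
    # Hits step up in proportion to |score|^p; misses step down by a constant.
    hit_norm = sum(abs(s) ** p for s, hit in zip(scores, in_set) if hit)
    total, curve = 0.0, []
    for s, hit in zip(scores, in_set):
        total += abs(s) ** p / hit_norm if hit else -1.0 / n_miss
        curve.append(total)
    return curve


def leading_edge_subset(ranked_genes, scores, gene_set, p=1.0):
    """Return (ES, peak index, genes of the set on the leading side of the peak)."""
    curve = running_sum(ranked_genes, scores, gene_set, p)
    # The peak is the point of maximum deviation from zero.
    peak = max(range(len(curve)), key=lambda i: abs(curve[i]))
    es = curve[peak]
    if es >= 0:
        # Positive ES: set members at or before the peak.
        core = [g for g in ranked_genes[: peak + 1] if g in gene_set]
    else:
        # Negative ES: set members after the peak.
        core = [g for g in ranked_genes[peak + 1:] if g in gene_set]
    return es, peak, core


# Hypothetical toy data: genes already sorted by decreasing ranking score.
ranked_genes = ["A", "B", "C", "D", "E", "F", "G", "H"]
scores = [4, 3, 2, 1, -1, -2, -3, -4]
gene_set = {"A", "B", "F"}

es, peak, core = leading_edge_subset(ranked_genes, scores, gene_set)
print(round(es, 3), peak, core)
```

In this toy case the running sum peaks after the second gene, so A and B form the leading-edge subset while F, which appears after the peak, does not. Those leading-edge members correspond to what the core_enrichment column lists for each gene set.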

The leading-edge information is reported as three percentages (tags, list, signal). These metrics characterize the leading-edge subset, and the genes in that subset are the core enrichment genes. See about halfway down this page (section "Detailed Enrichment Results"). I agree it is somewhat confusing.
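
As a rough illustration of the three percentages, the sketch below computes them for a toy ranked list, using the definitions as I read them in the Broad GSEA documentation: tags is the fraction of the gene set's members on the leading side of the peak, list is the fraction of the whole ranked list on that side, and signal combines them as tags × (1 − list) × N / (N − Nh). The function name `leading_edge_stats`, its arguments, and the toy data are assumptions for illustration, not clusterProfiler's API.

```python
def leading_edge_stats(in_set, peak, positive_es=True):
    """Compute (tags, list, signal) as fractions for one gene set.

    in_set      : booleans, one per ranked gene, True if the gene is in the set
    peak        : 0-based index where the running sum deviates most from zero
    positive_es : True if the enrichment score is positive
    """
    n = len(in_set)       # N: size of the ranked list
    n_hit = sum(in_set)   # Nh: number of set members in the ranked list
    # Positive ES: look at or before the peak; negative ES: after the peak.
    window = in_set[: peak + 1] if positive_es else in_set[peak + 1:]
    tags = sum(window) / n_hit
    list_frac = len(window) / n
    signal = tags * (1 - list_frac) * (n / (n - n_hit))
    return tags, list_frac, signal


# Hypothetical toy example: 8 ranked genes, 3 in the set, peak at the 2nd gene.
in_set = [True, True, False, False, False, True, False, False]
tags, list_frac, signal = leading_edge_stats(in_set, peak=1)
print(f"tags={tags:.0%}, list={list_frac:.0%}, signal={signal:.0%}")
```

Here two of the three set members sit in the first quarter of the ranked list, which is what a high tags value with a low list value is meant to express: most of the set is concentrated near the top of the ranking.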

Maybe this clusterProfiler page on visualization is also of interest.