BaderLab / EnrichmentMapApp

The EnrichmentMap Cytoscape App allows you to visualize the results of gene-set enrichment as a network.
http://apps.cytoscape.org/apps/enrichmentmap
GNU Lesser General Public License v2.1

Input files for EnrichmentMap #541

Open SchwarzLena opened 4 months ago

SchwarzLena commented 4 months ago

Dear all,

I'm currently analyzing snRNA-seq data. We performed a pseudobulk differential gene expression analysis (DESeq2) and are now running GSEA with clusterProfiler.

To get a visual overview of our enriched gene sets, we decided to use Cytoscape's EnrichmentMap with our GSEA results as input. There are two kinds of files we can extract from clusterProfiler:

1) an "_all" file that contains positive and negative enrichments compiled in one file
2) separate "_up" and "_down" files that split the pos./neg. enrichments into two files

Following your tutorial, we decided to use the separate _up and _down files as input for EnrichmentMap. Interestingly, with the _up and _down files we gain more power in our enriched terms. In that case we observe that enriched annotations are often intermingled in their scores (meaning the circles show pos. and neg. NES at the same time). We also get more enriched annotations. Most often (but not always) this is because the lists of pos. and neg. enrichments in the separate files are longer than in the compiled _all file.

The most stringent option (but not necessarily the best or correct one) is to use the file containing all enrichments together (the _all file). Here, pos. and neg. enrichments are separated in the map: annotations show either a positive or a negative NES, with only rare intermingling. I am not sure which is the "correct" way to go about this, and would very much appreciate your feedback and input.

Thank you for your help! Best, Lena

risserlin commented 4 months ago

Hi Lena, it is unclear what you are using as the ranked file in your GSEA analysis. I looked through the documentation for clusterProfiler but I can't find descriptions of the different outputs. Is the file marked _all created from a different rank file, or are both analyses done with the same rank file and just output differently? Does clusterProfiler export GSEA-type file output, or are you using the Generic format in EnrichmentMap? It sounds a little like the pos/neg results might be using a signed ranked list whereas the _all results are using an unsigned ranked list. If this is the case, then it is not surprising that you are getting more power from the pos/neg ranked list.
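
To make the signed versus unsigned distinction concrete, here is a minimal sketch. It assumes one common convention for a signed score, sign(log2FC) × −log10(p-value); the thread does not confirm which score was actually used, and the gene names and values are made up for illustration.

```python
import math

def unsigned_rank(pvalue):
    """Magnitude only: strongly up and strongly down genes both land at the top."""
    return -math.log10(max(pvalue, 1e-300))  # clamp to avoid log10(0)

def signed_rank(log2fc, pvalue):
    """Magnitude from the p-value, direction from the fold change (assumed convention)."""
    return math.copysign(unsigned_rank(pvalue), log2fc)

# Hypothetical differential expression results: (gene, log2FC, p-value)
genes = [("GeneA", 2.1, 1e-8), ("GeneB", -1.7, 1e-6), ("GeneC", 0.2, 0.5)]

signed_order = [g for g, fc, p in sorted(genes, key=lambda r: signed_rank(r[1], r[2]), reverse=True)]
unsigned_order = [g for g, fc, p in sorted(genes, key=lambda r: unsigned_rank(r[2]), reverse=True)]

print(signed_order)    # down-regulated GeneB ends up at the bottom
print(unsigned_order)  # down-regulated GeneB sits next to GeneA near the top
```

With the signed score, the top and the bottom of the list both carry meaning (up versus down), which is what a GSEA-style analysis relies on when it reports positive and negative NES.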

SchwarzLena commented 4 months ago

Hi Ruth,

thank you for your help!

clusterProfiler does not directly export the xls file format that is required for the up/down GSEA terms. However, most of the columns are there, and we were able to stitch the xls files together, meaning we do generate a GSEA-type file.

We also think that we get more power with the separate up/down lists. What we are a bit unsure about is whether this is the correct way to split the genes for the up/down xls files. Have you ever done it similarly to our approach, or how do you usually separate the up/down lists?
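
For reference, one way to derive up/down tables from a single combined result is to split on the sign of NES, so every gene set lands in exactly one file. This is only a sketch, not a description of what was done in this thread: it assumes a tab-separated export with the columns `ID`, `NES`, and `p.adjust`, and the rows are invented.

```python
import csv
import io

# Hypothetical combined ("_all") GSEA result table, tab-separated.
combined_tsv = (
    "ID\tNES\tp.adjust\n"
    "SET_A\t2.10\t0.001\n"
    "SET_B\t-1.85\t0.004\n"
    "SET_C\t1.40\t0.030\n"
)

def split_by_nes(handle):
    """Split rows into (up, down) by NES sign; an NES of exactly 0 goes to down."""
    up, down = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        (up if float(row["NES"]) > 0 else down).append(row)
    return up, down

up_rows, down_rows = split_by_nes(io.StringIO(combined_tsv))
up_ids = [r["ID"] for r in up_rows]
down_ids = [r["ID"] for r in down_rows]

print("up:", up_ids)
print("down:", down_ids)
```

Because both tables come from the same run, no gene set can appear with a positive NES in one file and a negative NES in the other.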

risserlin commented 4 months ago

Hi Lena, I usually separate into up or down lists when I do a thresholded analysis using g:Profiler (or other thresholded methods). Are you using the GSEA Java desktop app, or are you using a GSEA implementation in clusterProfiler (or another R package)? With the GSEA Java desktop app, even if you give it a uni-directional list, GSEA will still calculate enrichments from the top and the bottom of the list. With a uni-directional list, the bottom of that list doesn't mean anything. Are you using all the results from the two separate analyses, or are you just grabbing the positive results from the analysis of up and just the negative results from the analysis of down? (If you are using all of them, that might account for the conflicts you see with sets having both positive and negative NES scores.)
There is an option in EnrichmentMap to use just the positive results or just the negative results. There is a use case for this, for example when you have multiple conditions, compare each condition to the rest, and are therefore only interested in the condition and not the rest part. If I am understanding your setup correctly, I think that you should be using "generation of _all GSEA list" with GSEA. If you want a way to do the up and down separately (to compare to the GSEA results), I recommend creating two thresholded lists (one for the up-regulated and one for the down-regulated genes) and running them through g:Profiler to see how the results compare.
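
The two thresholded lists mentioned above could be built along these lines. This is a minimal sketch: the cutoffs (adjusted p-value below 0.05, absolute log2 fold change of at least 1) are illustrative assumptions rather than recommendations from this thread, and the gene names and values are made up.

```python
def thresholded_lists(de_rows, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Return (up, down) gene lists from (gene, log2FC, padj) rows.

    Both cutoffs are hypothetical defaults; pick values that suit your data.
    """
    up, down = [], []
    for gene, log2fc, padj in de_rows:
        if padj is None or padj >= padj_cutoff:
            continue  # not significant, or no adjusted p-value available
        if log2fc >= lfc_cutoff:
            up.append(gene)
        elif log2fc <= -lfc_cutoff:
            down.append(gene)
    return up, down

# Hypothetical differential expression results: (gene, log2FC, adjusted p-value)
de_rows = [
    ("GeneA", 2.3, 0.001),
    ("GeneB", -1.8, 0.004),
    ("GeneC", 0.4, 0.010),   # significant but below the fold-change cutoff
    ("GeneD", -2.5, 0.200),  # large change but not significant
]

up_genes, down_genes = thresholded_lists(de_rows)
print("\n".join(up_genes))    # one gene per line, ready to paste as the up query
print("\n".join(down_genes))  # same for the down query
```

Each list is then submitted as its own query, so the up and down enrichments come from two independent thresholded analyses and can be compared against the single signed GSEA run.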