GSEA-MSigDB / GSEA_R

Updated implementation of the GSEA-P R application for modern R distributions

Input file vs CLS #8

Open abbaslab opened 3 years ago

abbaslab commented 3 years ago

I'm trying to use the GSEA function. My input is a .rnk file of log2 values for all genes (similar to what is done in the Java-based GSEA) and a .gmt file for pathways.

GSEA(input.ds = system.file('extdata', 'test.rnk', package = 'GSEA', mustWork = TRUE),
     input.cls = NA,
     gs.db = system.file('extdata',  'h.all.v7.0.symbols.gmt', package = 'GSEA', mustWork = TRUE),
     collapse.dataset = FALSE, collapse.mode = 'max', gsea.type = "preranked") # changed to preranked

The error I get:

Error in loc.vector[gene.list] <- seq(1, N) : NAs are not allowed in subscripted assignments

I am unable to figure out why. My .rnk file doesn't contain any NAs.

Also, I am unable to figure out how GSEA runs if a matrix of genes/samples is used instead. Does it calculate differential gene expression and then select genes from that for GSEA?

ACastanza commented 3 years ago

Hello. Looking through the code, it appears that the gene symbols in your gene list don't match the HGNC gene symbols we've used in the .GMT file. Could you send a screenshot of the first couple of rows of your RNK file? If this is the case, you will need to configure the collapse.dataset parameters with the appropriate chip file as well.
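As a quick way to check for the symbol mismatch described above, the sketch below counts how many identifiers in a ranked list also appear in a .gmt file. The helper names (read_rnk_symbols, read_gmt_symbols, symbol_overlap) are hypothetical and not part of the GSEA package, and the tiny files are made-up examples. It assumes a tab-delimited .rnk with identifiers in the first column and a .gmt with the set name, a description, and then the member genes on each line.

```r
# Hypothetical helpers for illustration only; they are not part of the GSEA package.

# Read the identifier column of a tab-delimited .rnk file.
read_rnk_symbols <- function(path) {
  rnk <- read.delim(path, header = FALSE, comment.char = "#",
                    quote = "", stringsAsFactors = FALSE)
  as.character(rnk[[1]])
}

# Collect every gene named in a .gmt file (fields 3 onward on each line).
read_gmt_symbols <- function(path) {
  fields <- strsplit(readLines(path), "\t", fixed = TRUE)
  unique(unlist(lapply(fields, function(f) f[-(1:2)])))
}

# Summarise how many ranked identifiers are found in the gene sets.
symbol_overlap <- function(rnk_symbols, gmt_symbols) {
  matched <- rnk_symbols %in% gmt_symbols
  list(
    n_ranked       = length(rnk_symbols),
    n_matched      = sum(matched),
    fraction       = mean(matched),
    unmatched_head = head(rnk_symbols[!matched])
  )
}

# Made-up example files: two HGNC symbols and one Ensembl-style ID.
rnk_path <- tempfile(fileext = ".rnk")
writeLines(c("TP53\t2.1", "MYC\t1.4", "ENSG00000141510\t-0.7"), rnk_path)
gmt_path <- tempfile(fileext = ".gmt")
writeLines("SET_A\thttp://example.org\tTP53\tMYC\tEGFR", gmt_path)

res <- symbol_overlap(read_rnk_symbols(rnk_path), read_gmt_symbols(gmt_path))
print(res)
```

A low matched fraction would suggest the identifiers need to be converted (for example via the collapse settings and a chip file) before running the analysis.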

Another possibility is that something is being passed incorrectly to the GSEA command line, causing an inconsistent state. To aid configuration of the correct parameters, we've included an interactive helper script that should set the required parameters for your selected GSEA mode. You can run that script with:

source(system.file('extdata', 'Run.GSEA.R', package = 'GSEA'))

It should walk you through configuring the command line.

If you still encounter an error with your files after checking the collapse settings and using the configuration script let me know and I can dig deeper into the function to see where this might've gone wrong.

It's worth noting that GSEA preranked was recently added to this package as a backport for reference, and is considered unsupported. For your mission-critical applications I would strongly recommend using the standard GSEA Desktop application and not the R package.

-Anthony

Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
Mesirov Lab, Department of Medicine
University of California, San Diego
http://gsea-msigdb.org/

ACastanza commented 3 years ago

Oh! In answering the part of your question I had missed, about gene ranking in GSEA, I realized what's actually happening here. For GSEA-Preranked you need to supply the ENTIRE list of all genes ranked by differential expression, not just the significantly differentially expressed genes.
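To illustrate the point about supplying every gene, here is a minimal sketch that builds a ranked list from a full differential expression table without filtering on significance. The function make_rnk, the column names (gene, log2FoldChange, padj), and the example values are all assumptions made for illustration; adapt them to whatever your own results table contains.

```r
# Hypothetical helper for illustration only; not part of the GSEA package.
# Builds a ranked list from ALL genes in a differential expression table.
make_rnk <- function(de, gene_col = "gene", score_col = "log2FoldChange") {
  rnk <- data.frame(
    gene  = as.character(de[[gene_col]]),
    score = as.numeric(de[[score_col]]),
    stringsAsFactors = FALSE
  )
  # Drop only rows that cannot be ranked at all (missing gene or score).
  rnk <- rnk[!is.na(rnk$gene) & !is.na(rnk$score), ]
  # Keep the first occurrence of each gene so identifiers are unique.
  rnk <- rnk[!duplicated(rnk$gene), ]
  # Sort from most positive to most negative score.
  rnk[order(rnk$score, decreasing = TRUE), ]
}

# Made-up results table. Gene "B" is not significant but is still kept.
de <- data.frame(
  gene           = c("A", "B", "C", "D"),
  log2FoldChange = c(2.0, -0.1, NA, -1.5),
  padj           = c(0.001, 0.9, NA, 0.02),
  stringsAsFactors = FALSE
)
rnk <- make_rnk(de)

# Write a two-column, tab-delimited file with no header or quotes.
out_path <- tempfile(fileext = ".rnk")
write.table(rnk, out_path, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = FALSE)
```

Note that no padj cutoff is applied anywhere: the only rows removed are those with a missing score or a repeated gene name.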

If you supply a matrix of genes/samples then yes, GSEA will compute a metric of differential expression (which metric depends on the user's configuration options, but the standard is the signal-to-noise ratio) and then use that computed gene ranking to perform its statistical tests.
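For reference, the basic signal-to-noise ratio for a gene is the difference of the two class means divided by the sum of the two class standard deviations. The sketch below computes that plain form and ranks genes by it. The function signal_to_noise and the example matrix are hypothetical, and this is not the package's own code: the actual GSEA implementation may apply additional adjustments (for example to very small standard deviations) that are not reproduced here.

```r
# Hypothetical helper for illustration only; not the GSEA package's implementation.
# expr:    numeric matrix, genes in rows and samples in columns
# classes: vector of class labels, one per sample, with exactly two classes
signal_to_noise <- function(expr, classes) {
  lv <- unique(classes)
  stopifnot(length(lv) == 2, length(classes) == ncol(expr))
  a <- expr[, classes == lv[1], drop = FALSE]
  b <- expr[, classes == lv[2], drop = FALSE]
  # (mean of class 1 - mean of class 2) / (sd of class 1 + sd of class 2)
  (rowMeans(a) - rowMeans(b)) / (apply(a, 1, sd) + apply(b, 1, sd))
}

# Made-up expression matrix: 2 genes, 4 samples, 2 samples per class.
expr <- matrix(
  c( 1,  3, 5, 7,
    10, 12, 4, 6),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("g1", "g2"), paste0("s", 1:4))
)
classes <- c("A", "A", "B", "B")

s2n <- signal_to_noise(expr, classes)
ranked <- sort(s2n, decreasing = TRUE)  # the gene ranking used for enrichment
print(ranked)
```

Every gene in the matrix receives a score and a position in the ranking, which is why the preranked mode likewise expects the complete gene list.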