Hi,
I am trying to compute the ssGSEA scores using the following code (similar to the given example code):

    # Read signatures
    gmt = read_gene_sets('./signatures/gene_signatures.gmt')  # GMT format like in MSigDB

    # Read expressions
    counts = pd.read_csv("../../Data/RNAseq/TCGA_tpm_LUAD.txt", sep="\t")
    counts_transformed = np.log2(counts + 1)

    # Calc signature scores
    signature_scores = ssgsea_formula(counts_transformed, gmt)

    # Scale signatures
    signature_scores = median_scale(signature_scores)

Should the counts matrix (DataFrame) be in the following format: rows = genes and columns = samples? Because if I do that, the ssgsea_scores() function does not work.

This is from the ssgsea_formula() function:

    ranks = data.T.rank(method=rank_method, na_option='bottom')

data -> rows = genes, columns = samples
data.T -> rows = samples, columns = genes
data.T.rank -> ranks.index = samples, as rank(axis=0) is the default.

So is it correct to say that the input for ssgsea_formula() needs to be counts_transformed with rows = samples and columns = genes (or, of course, that the '.T' in ssgsea_formula() itself should be removed)?

Versions of the packages I'm using:
pandas==1.4.2
numpy==1.22.3
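
To make the orientation issue concrete, here is a minimal sketch with a made-up toy matrix. It uses only pandas, not the package itself. The gene and sample names are hypothetical, and rank_method='max' is an assumption on my side about what gets passed in.

```python
import pandas as pd

# Toy expression matrix (made-up values): rows = genes, columns = samples
data = pd.DataFrame(
    {"sample_1": [5.0, 1.0, 3.0], "sample_2": [2.0, 4.0, 6.0]},
    index=["GENE_A", "GENE_B", "GENE_C"],
)

# Same call as quoted from ssgsea_formula(); rank_method='max' is assumed here
ranks = data.T.rank(method="max", na_option="bottom")

# With genes-by-samples input, the ranks are indexed by sample,
# and each gene is ranked across samples (axis=0 is the default)
print(list(ranks.index))
print(list(ranks.columns))

# Passing the transposed matrix (rows = samples, columns = genes) instead
samples_by_genes = data.T
ranks_alt = samples_by_genes.T.rank(method="max", na_option="bottom")

# Now the ranks are indexed by gene, and genes are ranked within each sample
print(list(ranks_alt.index))
print(ranks_alt["sample_1"].tolist())
```

With the samples-by-genes input, ranks_alt.index holds the gene names, which is what a lookup of gene-set members by index label would need.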