ctlab / fgsea

Fast Gene Set Enrichment Analysis

Question: Geseca output, column "size" #152

Closed oswaldbra12 closed 7 months ago

oswaldbra12 commented 7 months ago

Hello,

Thank you for this. I think I have a simple question but can't seem to figure it out. One of the outputs of geseca is a data frame. In that data frame there is a "size" column. There is a number associated with the pathways analyzed. Are those the gene hits and if so, how can I pull those hits for my reference?

Example: Screenshot 2024-04-10 at 12 06 25 PM

Thanks.

assaron commented 7 months ago

Hi @oswaldbra12

The size column has the same meaning as in fgsea(): it's the effective number of genes in the gene set, that is, the size of the intersection of the gene set and all of the genes in the input matrix. You can get these genes by doing the same intersection yourself.
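As a minimal sketch of that intersection on toy data (the pathway name, gene names and matrix below are made up for illustration, not taken from the screenshot):

```r
# Hypothetical toy inputs: one gene set and a 4-gene expression matrix.
pathways <- list(PATHWAY_A = c("G1", "G2", "G3", "G9"))
E <- matrix(0, nrow = 4, ncol = 2,
            dimnames = list(c("G1", "G2", "G3", "G4"), c("s1", "s2")))

# Genes of the set that are also rows of the input matrix.
hits <- intersect(pathways[["PATHWAY_A"]], rownames(E))

# The number of such genes is what the size column counts.
length(hits)
```

Here G9 is in the gene set but not in the matrix, so it does not count towards the size.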

Hope that helps.

oswaldbra12 commented 7 months ago

Thank you for the quick comment @assaron,

I figured that might be the case. I am having some trouble parsing out what those genes are in each gene set. Do you by chance have time to elaborate on how I can get those genes? For example, what is the best way to grab the 33 gene names from row number "4" in the screenshot above?

I'm just struggling with it a bit.

assaron commented 7 months ago

Something like this should work: unique(intersect(pathways[["NAME_OF_THE_PATHWAY"]], rownames(E))). This assumes pathways is your list of pathways, E is the input expression matrix, and NAME_OF_THE_PATHWAY is the name of the pathway you are interested in (I can't see it from the screenshot).
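If you want the genes for every pathway at once, the same intersection can be applied over the whole list. This is only a sketch in base R on made-up data: the pathway names, gene names and matrix are hypothetical stand-ins for your own pathways and E.

```r
# Hypothetical stand-ins for the real inputs passed to geseca().
pathways <- list(
  PATHWAY_A = c("G1", "G2", "G3", "G9"),
  PATHWAY_B = c("G2", "G4", "G4", "G8")  # duplicated gene on purpose
)
E <- matrix(0, nrow = 4, ncol = 2,
            dimnames = list(c("G1", "G2", "G3", "G4"), c("s1", "s2")))

# For each pathway, keep the genes that are present in the matrix.
# intersect() already drops duplicates; unique() is kept to mirror
# the one-liner above.
genesInMatrix <- lapply(pathways, function(p) {
  unique(intersect(p, rownames(E)))
})

# Number of genes per pathway, to compare with the size column.
sizes <- lengths(genesInMatrix)

genesInMatrix[["PATHWAY_B"]]
sizes
```

The values in sizes should line up with the size column for the pathways that appear in your results table. Pathways that were filtered out of the analysis (for example by size limits, if you set any) will be in this list but not in the table.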

oswaldbra12 commented 7 months ago

This worked great. Thank you for helping me with this. Much appreciated!