FertigLab / CoGAPS

Bayesian MCMC matrix factorization algorithm
https://www.bioconductor.org/packages/release/bioc/html/CoGAPS.html
BSD 3-Clause "New" or "Revised" License
61 stars 17 forks source link

getPatternGeneSet gives wrong gene.set names #107

Closed JamesFrifan closed 3 months ago

JamesFrifan commented 3 months ago

Hi,

I am trying to run the function getPatternGeneSet on my CoGAPS results using the hallmark gene list from msigdbr, as demonstrated in the following code:

hallmark_df <- msigdbr(species = "Homo sapiens", category = "H")  
hallmark_list <- hallmark_df %>% split(x = .$gene_symbol,  f = .$gs_name)  
hallmarks_ora <- getPatternGeneSet(cogaps_result, gene.sets = hallmark_list, method = "enrichment", threshold = "cut", lp = NA, axis = 1)

When looking at the gene set testing result for a given pattern (e.g. View(hallmarks_ora[[1]])), I found the pathway and gene.set columns are mismatched which I think is not an intended behavior. Please see the snapshot below: image

By looking at the code, I found the issue may be that fgsea functions return sorted results on the padj column rather than keeping the same order as the input gene.set. As a result, directly setting the gene set names from the input names will cause a mismatch. image

I hope the description above is clear and should be helpful for you.

ejfertig commented 3 months ago

Thank you so much for catching this! Our team will get on fixing this immediately.

Get Outlook for iOShttps://aka.ms/o0ukef


From: James Fu @.> Sent: Wednesday, May 29, 2024 2:45:01 AM To: FertigLab/CoGAPS @.> Cc: Subscribed @.***> Subject: [FertigLab/CoGAPS] getPatternGeneSet gives wrong gene.set names (Issue #107)

Hi,

I am trying to run the function getPatternGeneSet on my CoGAPS results using the hallmark gene list from msigdbr, as demonstrated in the following code:

hallmark_df <- msigdbr(species = "Homo sapiens", category = "H") hallmark_list <- hallmark_df %>% split(x = .$gene_symbol, f = .$gs_name) hallmarks_ora <- getPatternGeneSet(cogaps_result, gene.sets = hallmark_list, method = "enrichment", threshold = "cut", lp = NA, axis = 1)

When looking at the gene set testing result for a given pattern (e.g. View(hallmarks_ora[[1]])), I found the pathway and gene.set columns are mismatched which I think is not an intended behavior. Please see the snapshot below: image.png (view on web)https://github.com/FertigLab/CoGAPS/assets/48193804/ec52b7b4-fe77-4fc3-9584-7f371e89c9b1

By looking at the code, I found the issue may be that fgsea functions return sorted results on the padj column rather than keeping the same order as the input gene.set. As a result, directly setting the gene set names from the input names will cause a mismatch. image.png (view on web)https://github.com/FertigLab/CoGAPS/assets/48193804/86198d5e-b2de-4ed5-8666-7ef27842bde8

I hope the description above is clear and should be helpful for you.

— Reply to this email directly, view it on GitHubhttps://github.com/FertigLab/CoGAPS/issues/107, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AATMMKZ6TRDXLC55QEH3MULZEV2O3AVCNFSM6AAAAABIOIXJ72VHI2DSMVQWIX3LMV43ASLTON2WKOZSGMZDENJQGA4TOMQ. You are receiving this because you are subscribed to this thread.Message ID: @.***>

dimalvovs commented 3 months ago

Thanks @JamesFrifan, can you please share your sessionInfo() ?

dimalvovs commented 3 months ago

Bug confirmed and did not happen on test data as padj was tied in the test data. To reproduce:

gs.test <- list(
    #add a hallmark with lower p-adj:
    "low_p" = c("Hs.101174", "Hs.1012", "Hs.1019", "Hs.101937", "Hs.102484", "Hs.103110"),
    "gs1" = c("Hs.2", "Hs.4", "Hs.36", "Hs.96", "Hs.202"),
    "gs2" = c("Hs.699463", "Hs.699288", "Hs.699280", "Hs.699154", "Hs.697294")
)
hallmarks_ora <- getPatternGeneSet(object = GIST.result, gene.sets = gs.test, method = "enrichment")
> print(hallmarks_ora[[1]][,c(1,10)])
   pathway gene.set
    <char>   <fctr>
1:     gs1    low_p
2:     gs2      gs1
3:   low_p      gs2
dimalvovs commented 3 months ago

@jmitchell81