Open guidohooiveld opened 7 years ago
I would expect some difference due to randomness of the algorithms. You should get similar difference between different runs of the same algorithm. You'd better compare p-values of the same pathway.
The difference should be smaller if you increase number of permutations. I'd recommend to use at least 10000.
On Mon, Nov 14, 2016, 13:26 Guido Hooiveld notifications@github.com wrote:
Hi, I don't know whether I should ask here, or at the maintainer of fgsea (@assaron https://github.com/assaron), but after analyzing the example DOSE dataset using the GSEA algorithm implemented in either DOSE or fgsea, I noticed the results are IMO quite different. I realize that the results are permutation based, and that the results are therefore expected not to be 100% identical, but i find the observed differences quite large. Could you maybe comment on this whether this is expected? Thanks, Guido
Full code below, but main outcome: gseDO(geneList,by="DOSE") identified 162 different gene sets gseDO(geneList,by="fgsea") identified 149 different gene sets
Overlap is only 138! Unique for 'DOSE' are 24 gene sets, for 'fgsea' 11.
Also the top regulated Gene Sets are completely different... ??
As said before, I expected some differences, but NOT because 2 different implementations of the same algorithm were used, but rather because of the permutations (that are random).
packageVersion("DOSE") [1] ‘3.0.4’
library(DOSE) data(geneList) x.DOSE <- gseDO(geneList,by="DOSE") [1] "preparing geneSet collections..." [1] "calculating observed enrichment scores..." [1] "calculating permutation scores..." [1] "calculating p values..." [1] "leading edge analysis..." [1] "done..." head(x.DOSE, 5) ID Description setSize DOID:5679 DOID:5679 retinal disease 299 DOID:0060040 DOID:0060040 pervasive developmental disorder 175 DOID:127 DOID:127 leiomyoma 89 DOID:9884 DOID:9884 muscular dystrophy 90 DOID:0060086 DOID:0060086 female reproductive organ benign neoplasm 42 enrichmentScore NES pvalue p.adjust qvalues rank DOID:5679 -0.3676313 -1.573945 0.001308901 0.03271556 0.02193074 1768 DOID:0060040 -0.3909104 -1.590133 0.001418440 0.03271556 0.02193074 2308 DOID:127 -0.4491341 -1.663010 0.001506024 0.03271556 0.02193074 2237 DOID:9884 -0.4623732 -1.713722 0.001515152 0.03271556 0.02193074 1971 DOID:0060086 -0.5530173 -1.780920 0.001636661 0.03271556 0.02193074 1674
DOID:5679 tags=24%, list=14%, signal=21% DOID:0060040 tags=30%, list=18%, signal=25% DOID:127 tags=40%, list=18%, signal=33% DOID:9884 tags=31%, list=16%, signal=26% DOID:0060086 tags=45%, list=13%, signal=39% core_enrichment DOID:5679 2878/3791/23247/80184/6750/7450/596/9187/2034/482/948/1490/1280/5737/4314/4881/3426/187/629/6403/6785/2934/5176/7078/5950/727/10516/4311/2247/1295/358/10203/582/10218/57125/585/1675/6310/2202/4313/2944/4254/3075/2099/3480/4653/6387/1471/857/4016/1909/4053/6678/1296/4915/55812/1191/5654/10631/2697/2952/6935/2200/3479/2006/10451/9370/771/652/4693/5346/1524 DOID:0060040 1760/9732/7337/5175/6532/54806/6326/1499/7157/221037/627/3399/2571/3082/3791/27347/596/22829/23426/324/5021/4885/7248/8604/3397/4208/3400/26470/64221/9037/3952/93664/2944/7102/2550/4915/4922/26960/1746/2697/6863/2891/367/4128/7166/6505/5348/18/9370/57502/79083 DOID:127 5175/5595/7157/1288/1543/1027/1958/7450/596/947/595/2246/3912/7507/2308/1277/8991/2247/5468/6469/81029/4313/56034/2944/2099/3480/1462/1289/5744/7043/367/1287/4582/3479/5241 DOID:9884 825/8106/1291/3643/7402/7450/596/1490/8082/2817/284119/54843/1293/23345/3952/4313/10468/6444/9499/2273/3339/1292/3908/81493/3679/3708/4137 DOID:0060086 1958/7450/596/947/595/3912/1277/8991/3952/4313/2099/3480/1462/1289/7043/367/3479/5241
x.fgsea <- gseDO(geneList,by="fgsea") [1] "preparing geneSet collections..." [1] "GSEA analysis..." [1] "leading edge analysis..." [1] "done..." head(x.fgsea, 5) ID Description setSize enrichmentScore NES DOID:9884 DOID:9884 muscular dystrophy 90 -0.4623732 -1.709404 DOID:14221 DOID:14221 metabolic syndrome X 22 -0.7194112 -2.018608 DOID:13359 DOID:13359 Ehlers-Danlos syndrome 11 -0.7647954 -1.784188 DOID:3382 DOID:3382 liposarcoma 12 0.7050486 1.825362 DOID:3939 DOID:3939 lipomatous cancer 12 0.7050486 1.825362 pvalue p.adjust qvalues rank leading_edge DOID:9884 0.001494768 0.03715567 0.0247576 1971 tags=31%, list=16%, signal=26% DOID:14221 0.001626016 0.03715567 0.0247576 894 tags=36%, list=7%, signal=34% DOID:13359 0.001773050 0.03715567 0.0247576 1023 tags=55%, list=8%, signal=50% DOID:3382 0.002304147 0.03715567 0.0247576 1022 tags=42%, list=8%, signal=38% DOID:3939 0.002304147 0.03715567 0.0247576 1022 tags=42%, list=8%, signal=38% core_enrichment DOID:9884 825/8106/1291/3643/7402/7450/596/1490/8082/2817/284119/54843/1293/23345/3952/4313/10468/6444/9499/2273/3339/1292/3908/81493/3679/3708/4137 DOID:14221 5468/3952/2152/185/1489/3479/9370 DOID:13359 1277/1281/1290/1289/1634 DOID:3382 7153/1029/1019/7545/10644 DOID:3939 7153/1029/1019/7545/10644
DOSE.res <- rownames(as.data.frame(x.DOSE)) fgsea.res <- rownames(as.data.frame(x.fgsea))
combined.ID <- union(DOSE.res,fgsea.res) C <- cbind(combined.ID %in% DOSE.res, combined.ID %in% fgsea.res) colnames(C) <- c("DOSE.res","fgsea.res")
library(limma) vennCounts(C) DOSE.res fgsea.res Counts 1 0 0 0 2 0 1 11 3 1 0 24 4 1 1 138 attr(,"class") [1] "VennCounts"
vennDiagram(vennCounts(C))
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/GuangchuangYu/DOSE/issues/18, or mute the thread https://github.com/notifications/unsubscribe-auth/AA_fWaI33fSKVtQvxQSdQuutwCkLocSaks5q-DdSgaJpZM4KxKYS .
Thanks Alexey for your comments.
If I repeatedly run GSEA either using the DOSE
or fgsea
algorithmic implementation, I noticed that for each of them the overlap of repeated analyses is quite OK. However, the "between-implementation" variation is (much) larger... I'll do some more studies and report back.
G
I have a similar problem but for me the gsea results change run by run. The pathways who were on the top of list are not the same when I re-run . What did I wrong ? Thanks.
Hi, I don't know whether I should ask here, or at the maintainer of
fgsea
(@assaron), but after analyzing the example DOSE dataset using the GSEA algorithm implemented in eitherDOSE
orfgsea
, I noticed the results are IMO quite different. I realize that the results are permutation based, and that the results are therefore expected not to be 100% identical, but i find the observed differences quite large. Could you maybe comment on this whether this is expected? Thanks, GuidoFull code below, but main outcome:
gseDO(geneList,by="DOSE")
identified 162 significantly regulated gene setsgseDO(geneList,by="fgsea")
identified 149 significantly regulated gene setsBut overlap is only 138! Unique for 'DOSE' are 24 gene sets, for 'fgsea' 11.
Also the top regulated Gene Sets are completely different... ??
As said before, I expected some differences, but NOT because 2 different implementations of the same algorithm were used, but rather because of the permutations (that are random).