Closed by t-carroll.
Hi Tom,
I have updated the GitHub repository to address the reproducibility issue. Simply reinstall the package and add seed = <your seed number> as an argument to run.Ted().
Please let me know if you have any questions.
Best,
Tinyi
On Sat, Apr 9, 2022 at 6:53 PM Tom Carroll @.***> wrote:
Hi Tinyi,
I've had some issues generating reproducible results in preparation for publishing code, namely that I can't seem to reproduce the same result, even when setting the seed explicitly. For instance, running the same call with the same seed on some previously published data from Maag et al. gives two different results:
set.seed(42)
ted.maag = run.Ted(ref.dat = group, X = maag2, cell.type.labels = labs, cell.subtype.labels = subs, n.cores = 24, tum.key = "EAC", input.type = "scRNA")
set.seed(42)
ted.maag2 = run.Ted(ref.dat = group, X = maag2, cell.type.labels = labs, cell.subtype.labels = subs, n.cores = 24, tum.key = "EAC", input.type = "scRNA")
all.equal(ted.maag$res$final.gibbs.theta, ted.maag2$res$final.gibbs.theta)
[1] "Mean relative difference: 0.001463005"
all.equal(ted.maag$res$final.gibbs.theta[, "EAC"], ted.maag2$res$final.gibbs.theta[, "EAC"])
[1] "Mean relative difference: 0.0006727291"
So it looks like setting the seed in the global R environment is insufficient, perhaps due to some quirk of the multicore parallelism (I'm running this in RStudio on a CentOS Linux HPC, if relevant). Is there a way to set the seed internally for whichever internal functions do the sampling, so that results are reproducible? If so, perhaps an optional seed argument could be passed to run.Ted(), which could in turn be passed to these internal sampling functions. Do you think that would be feasible? Happy to help out if so (though I'm not as familiar with the workings of these internal functions).
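For reference, one common reason this happens in R: when work is forked with parallel::mclapply() under the default Mersenne-Twister generator, the workers may be re-seeded independently of the parent's set.seed() call, so repeated runs diverge. The sketch below assumes the sampling goes through parallel::mclapply(); sample_in_parallel() is a hypothetical stand-in, not TED's internal code. It shows how an optional seed argument, like the one proposed for run.Ted(), could select the L'Ecuyer-CMRG generator so that each forked worker gets its own reproducible stream. mclapply() forks, so with n.cores > 1 this runs on Linux/macOS (as on the CentOS setup above), not on Windows.

```r
library(parallel)

# Hypothetical stand-in for an internal sampling step run on several cores.
# This is NOT TED's code; it only illustrates the seeding pattern.
sample_in_parallel <- function(n.tasks, n.cores = 2, seed = NULL) {
  if (!is.null(seed)) {
    # Select the L'Ecuyer-CMRG generator and seed it. With mc.set.seed = TRUE,
    # mclapply() then hands each forked worker its own stream derived from
    # this seed, so repeated calls with the same seed give identical draws.
    # Note: this changes the session-wide RNG kind as a side effect.
    set.seed(seed, kind = "L'Ecuyer-CMRG")
  }
  draws <- mclapply(seq_len(n.tasks), function(i) rnorm(1),
                    mc.cores = n.cores, mc.set.seed = TRUE)
  unlist(draws)
}

a <- sample_in_parallel(8, n.cores = 2, seed = 42)
b <- sample_in_parallel(8, n.cores = 2, seed = 42)
identical(a, b)  # TRUE
```

Leaving seed = NULL skips the seeding step entirely, so existing calls that don't pass a seed behave as before.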
View it on GitHub: https://github.com/Danko-Lab/TED/issues/17
Hi Tinyi,
Great, thanks for the quick update! I tried it out, and all.equal(ted.maag$res$final.gibbs.theta, ted.maag2$res$final.gibbs.theta) now returns TRUE when the same seed is set in both calls. Now closing.
-Tom