Closed a-torgovitsky closed 4 years ago
Done! bs_seed
is now removed from the module.
I have made a few changes other than only removing the argument bs_seed
from the function. Here are some details about the modification:
Originally, I set the seed inside the for-loop that does the bootstrap, where the seed is dependent on the loop number. The reason that I set it to be dependent on the loop number is because in parallel programming, the following sample code is not reproducible:
library(foreach)
set.seed(1)
x = foreach(i = 1:10) %dopar% {
x = rand(1)
}
while the following code is reproducible:
x = foreach(i = 1:10) %dopar% {
set.seed(i)
x = rand(1)
}
It has to be dependent on the loop number or varying, because if I fix it to a certain number, all draws will be the same:
x = foreach(i = 1:10) %dopar% {
set.seed(1)
x = rand(1)
}
To help our functions to be better specialized and have fewer inputs, I find that there is a useful package doRnG
which can help to generate reproducible results in parallel programming with set.seed()
being placed outside the for-loop. Thus, the first change that I made is to change %dopar%
to %dorng%
.
Then, I can set the seed to somewhere outside the for-loop that does the bootstrap in both the parallel programming and the usual case. In this way, I can also compare whether setting the seed in the beginning of the function or outside the function gives the same results.
I have checked that they give the same results for both the parallel programming case and the usual case. Here is an example that contains the p-value for the dkqs
function for the first 100 seeds, which shows that the result is the same if we drop the bs_seed
in the function and ask the user to set the seed outside of the function.
Great!
Could you also check that %dorng%
is roughly as fast as %dopar%
?
I just tested the computational time using the dkqs
procedure. I did 1,000 simulations and have 1,000 bootstrap replications for each simulation. I used 8 cores.
The %dorng%
version takes 35.87693 mins and the %dopar%
version takes 33.83579 mins. It looks like their performance is similar but %dorng%
is slower.
Ok, here's a better way to benchmark though. Try a faster computation (reduce the number of simlutions and/or bootstraps). Then repeat it many times and time how long it takes each repetition for each method. You will see that there is a distribution.
By just doing it once you are getting only a single draw from the distribution.
I see. Do you mean that I can try to do maybe 50 simulations and 50 bootstraps (which takes around 10 seconds to complete), then record the time, repeat this procedure for N times and see the distribution of the time taken? Thanks!
Yes exactly. This is because there is often a surprising amount of variability in how long computations take (due to other processes running, etc.)
I see, this is true because I am running other applications (such as Stata) on my computer at the same time so the difference in the time that I have posted earlier might be affected by some other applications instead of being the the differences in the two commands.
I will do the simulation accordingly to observe the distribution.
Just completed the simulation.
For both dorng
and dopar
, I conduct 200 replications, where each replication consists of 50 bootstraps and 50 simulations using the dkqs
function. I used 8 cores in my machine.
The result is as follows - where it looks like the method of using dorng
and setting the seed outside dkqs
is performing better than using dopar
and setting the seed inside dkqs
. There is an outlier in dorng
which takes 50 seconds though, but I guess the long computational time is unrelated to the dorng
.
That looks great! So there's no reason not to prefer %dorng%
-- perfect!
Is this complete?
Yes, its done!
seems like a more sensible way to do things, and then our functions are better specialized and have fewer inputs.
Please confirm that setting the seed before running something like
dkqs
gives the same results as the current ones where the seed is set insidedkqs