This is an interesting point. Your solution makes sense, but I think it's too elaborate and will cause difficulty down the line. In particular, I am not sure this is true:

"The list of beta.obs returned should also be non-problematic for the subsequent iterations in the invertci procedure because the only thing that changes is beta.tgt. As long as the beta.tgt quantity is within the logical bounds, there should not be any problems."

In testing the fsst procedure, for example, I have found that sometimes a draw that is problematic under one value of beta.tgt is not problematic under another. This is not unexpected, since the linear program is all one piece; who knows what Gurobi is doing to it under the hood.
Generally we want to avoid setting the seed ourselves, but maybe this is one case where it is necessary? A simple solution would be to add an optional parameter seed = NULL to invertci. If it's NULL, then we don't do anything and just live with the problem that the CI might be strange due to randomness in the bootstrap draws. If it's an integer, then we call set.seed(seed) before each iteration of do.call. This will ensure the same bootstrap draws are used in each iteration.
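Something along these lines, as a rough sketch (invertci_sketch, its arguments, and the loop are stand-ins, not the actual invertci internals):

invertci_sketch <- function(f, farg, n.iter, seed = NULL) {
  pvals <- numeric(n.iter)
  for (i in seq_len(n.iter)) {
    # If the user supplied a seed, reset the RNG before each call to the
    # testing procedure so every iteration sees the same bootstrap draws.
    if (!is.null(seed)) set.seed(seed)
    pvals[i] <- do.call(f, farg)
  }
  pvals
}

# Toy "testing procedure": one random draw stands in for a bootstrap p-value.
invertci_sketch(runif, list(n = 1), n.iter = 3, seed = 1)
#> [1] 0.2655087 0.2655087 0.2655087
invertci_sketch(runif, list(n = 1), n.iter = 3) # draws differ across iterations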
What do you think?
I see. I thought the problem always comes from the other constraints (such as the ones corresponding to beta.obs), which is why I made that statement in the post. So I agree it is actually not a good idea to evaluate all the stochastic components in advance, because it might create another source of error if some beta.tgt values are causing trouble.
Actually, I found a better solution to the problem:

1. Obtain the RNG state from .Random.seed at the beginning of the invertci procedure.
2. Before each iteration of do.call, set the seed to that RNG state.

This is illustrated by the following:
set.seed(1)
a <- .Random.seed
set.seed(a)
runif(5)
#> [1] 0.7330424 0.8172787 0.8308336 0.6205377 0.1086965
set.seed(a)
runif(5)
#> [1] 0.7330424 0.8172787 0.8308336 0.6205377 0.1086965
It also seems to work well when future_lapply is used:
set.seed(a)
unlist(future.apply::future_lapply(1:2, FUN = runif, future.seed = TRUE))
#> [1] 0.03632143 0.70247464 0.72824333
set.seed(a)
unlist(future.apply::future_lapply(1:2, FUN = runif, future.seed = TRUE))
#> [1] 0.03632143 0.70247464 0.72824333
In this way, we can avoid asking the user for a seed as in the other procedures.
Do you think this is fine?
Thanks!
I think that's a good solution as long as it is safe and doesn't have any unintended consequences.
So if the user does
runif(1)
invertci(...)
runif(1) # this should be different than the first one
and
set.seed(1)
runif(1)
invertci(...)
set.seed(1)
runif(1) # this should be the same as the first one
We had a bad experience with seeds in ivmte (see https://github.com/jkcshea/ivmte/issues/187 and https://github.com/jkcshea/ivmte/pull/188), so I want to make sure we don't have something similar going on here.
I see. The sample code below behaves exactly as you expect. In the invertci_toy function, I extract the RNG state once and restore it before each iteration, just as I intend to do in the invertci function.
invertci_toy <- function(iter) {
  # Obtain the RNG state at the beginning
  s <- .Random.seed
  x <- NULL
  # Iterate (like the bisection method)
  for (i in 1:iter) {
    # Restore the RNG state before each iteration
    assign(x = ".Random.seed", value = s, envir = .GlobalEnv)
    x <- c(x, runif(1))
  }
  print(x)
}
Then, the following code:
print(runif(1))
invertci_toy(5)
print(runif(1))
gives the following output, where the draw after invertci_toy differs from the draw before it (the first behavior you described):
[1] 0.3721239
[1] 0.5728534 0.5728534 0.5728534 0.5728534 0.5728534
[1] 0.9082078
On the other hand, the following code:
set.seed(1)
print(runif(1))
invertci_toy(5)
set.seed(1)
print(runif(1))
gives the following output, where the draws before and after invertci_toy are the same (the second behavior you described):
[1] 0.2655087
[1] 0.3721239 0.3721239 0.3721239 0.3721239 0.3721239
[1] 0.2655087
In addition, by setting the seed before invertci_toy, we can get reproducible results.
For instance, the following code
set.seed(12345)
invertci_toy(5)
runif(5) # Run something else
set.seed(12345)
invertci_toy(5)
gives:
[1] 0.7209039 0.7209039 0.7209039 0.7209039 0.7209039
[1] 0.7209039 0.7209039 0.7209039 0.7209039 0.7209039
So it is not messing up the seed as in #82, because I am not changing the RNGkind.
Ok, I'm convinced! Looks like a great solution.
Great! I will go ahead and implement this method and check that it works as expected. Thanks!
Done! I have just updated the invertci procedure.

I have run some tests to make sure it is not messing up the seed. The set-up (basically the same as in #85) is as follows:
devtools::load_all()
source("../lpinfer/example/dgp_mixedlogit.R")
dgp <- mixedlogit_dgp()
set.seed(1)
data <- mixedlogit_draw(dgp, n = 2000)
lpm <- lpmodel(A.obs = mixedlogit_Aobs(dgp),
beta.obs = function(d) mixedlogit_betaobs(d, dgp),
A.shp = rep(1, nrow(dgp$vdist)),
beta.shp = 1,
A.tgt = mixedlogit_Atgt_dfelast(dgp, w2eval = 1, eeval = -1))
invertci.arg <- list(f = fsst,
farg = list(lpm = lpm,
data = data,
R = 200),
lb0 = 0,
ub0 = 1,
progress = FALSE)
The result is reproducible:
set.seed(1)
a <- do.call(invertci, invertci.arg)
print(a)
#> Confidence interval: [0.07632, 0.83835]
set.seed(1)
b <- do.call(invertci, invertci.arg)
print(b)
#> Confidence interval: [0.07632, 0.83835]
The random numbers generated also suggest that the seed is not being disturbed. With set.seed(1) called before each of them, p and r are the same even though invertci runs in between:
set.seed(1)
p <- runif(1)
print(p)
#> [1] 0.2655087
q <- do.call(invertci, invertci.arg)
set.seed(1)
r <- runif(1)
print(r)
#> [1] 0.2655087
Without setting the seed, x and z, generated before and after invertci, are different:
x <- runif(1)
print(x)
#> [1] 0.3721239
y <- do.call(invertci, invertci.arg)
z <- runif(1)
print(z)
#> [1] 0.754191
The p-value in each iteration can also be reproduced by running the testing procedure outside the invertci procedure with the same seed. For instance, based on the details extracted from the summary of a, the p-value when beta.tgt = 0.09375 is 0.035. (I am skipping many rows because the table is long.)
Iteration Lower bound Upper bound Test point p-value Reject?
Left end pt. 0.00000 NA 0.00000 0.00000 TRUE
Right end pt. NA 1.00000 1.00000 0.00000 TRUE
...
5 0.06250 0.12500 0.09375 0.03500 FALSE
...
Running the fsst procedure directly yields the same result:
set.seed(1)
fsst(data, lpm, beta.tgt = 0.09375, R = 200)
#> p-value (by data-driven 'lambda'): 0.035
Great work!
In the current design of invertci, it calls the testing procedure in each iteration by do.call. Although the user will set the seed before calling invertci, I think each iteration of the bisection method uses different bootstrap data to compute the p-value. Hence, the result obtained might be inaccurate. My solution to the problem is as follows:

1. Evaluate all the stochastic components of the lpmodel object in advance and store the bootstrap draws in a list.
2. Replace the stochastic component in the lpmodel object by the new list.
3. If the component in the lpmodel object is a list or is deterministic, then there is nothing to do.
4. Each iteration of the bisection method then obtains the bootstrap components from the lpmodel object via the new list.

A potential problem that I can think of is: if I evaluate all the stochastic components of the lpmodel in advance, then I do not know whether there will be some problematic bootstrap draws, such as a beta.obs that returns an error in the optimization problems.

Hence, instead of evaluating all beta.obs before running the procedure, I am thinking of returning the non-problematic bootstrap beta.obs draws from the testing procedures. In this way, I can collect them after running the procedure for the first time and create a list of bootstrap beta.obs. The list of beta.obs returned should also be non-problematic for the subsequent iterations in the invertci procedure because the only thing that changes is beta.tgt. As long as the beta.tgt quantity is within the logical bounds, there should not be any problems.

I am also thinking of adding an option like return.stochastic.lpm to the procedures, where setting the option to TRUE means returning the list of non-problematic bootstrap draws for the component in the lpmodel object that is a function. Because the list returned can be large and use up a lot of memory, I will set it to FALSE by default. A toy sketch of the whole flow is given below.
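As a toy illustration of this flow (every name below is hypothetical and only sketches the idea, not the actual lpinfer code):

# Stand-in for a stochastic beta.obs component: a function of the data.
beta.obs.fn <- function(data) colMeans(data)

# First run: evaluate R bootstrap draws and keep only the non-problematic
# ones (here, a "problematic" draw is mimicked by a finiteness check).
collect_draws <- function(data, R) {
  draws <- lapply(seq_len(R), function(r) {
    b <- beta.obs.fn(data[sample(nrow(data), replace = TRUE), , drop = FALSE])
    if (all(is.finite(b))) b else NULL
  })
  Filter(Negate(is.null), draws)
}

set.seed(1)
data <- matrix(rnorm(200), ncol = 2)
boot.betas <- collect_draws(data, R = 100)

# Subsequent bisection iterations would receive `boot.betas` (a list) in
# place of the beta.obs function, so only beta.tgt changes across calls.
length(boot.betas)
#> [1] 100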