Closed conroylau closed 3 years ago
Is there anything currently in invertci
that mentions a specific testing procedure?
It would be nice if we could keep it procedure-invariant.
Of course this requires each procedure having predictable and harmonized output.
Currently, only chorussell
is mentioned in invertci
.
The reason of including it is to prevent the code from doing the bisection method to construct the confidence interval.
I see, so that must mean that invertci
expects a testing procedure to return an output that is standardized across procedures.
Is that correct?
What does this output look like?
Yes that's correct.
invertci
is extracting the p-values via the pval
object in the return list of the testing procedures.
It is a data frame where the first column contains the tuning parameter and the second column contains the p-value.
The number of rows of the data frame refers to the number of tuning parameters used.
For example:
rm(list = ls())
devtools::load_all()
source("../lpinfer/example/dgp_missingdata.R")
set.seed(1)
data <- missingdata_draw(J = 5, n = 500, seed = 1)
lpm <- missingdata_lpm(J = 5, info = "mean", data = data)
r <- dkqs(data, lpm, beta.tgt = .5, tau = c(.1, .5))
print(r$pval)
gives the following table of p-values:
tau p-value
1 0.1 0.65
2 0.5 0.69
Currently, I am extracting the [1,2]
-entry of the pval
object from the return list for the p-value when there is only one tuning parameter.
Two ideas, one easy and one harder:
Only compute a confidence interval for the first row. Warn the user that this is what is being done and that it would be computationally faster to pass one tuning parameter at a time.
tau = .1
.tau = .5
. Use that information to set a good bracket for tau = .5
tau = .5
starting with the good bracket determined in step 2.The "harder" solution is still not a great I guess because there is some work being repeated.
i.e. when computing the confidence interval for tau = .5
we are testing tau = .1
points for which we already know the answer.
What do you think?
I see. I am thinking of extending the harder solution as follows:
pval
that is returned from the testing procedure.
This consists of the name of the tuning parameter as well as the p-values for each value of the tuning parameter.tau = .1
by modifying the farg
argument (say by farg$tau <- .1
).
This should be possible because we know the name of the tuning variable from the pval
data frame.
In addition, this step can be skipped if pval
only has one row because there is only one tuning parameter.tau = .1
is constructed, go to the next parameter tau = .5
.In this way, I think we will not be repeating work because except for the initial step, we are only computing the p-values for a single tuning parameter. Do you think this is feasible? Thanks!
Sounds feasible to me.
But what if we have a procedure with multiple tuning parameters?
Should we decide that all columns of pval
besides the last correspond to tuning parameters?
These types of issues interact with the long-term goal of the module a bit.
It would be nice in the future to be able to add procedures without having to think about how they are going to interact with the rest of the structure (lpmodel
, invertci
, etc.).
So we should have some clear input/output rules that any new procedure has to follow.
And we need to make these sufficiently general so that we can handle unknown procedures we come across in the future.
Sure, I think it is a good idea to have a rule where all columns of pval
besides the last column are tuning parameters.
The column names except for the last one should be the name of the tuning parameters.
I will update this in the part that computes the p-value in invertci
.
I will also check the other parts of invertci
to see if they are general enough for us to add new procedures in the future.
Thanks!
Ok sounds like a plan. Thanks!
I have just updated the invertci
procedure where it can now construct confidence intervals when there are multiple parameters.
To illustrate that it works for procedures with multiple tuning parameters, I have created the toy procedure below.
In this procedure, there are two tuning parameters beta
and gamma
.
They can both accept multiple values from the user.
For simplicity, I am skipping the data
and lpmodel
part of the procedure.
eg_procedure <- function(beta.tgt, beta, gamma) {
# Data frame that contains the p-values
df <- data.frame(matrix(vector(),
nrow = length(beta) * length(gamma),
ncol = 3))
colnames(df) <- c("beta", "gamma", "p-value")
# Assign the p-values
k <- 1
for (i in seq_along(beta)) {
for (j in seq_along(gamma)) {
df[k, 1] <- beta[i]
df[k, 2] <- gamma[j]
df[k, 3] <- max(abs(beta.tgt - beta * 5 + gamma * 2), 1)
k <- k + 1
}
}
return(list(pval = df,
logical.lb = 0,
logical.ub = 1))
}
As discussed above, it is returning a data frame pval
that has three columns:
Then, I can use the above procedure with invertci
to construct confidence intervals.
As with the original design of invertci
, users can pass multiple alpha
to the procedure.
The following code
library(lpinfer)
set.seed(1)
r <- invertci(f = eg_procedure,
farg = list(beta = c(.1, .3),
gamma = c(.1, .5)),
lb0 = 0,
lb1 = .5,
ub1 = .6,
ub0 = 1,
max.iter = 5,
alpha = c(.9, .95),
progress = FALSE)
print(r)
gives:
Confidence intervals:
Significance level beta gamma Confidence interval
0.90 0.1 0.1 [0.4921875, 0.60625]
0.90 0.1 0.5 [0.0078125, 0.99375]
0.90 0.3 0.1 [0.4921875, 0.60625]
0.90 0.3 0.5 [0.0078125, 0.99375]
0.95 0.1 0.1 [0.4921875, 0.60625]
0.95 0.1 0.5 [0.0078125, 0.99375]
0.95 0.3 0.1 [0.4921875, 0.60625]
0.95 0.3 0.5 [0.0078125, 0.99375]
During the iterations, the code will also show the parameters that are being used, such as the following excerpt from the output:
…
********** Constructing confidence interval for significance level = 0.95 **********
Parameters: beta gamma
0.3 0.5
=== Computing upper bound of confidence interval ===
Iteration Lower bound Upper bound Test point p-value Reject?
Left end pt. 0.60000 NA 0.60000 0.50000 FALSE
Right end pt. NA 1.00000 1.00000 0.50000 FALSE
1 0.60000 1.00000 0.80000 0.50000 FALSE
…
This is very good, nice job!
Can you record your example in the examples
folder?
This may be useful in the future as an example of how to code in additional procedures.
Thank you!
Sure, I have included the extra example under the example
folder.
I am updating an optional input from the user df_ci
that is related to this issue.
This table contains the p-values that correspond to different beta.tgt
that were pre-evaluated by the user.
This object is also returned from the invertci
procedure, where it includes the p-values computed inside the procedure.
I have added a requirement that it needs to contain the value of the tuning parameters which correspond to each p-values and beta.tgt
.
This is because invertci
can now take care of the procedures with multiple tuning parameters.
Now the first n columns of the table corresponds to the n tuning parameters of the procedure.
The last two rows are the beta.tgt
and p-value
.
Do you think I should rename the df_ci
object as pvals
?
I am thinking if it will give a more informative name because it actually contains the p-values at different beta.tgt
Since we have added the table ci
that stores the confidence intervals in #96, I think using the name df_ci
might be misleading too.
Thanks!
Good point. I think pvals
is a good name. I also think we want to make the name of the returned object the same as the name of the passed object to avoid confusion.
Sure, no problem. Thanks!
I'm not sure if this is working. For example, I would expect this would work:
library("lpinfer")
source("~/R/x86_64-pc-linux-gnu-library/3.6/lpinfer/example/dgp_mixedlogit.R")
set.seed(5)
dgp <- mixedlogit_dgp()
df <- mixedlogit_draw(dgp, n = 4000)
lpm <- lpmodel(A.obs = mixedlogit_Aobs(dgp),
beta.obs = function(d) mixedlogit_betaobs(d, dgp),
A.shp = rep(1, nrow(dgp$vdist)),
beta.shp = 1,
A.tgt = mixedlogit_Atgt_dfelast(dgp, w2eval = 1, eeval = -1))
r <- invertci(f = fsst, farg = list(lpmodel = lpm, data = df))
r <- invertci(f = fsst, farg = list(lpmodel = lpm, data = df), alpha = .10,
pvals = r$pvals)
but I get
Error: The column names of the data frame 'pvals' need to contain 'point'
and 'p-value'.
Done! I have fixed a bug when the code is checking the input of pvals
.
The confidence interval for the first r
is [0.2139, 0.77786]
.
The confidence interval for the second r
is [0.21866, 0.76779]
.
(If I put set.seed(5)
before the second r
, it gives [0.22623, 0.77493]
).
Thanks!
Ok, now it certainly works.
Can you explain what is going on inside? The display looks the same but I'm not sure what's happening computationally. Are points being skipped if the pval is already known?
Yes, the computation will be skipped if pval
is already known for the given parameters.
For instance, the bisection in the second r
will be reusing the p-values from pvals
.
But we still need to set the seed here to make sure the same lambda
is used so that the p-values will be recycled.
set.seed(12345)
r <- invertci(f = fsst, farg = list(lpmodel = lpm, data = df))
set.seed(12345)
r <- invertci(f = fsst, farg = list(lpmodel = lpm, data = df), pvals = r$pvals)
Thanks!
I see, my mistake forgetting to set the seed. Thanks for the clarification!
No problem!
In several procedures, we allow the user to pass multiple tuning parameters. For instance, users can pass multiple
tau
parameters in the DKQS procedure. This could be problematic for theinvertci
function if we allow them to do so. The reason is that we are finding the p-value for differentbeta.tgt
in order to construct the confidence interval. If we allow for multiple tuning parameters, then each parameter could give a different p-value for the samebeta.tgt
. Then this means that the decision to reject or not to reject at a certainbeta.tgt
could be different for different tuning parameters. I cannot think of a perfect solution to this problem yet, but I have an idea as follows:lambda
in FSST andtau
in DKQS).farg
list) in the beginning ofinvertci
.As a side note, I think the procedure of evaluating all the bootstrap replications of a stochastic component in
lpmodel
that is a function (as mentioned in issue #88) can be done only once even if the user passes multiple tuning parameters. This is because the LP of finding the logical bounds is independent on the tuning parameter.