Closed a-torgovitsky closed 4 years ago
Also, I should be able to call print
on an lpmodel
and get back some nice display that shows me what the underlying LP looks like in some informative way. I'm not exactly sure what the most informative way of doing this is, but take a stab at it and we will iterate a few times.
One other thing here:
A beta.obs vector A function func.obs that takes can take a dataframe as an input This function always returns a beta.obs vector It optionally also returns an asymptotic variance-covariance matrix associated with beta.obs Give the user flexibility in how they specify the function --- just make sure it returns one vector and one matrix, but don't force them to use specific naming schemes
Can we combine this all into beta.obs
?
So that beta.obs
is either:
And maybe also give the user the option to pass bootstrap draws directly, so that beta.obs
could also be a list of vectors (or a matrix), with the first one interpreted as the sample estimate, and the following ones interpreted as bootstrap draws of that estimate.
Also, I guess it doesn't make much sense to have beta.tgt
part of the lpmodel
object, since beta.tgt
is a specific value we want to test, and not a part of the model per se.
Sure, I will separate beta.tgt
out. I agree that it is better since we are testing beta.tgt
so separating it from lpmodel
can prevent us from changing lpmodel
every time in case we are testing several beta.tgt
.
Regarding beta.obs
, I am thinking whether it is ideal to make it either a vector (that comes from passing data frame to the function) or a function. For instance, func.obs
is needed inside the bootstrap procedure of the dkqs
function. Thus, I am thinking if we should keep func.obs
or beta.obs
separate, or we can require beta.obs
to be a function inside procedures like dkqs
.
To pass the bootstrap draws directly, may I know does it mean that beta.obs
be a list, where the first element is the point estimate, and the second element is perhaps a data frame of the bootstrap betas?
Thank you!
or we can require beta.obs to be a function inside procedures like dkqs.
That is preferable, since there's not really any conceptual distinction between beta.obs
as a vector or as a function.
To pass the bootstrap draws directly, may I know does it mean that beta.obs be a list, where the first element is the point estimate, and the second element is perhaps a data frame of the bootstrap betas?
I could see many ways to do this:
You should decide which is the easiest. My guess is that the single data.frame is the easiest. Although a matrix is probably sufficient, not sure why we would need a data.frame in this case.
I see, sure, I would make beta.obs
into an argument that can be either a function and/or matrix, depending the procedure used.
I agree that a matrix is sufficient for beta.obs
. I think the 4th method by writing the first row as a point estimate and the remaining B rows as the bootstraps will be the easiest to implement.
Am I correct that if the user pass a matrix to beta.obs
(more than 1 row), then the bootstrap procedure is skipped automatically if the matrix provided has B + 1 rows?
If it has fewer than B + 1 rows, then the procedure will return an error, indicating that the arguments are inconsistent.
Thanks!
Actually, I think we should more generally allow for all of these to be functions of the data:
A.obs
, A.shp
, A.tgt
, beta.obs
, beta.shp
In DKQS only beta.obs
can be stochastic, but in general we might consider procedures where other components can also be stochastic. (Actually, we could do this in subsampling --- I will make an issue about that next.)
Given that, I think we want three options for each component:
list
function).Does that make sense?
Yes, this and issue #23 makes sense.
For the vectors beta.obs
and beta.shp
, should I still allow them to be a matrix of B + 1 rows if the user would like to pass the bootstrap or re-sample versions of the betas?
Or would it be better if I make them to be consistent with A.obs
, A.shp
and A.tgt
that they would need to pass a list of vector if they want to pass the bootstrap or re-sample versions of the betas?
Thanks!
I think it's better if you make them consistent and just have lists.
However, one thing to check before we make a decision is what the memory usage is of lists vs. matrices. Could you check that? That is, if I have 50 vectors of doubles arranged in a list, is that more memory than having those 50 vectors arranged into a single matrix?
Yes, for the same amount of information, list uses more memory than matrices. For instance, in your example, a list has 10,848 byes and a matrix has 8,216 bytes if there are 50 vectors with 20 elements each, as generated by the code below.
set.seed(1)
N = 50
mat <- matrix(runif(N*20), nrow = N, byrow = TRUE)
lis <- NULL
set.seed(1)
for (i in 1:N){
lis[[i]] = runif(20)
}
object.size(mat)
object.size(lis)
That doesn't seem like too much of a difference. I think it's worth it to use the list since it will be more clear how the data should be input and is consistent for both vectors and matrices. Probably the memory usage on this dimension will not be important either.
Done! I have updated the module by organizing the information around the lpmodel
object. I have also updated the structure of the code and update some notations so that they are consistent across different files. I might further revise them when I start to write the code for the fsst
procedure tomorrow.
Currently, the files in the R folder are organized as follows:
dkqs.R
, subsample.R
, invertci.R
, estbounds.R
, fsst.R
checks.R
optim.R
lpmodel
-related functions: lpmodel.R
I am also thinking how I should present the objects inside the lpmodel
function.
Also, I should be able to call
lpmodel
and get back some nice display that shows me what the underlying LP looks like in some informative way. I'm not exactly sure what the most informative way of doing this is, but take a stab at it and we will iterate a few times.
May I know would it be a good idea to print the following information of lpmodel
for each of the objects A.obs
, A.shp
, A.tgt
, beta.obs
and beta.shp
?
matrix
or a data.frame
, print its class and its dimension.function
, print its class.list
, print its class, the number of objects in the list and the dimension of the first element (print an error message if not all objects in the list have the same dimension).lpmodel
is list
, print an error message if the number of objects in each list are different.Thank you!
That seems like good output for print
.
However, the last two bullet points seem like they are also doing error-checking, right? So the error-checking component of those probably belongs somewhere else.
If it is a list, print its class, the number of objects in the list and the dimension of the first element (print an error message if not all objects in the list have the same dimension). If the class of more than one objects in lpmodel is list, print an error message if the number of objects in each list are different.
I see, yes they are doing error checking.
I agree they should belong to the error checking part, not in print
.
I am doing these checking in the beginning of the testing procedure, but I thought it might be useful to tell the user if they have specified lpmodel
in the wrong way.
I think I should modify the current check.lpmodel
function (that is currently used inside the functions), so that user can call the function to do error checking if they want.
Regarding the part on list
, would it be a good idea to revise the output as printing the dimension of the objects in the list
and the number of objects for each dimension (if the user specify a list of functions, then the dimension is not printed)?
If the user specifies all the objects in the list
to have the same dimension, then this is the same as before. But if the user specifies a list
with objects of different dimensions, I am thinking if this would be more informative than just printing the first element of the list.
Thanks!
I think this is related to the idea of a constructor for a class.
You should take a look at how others do object-oriented programming in R and see if there are any good ideas.
Ideally, one would not be able to even specify an lpmodel
object that is incoherent.
If we can assume that all lpmodel
objects are coherent, then the second issue won't arise as we will know that any lpmodel
passed to print
(or any other function) will have objects of the same size in its list
.
I have just updated the print
on lpmodel
. A sample output looks like this:
Object Class Dimension Length
A.obs matrix 11x22 1
A.shp -empty- -empty- -empty-
A.tgt matrix 1x22 1
beta.obs function N/A N/A
beta.shp -empty- -empty- -empty-
May I know would the above output be okay?
In addition, I am thinking would it be a good idea to give an optional data
argument for print
so that users can also check what would be the dimension for the output of the functions (in the example above, it would be the beta.obs
function)?
Thanks!
That output looks ok and what you propose sounds like a good idea.
Updated!
With the following code:
library(lpinfer)
J = 11
yp <- seq(0, 1, 1/10)
A.obs <- cbind(matrix(rep(0,J*J), nrow = J), diag(1, J))
A.tgt <- matrix(c(yp, yp), nrow = 1)
func.full.info <- function(data){
beta <- NULL
y_list <- sort(unique(data[,"Y"]))
n <- dim(data)[1]
yn <- length(y_list)
for (i in 1:yn){
beta_i <- sum((data[,"Y"] == y_list[i]) * (data[,"D"] == 1))/n
beta <- c(beta,c(beta_i))
}
beta <- as.matrix(beta)
return(beta)
}
lpmodel <- list(A.obs = A.obs,
A.tgt = A.tgt,
beta.obs = func.full.info)
class(lpmodel) <- "lpmodel"
The command print(lpmodel)
is giving
Object Class Dimension Length
A.obs matrix 11x22 1
A.shp -empty- -empty- -empty-
A.tgt matrix 1x22 1
beta.obs function N/A N/A
beta.shp -empty- -empty- -empty-
and the command print(lpmodel, data)
is giving
Object Class Dimension Length
A.obs matrix 11x22 1
A.shp -empty- -empty- -empty-
A.tgt matrix 1x22 1
beta.obs function 11x1 1
beta.shp -empty- -empty- -empty-
Thank you!
Looks good!
Is there a way to generate an lpmodel
object in one step?
So instead of creating a list, then assigning it a class (two steps)?
In other languages I would think something like this would work:
lpm <- lpmodel(A.obs = A.obs,
A.tgt = A.tgt,
beta.obs = func.full.info)
Done! Sure, I added a function called lpmodel
that takes five arguments: A.obs
, A.shp
, A.tgt
, beta.obs
and beta.tgt
(their default are NULL
) and set lpmodel
as a list with these five objects with class lpmodel
. Thanks!
Ok great. Is that the ways these things are usually done? Can you check? For example, what about data.frame
?
If I run data.frame(a = 1:5)
does R construct a list in the background and set the class type?
Yes, I think this is what they do for data.frame
.
From my understanding, data.frame
takes the input (...
in the function argument) and then assign it as a list via the command x <- list(...)
. Then, it assigns the class as "data.frame"
via the command class(value) <- "data.frame"
, where value
is the object returned in the data.frame
function.
I also looked at a few other functions like matrix
, date
to see if they are doing something similar but it seems like these functions are calling C codes, so they did the things differently.
Ok very good, let's close this for now at least
Probably we want to use an S3 object for this, but I am not that familiar with the pro's and con's of R's object system. I think most things are S3 objects, so that is why I suspect this is the way to go.
An
lpmodel
is an object that can have the following properties:A.obs
matrixA.shp
matrixA.tgt
matrixbeta.shp
vectorbeta.tgt
vectorbeta.obs
vectorfunc.obs
that takes can take adataframe
as an inputbeta.obs
vectorbeta.obs
--> Note that we are getting rid of inequalities as per issue #20 .
Depending on the routine being run, it might be ok for some or many of these fields to be empty, which is fine.
For example,
estbounds
can take eitherbeta.obs
orfunc.obs
. Iffunc.obs
, then the user needs to passdata
too. Also,estbounds
does not needbeta.tgt
.As another example,
dkqs
doesn't needA.shp
orbeta.shp
, as the only deterministic constraint that the DKQS procedure allows for (other than the bound constraint) is the simplex constraint. (However, if they aren't empty, then make sure to check that the user actually passed the simplex constraint --- otherwise it is a sign they are confused.)The envisioned workflow will then be something like:
lpmodel
estbounds
and/or one of the testing or confidence region procedures.