Closed: a-torgovitsky closed this issue 4 years ago
I have fixed a few parts of the `dkqs` code:

- Since `beta.obs` is a function, it can only return one object. Thus, there is an error in this example because the function `b` returns both the estimator of `beta.obs` and the estimator of the asymptotic variance. I have rewritten the code so that it accepts two objects returned from the function and ignores the asymptotic variance.
- I have updated the `dkqs` code so that the process of constructing the constraints (cf. problems (5) and (6) of your notes) is skipped when the corresponding index set is empty (in this example, the set `I_0` is empty).
- I have moved part of the code outside of the `foreach` loop. The reason for doing this is that if I kept this part inside the `foreach` loop, it would result in an error because it cannot find the `betaobs` function that is inside the `b` function from the example file. I have conducted simulations and found that although this part is moved outside of the `foreach` loop, using the parallel option is still faster than the non-parallel option.

However, I find that there is an issue related to coercing `data` to `matrix` (as per issue #28). In this example, if I coerce `data` to `matrix`, it results in an error because the function `betaobs` (in the `mixedlogit.R` file) needs `data` to be a `data.frame`. Therefore, I am thinking: would it be better if we do not coerce `data` into any particular format?

Thanks!
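As a sketch of the first fix described above (the names `b` and `extract_beta` are illustrative stand-ins, not the module's actual internals):

```r
# The user's function returns both the point estimate and the asymptotic
# variance; the module keeps the first object and ignores the second.
b <- function(data) {
  list(beta = colMeans(data),           # stand-in estimator of beta.obs
       var  = var(data) / nrow(data))   # asymptotic variance (ignored)
}

# Accept either a bare estimate or a list of two objects.
extract_beta <- function(func, data) {
  ret <- func(data)
  if (is.list(ret) && !is.data.frame(ret)) ret[[1]] else ret
}
```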
Ok, good point!
Yes, I think it's better if we do not coerce `data` into any format then.
Why was this needed to begin with? Just to check that the input was correct?
I think we should have a list of the places where `data` gets used in the code, and what will happen if the user passes something wrong for `data`.
Yes, the main purpose of checking the `data` object is to verify that it is in the correct format.

I just prepared a list of where `data` is directly used in the module. I only include places where it is directly mentioned in the code (such as `nrow(data)`). The spreadsheet can be found here. The last column of the spreadsheet shows whether `data` is being passed to a function that is defined in the `lpinfer` module (indicated by `Y`) or not (indicated by `N`).

Among the 93 instances where `data` appears in the module, 34 of them call R functions that are not defined in `lpinfer`. These functions are `nrow`, `colnames`, `sample`, `is.null`, and the user-defined function (for objects in `lpmodel`).

If I pass something like a function to `nrow`, `colnames`, or `is.null`, they would not return a warning message. The function `sample` is used together with `nrow` where I am trying to redraw the `data` (with or without replacement), so `sample` would return an error if `nrow(data)` is something other than a positive integer. On the other hand, `colnames` is used in the `dkqs` procedure to identify the `Y` column. So, it looks like if the user is not passing a list of bootstrap estimators and `data` is in the wrong format, then the code that draws the bootstrap samples and estimators must give an error, although some other parts of the code might break down as well. Similarly, if `data` is in the wrong format for the user-defined function, it would return an error as well.
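To illustrate the silent-failure point with base R (a sketch, not code from the module; `draw_ids` is a hypothetical stand-in for the bootstrap redraw):

```r
# Passing a function where a data.frame is expected:
not_data <- function(x) x

nrow(not_data)     # NULL  -- no error, no warning
is.null(not_data)  # FALSE -- no error, no warning

# The bootstrap redraw does break, because seq_len() rejects the NULL
# that nrow() returns for a non-data object.
draw_ids <- function(data) sample(seq_len(nrow(data)), replace = TRUE)
# draw_ids(not_data)  # error
```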
Ok then, how about this?

1. Coerce `data` to a `data.frame`.
2. Require `betaobs` (and other functions) to accept a `data.frame`.

This should allow `data` to be a `data.frame`, `data.table`, `tibble`, or any other generalization of a `data.frame`. It would not allow `data` to be a matrix (via the second point), but I think that is consistent with many other R functions.
I think this will allow my code to work, because inside my `betaobs` all of the `tidyverse` functions will still work if `data` is a `data.frame` instead of a `tibble`.
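A sketch of what such a coercion step could look like (the name `coerce_data` is hypothetical, not the function in the package):

```r
# Accept a data.frame or any subclass (tibble, data.table), reject
# everything else (matrices, vectors, functions), and normalize to a
# plain data.frame before passing it on.
coerce_data <- function(data) {
  if (!is.data.frame(data)) {
    stop("'data' must be a data.frame or a generalization of one")
  }
  as.data.frame(data)
}
```

This works because tibbles and data.tables inherit from `data.frame`, so `is.data.frame()` accepts them, while a matrix is rejected.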
I see, sure!
Regarding the second point, to require the user's function for objects in `lpmodel` to accept a `data.frame`: am I correct that this is something to be specified in the `README` and the documentation? Should I also add a step that passes a `data.frame` to the function at the beginning of the code, to check whether it returns an error? Thanks!
Yes, this should be indicated in the `README`.
I think that's a good idea, except we don't want to make unnecessary calls to `betaobs` or other functions, as these might be time-consuming.
Perhaps using exception handling the first time they are called is the right approach.
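The exception-handling idea could look roughly like this (a sketch; `check_lpmodel_fn` is a hypothetical name, not the function that was added to the package):

```r
# Wrap the first call to the user-supplied function so that a malformed
# 'data' object produces one informative error instead of a cryptic one.
check_lpmodel_fn <- function(func, data) {
  tryCatch(
    func(data),
    error = function(e) {
      stop("The user-supplied function failed on 'data': ",
           conditionMessage(e), call. = FALSE)
    },
    warning = function(w) {
      stop("The user-supplied function raised a warning on 'data': ",
           conditionMessage(w), call. = FALSE)
    }
  )
}
```

Because the check reuses the result of the first call, it adds no extra evaluations of a possibly expensive `betaobs`.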
Done! I have now coerced `data` to a `data.frame`. In addition, I have added a function to check whether there are any error or warning messages when `data` is passed to the functions defined in `lpmodel`. Thanks!
Code attached: example.zip