conroylau / lpinfer

lpinfer: An R Package for Inference in Linear Programs
GNU General Public License v3.0

Not sure why this code doesn't run #30

Closed: a-torgovitsky closed this issue 4 years ago

a-torgovitsky commented 4 years ago

Code attached

source("run.R")

results in

Error in if (nrow(A.obs.return$sample) != nrow(beta.obs.return$sample)) { : 
  argument is of length zero
In addition: Warning message:
In if (beta.obs.cat != 0) { :
  the condition has length > 1 and only the first element will be used

example.zip

conroylau commented 4 years ago

I have fixed a few parts of the dkqs code:

  1. In an earlier design of the module, I wrote the code so that if beta.obs is a function, it could only return one object. This example therefore throws an error, because the function b returns both the estimator of beta.obs and an estimator of its asymptotic variance. I have rewritten the code so that it accepts two objects returned from the function and ignores the asymptotic variance (see the sketch after this list).
  2. I have fixed a bug in counting the number of columns of a vector.
  3. I have updated the dkqs code so that the step of constructing the constraints (c.f. problems (5) and (6) of your notes) is skipped when the corresponding index set is empty (in this example, the set I_0 is empty).
  4. In the parallelization part, I have moved the computation of the bootstrap estimators outside of the foreach loop. If it stays inside the loop, the workers cannot find the betaobs function that is nested inside the b function from the example file, which causes an error. I have run simulations and found that, even with this step outside the foreach loop, the parallel option is still faster than the non-parallel option.
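
For concreteness, the pattern in point 1 looks roughly like this (the body of b below is illustrative, not the actual code from the example file):

# Illustrative sketch: a beta.obs-style function that returns both the
# point estimate and an estimate of its asymptotic variance. dkqs now
# keeps the first component and ignores the second.
b <- function(data) {
  est <- colMeans(data)                   # placeholder estimator of beta.obs
  avar <- stats::var(data) / nrow(data)   # placeholder asymptotic variance
  list(beta = est, var = avar)
}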

However, I have found an issue related to coercing data to a matrix (as in issue #28). In this example, coercing data to a matrix causes an error, because the function betaobs (in the mixedlogit.R file) needs data to be a data.frame. Would it therefore be better if we do not coerce data into any particular format?
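
For example (this is general R behavior, not anything specific to lpinfer), coercing a data.frame with mixed column types to a matrix turns every entry into a character string, which breaks code written for a data.frame:

df <- data.frame(y = c(1, 0), x = c("a", "b"))
m <- as.matrix(df)   # mixed column types: every entry becomes character
mean(df$y)           # 0.5 on the data.frame
mean(m[, "y"])       # NA with a warning: the column is now character
# m$y                # would error: $ is invalid for atomic vectors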

Thanks!

a-torgovitsky commented 4 years ago

Ok, good point! Yes, I think it is better if we do not coerce data into any format then.

Why was this needed to begin with? Just to check that the input was correct? I think we should have a list of the places in the code where data gets used, and of what will happen if the user passes something wrong for data.

conroylau commented 4 years ago

Yes, the main purpose of checking the data object is to verify that it is in the correct format.

I have just prepared a list of where data is directly used in the module. I only include the places where it is mentioned directly in the code (such as nrow(data)). The spreadsheet can be found here. The last column of the spreadsheet shows whether data is being passed to a function that is defined in the lpinfer module (indicated by Y) or not (indicated by N).

Among the 93 instances where data appears in the module, 34 are calls to R functions that are not defined in lpinfer. These functions are nrow, colnames, sample, is.null, and the user-defined functions (for objects in lpmodel).

If I pass something like a function to nrow, colnames, or is.null, they do not return a warning message. The function sample is used together with nrow when I redraw the data (with or without replacement), so sample returns an error if nrow(data) is anything other than a positive integer. The function colnames is used in the dkqs procedure to identify the Y column. Hence, if the user does not pass a list of bootstrap estimators and data is in the wrong format, the code that draws the bootstrap samples and estimators will raise an error, although other parts of the code may break as well. Likewise, if data is in the wrong format for a user-defined function, that function will raise an error as well.
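
For instance, the redraw step is roughly of this form (a simplified sketch, not the exact lpinfer code):

# Simplified sketch of the resampling step. If 'data' is not tabular,
# nrow(data) returns NULL and the resampling call errors, so a
# malformed 'data' surfaces here rather than passing through silently.
draw.bs.sample <- function(data) {
  idx <- sample(seq_len(nrow(data)), nrow(data), replace = TRUE)
  data[idx, , drop = FALSE]
}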

a-torgovitsky commented 4 years ago

Ok then, how about this?

  1. Coerce data to a plain data.frame (e.g. with as.data.frame) at the start of the code.
  2. Require the user's functions for objects in lpmodel to accept a data.frame.

This should allow data to be a data.frame, data.table, tibble, or any other generalization of a data.frame. It would not allow data to be a matrix (via the second point), but I think that is consistent with many other R functions.

I think this will allow my code to work, because inside my betaobs all of the tidyverse functions will still work if data is a data.frame instead of a tibble.
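
A minimal sketch of what I have in mind (the helper name is just for illustration, not a prescribed implementation):

# Hypothetical helper illustrating the proposal: accept anything that
# inherits from data.frame (tibble, data.table, ...), reject matrices
# and other formats, and hand a plain data.frame to the user's functions.
check.data <- function(data) {
  if (!is.data.frame(data)) {
    stop("'data' must be a data.frame or an object inheriting from it.")
  }
  as.data.frame(data)
}

Since tibbles and data.tables inherit from data.frame, is.data.frame accepts them, while a matrix fails the check.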

conroylau commented 4 years ago

I see, sure!

Regarding the second point, requiring the user's function for objects in lpmodel to accept a data.frame: am I correct that this is something to be specified in the README and the documentation? Should I also add a step at the beginning of the code that passes a data.frame to the function and checks whether it returns an error? Thanks!

a-torgovitsky commented 4 years ago

Yes, this should be indicated in the README.

I think that's a good idea, except we don't want unnecessary calls to betaobs or the other functions, as these might be time-consuming. Perhaps using exception handling the first time they are called is the right approach.
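
Something along these lines, for example (the wrapper name is illustrative, and betaobs stands in for any user-supplied function in lpmodel):

# Illustrative sketch: wrap the first call to a user-supplied function
# in tryCatch so that a malformed 'data' produces one informative error
# up front instead of failing deep inside the procedure.
first.call.check <- function(f, data) {
  tryCatch(
    f(data),
    error = function(e) {
      stop("The user-supplied function failed on 'data': ",
           conditionMessage(e), call. = FALSE)
    },
    warning = function(w) {
      stop("The user-supplied function raised a warning on 'data': ",
           conditionMessage(w), call. = FALSE)
    }
  )
}

Since the wrapper returns the value of f(data), the first real call doubles as the check, so no extra evaluation is needed.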

conroylau commented 4 years ago

Done! I have now coerced data to a data.frame. In addition, I have added a function that checks whether any error or warning messages arise when data is passed to the functions defined in lpmodel. Thanks!