Strange error with 2 norm in subsample

a-torgovitsky commented 4 years ago

Seems to only happen with certain draws. Error message looks like it is coming from Gurobi.

set.seed(1230854902)
r <- subsample(data, lpm, .3, solver = "gurobi") # ok

set.seed(12345)
r <- subsample(data, lpm, .3, solver = "gurobi") # not ok

r <- subsample(data, lpm, .3, solver = "gurobi", norm = 1) # ok

The second one gives this error

Error in { : 
  task 26 failed - "Error 10020: Objective Q not PSD (diagonal adjustment of 2.3e+49 would be required)
"

a-torgovitsky commented 4 years ago

Here's the data (same as in earlier issue) debug.RData.zip

conroylau commented 4 years ago

Sorry for being slow in getting back on this issue. I find that the error of having a non-PSD Q matrix is caused by the matrix that corresponds to the quadratic term of the objective function in one of the bootstrap replications of the subsampling procedure that uses the L2-norm.

In that particular bootstrap replication that is causing the error, the variance matrix obtained from the subsampled data is

             factor(bin)1 factor(bin)2 factor(bin)3 factor(bin)4
factor(bin)1  3.88007e-33   0.00000000   0.00000000  0.000000000
factor(bin)2  0.00000e+00   0.01393614   0.00000000  0.000000000
factor(bin)3  0.00000e+00   0.00000000   0.01285563  0.000000000
factor(bin)4  0.00000e+00   0.00000000   0.00000000  0.009637555

In the above, the (1,1)-entry is extremely small.

The quadratic program of the subsampling procedure uses the inverse of the variance matrix (denoted as $\widehat{\Omega}_n$ below), so the quadratic part of the objective function will be very large.

In this particular example, the inverse of the variance matrix is:

[1,] 2.577273e+32  0.00000  0.0000   0.0000
[2,] 0.000000e+00 71.75587  0.0000   0.0000
[3,] 0.000000e+00  0.00000 77.7869   0.0000
[4,] 0.000000e+00  0.00000  0.0000 103.7608

The A.obs matrix is the same for all bootstrap replications (because it is deterministic)

[1,]  0.5 0.6224593 0.2689414 0.3775407
[2,]  0.5 0.6224593 0.3775407 0.5000000
[3,]  0.5 0.6224593 0.6224593 0.7310586
[4,]  0.5 0.6224593 0.7310586 0.8175745

Therefore, this gives the Q matrix (that equals $\sqrt{n} A' \widehat{\Omega}_n' \widehat{\Omega}_n A$) that is very large

             1            2            3            4
1 1.307545e+65 1.627787e+65 7.033061e+64 9.873029e+64
2 1.627787e+65 2.026463e+65 8.755589e+64 1.229112e+65
3 7.033061e+64 8.755589e+64 3.782963e+64 5.310533e+64
4 9.873029e+64 1.229112e+65 5.310533e+64 7.454940e+64

The corresponding determinant is also negative and equals -2.60175e+212, so the matrix is not PSD.

It seems that Gurobi have some tolerance on non-PSD matrices when I check the other bootstrap replications. For instance, when the determinant is -0.01573142 in another bootstrap replication, it can still run the program properly without returning any error. However, in this bootstrap replication that is returning an error, the determinant is too negative.

Therefore, I think that this error is due to the variance matrix that uses this particular subsample of the data and gives some extremely small values. Do you think a proper way to deal with the situation is to use the tryCatch function on the optimization problem and skip the bootstrap replications that would return an error (as discussed in issue #54)? Thanks!

a-torgovitsky commented 4 years ago

Yes, I think this must be the same issue whereby for some strange bootstrap draws the estimator is undefined. The same solution should be ok.

More generally, I think this is going to be a recurring problem across different procedures. Basically, some bootstrap draws are going to be difficult for various reasons, either because the estimator doesn't exist, as here, or because of some random peculiarity that makes the problem hard to solve.

We don't want to give up on these cases too many times, but sometimes (like here) it seems inevitable.

As a debugging tool, it might be good to create a standardized way to record bootstrap draws that give problems in a data frame (along with some diagnostics about the problem) and find a way to give that back to the user if requested. What do you think?

conroylau commented 4 years ago

I see.

Yes, I agree that it is a good idea to create a standardized way to record the bootstrap draws that are problematic. For now, I can think of returning the following information whenever an error occurs:

The number of the bootstrap draw
The warning message
The list of arguments used in the linear/quadratic program such as the matrices used in the objective function
Determinant of the quadratic term in the objective function if we are solving a quadratic program (I am not sure if this information is useful every time but it seems that having a non-PSD matrix for the quadratic term could be problematic in these optimization problems)

Do you think these information are enough for now? I can definitely add more information to be returned once we find some other errors from the optimization programs.

Thanks!

a-torgovitsky commented 4 years ago

Yes I think that's about right.

I would think of this as a dataframe, one row per "problem." The columns could be different for different routines, but for this one---since we know what the problem is and there's really no way to solve it---I think you can just focus on warning message (return code from the solver) and determinant of the quadratic term.

This would also be a way to keep track of how many bad solves there are. If there are too many, we want to at least tell the user something so that they are aware there may be a problem.

conroylau commented 4 years ago

Sure, no problem. I will implement this and think of a way to do it nicely together with issue #54!

conroylau commented 4 years ago

I just have updated the subsample code:

Used a tryCatch function in the bootstrap replications. This should be able to capture the errors returned from the optimizer and other possible errors.
As discussed in issue #54, I have added a new parameter Rmulti (I am not sure if it is a good parameter name though) so that ceil(R * Rmulti) is the maximum number of bootstrap replications in case there are some problematic bootstrap replications so we need to carry out more replications. The loop will stop once the number of successful bootstrap replications equals to R or there are already ceil(R * Rmulti) replications conducted.
Added the following information in the return list:
- The total number of successful bootstrap replications.
- The number of failed bootstrap replication.
- The table showing the error messages. For this table, I save the entire error message instead of the error code because I cannot find a variable that stores the error code returned from Gurobi in R when they issue an error message like the one in this case (unlike Python where it seems to be possible to get the error code from errno in the Python exception message). I tried extracting the code manually. But I am not sure if this could potentially lead to some misleading output if there are some other error messages in another format so that I might extract the wrong code.

Below is the sample output for the one that did not work before (the one marked with not ok):

> r <- subsample(data, lpm, .3, solver = "gurobi", cores = 8)                                  
> r
p-value: 0.03

I also add the number of successful and failed bootstrap replications to the summary message. The line for the failed bootstrap replications will not be printed if it is 0.

p-value: 0.03
Test statistic: 45935.84859
Solver used: gurobi
Norm used: 2
Phi used: 0.66667
Size of each subsample: 62
Number of cores used: 8
Number of successful bootstrap replications: 100
Number of failed bootstrap replications: 1

The error table in this case is

> r$df.error
  Iteration                                                                         Error message
1        26 Error 10020: Objective Q not PSD (diagonal adjustment of 2.3e+49 would be required)\n

Thanks!

conroylau / lpinfer

Strange error with 2 norm in subsample #50