conroylau / lpinfer

lpinfer: An R Package for Inference in Linear Programs
GNU General Public License v3.0

Check these debug kits #78

Closed: a-torgovitsky closed this issue 4 years ago

a-torgovitsky commented 4 years ago

It's been a while since we checked the output of the fsst procedure against my Julia code, and several things have changed, so it seems like a good time to check.

Here are the results of running the FSST procedure in my code for several different DGPs, with one draw of the data for each DGP. Please check them against the output of the R code and let me know if there are any discrepancies. I think I included all of the output needed to do these checks, but let me know if I missed something! debugkit.zip

conroylau commented 4 years ago

Thanks for the debug kits!

May I know whether the files were generated by setting rho to 1e-4 and choosing the weight matrix by avar (the defaults)?

In addition, would you mind sending me the range test statistics so that I can check those numbers as well?

Thank you!

a-torgovitsky commented 4 years ago

The var-cov matrix should be diagonal in all cases, so using either diag or avar for the weight matrix should yield the same thing.

In the updated .zip I include both rho (always set at 1e-4) and the test statistics. debugkit.zip

conroylau commented 4 years ago

I have checked over the cases in the debug kit. I mainly compared our p-values, the data-driven lambda, the critical value at the 95th percentile, the cone and range components of the test statistic, the Omega matrix, and the bootstrap cone and range components of the test statistic.

I fixed the calculation of the p-values by replacing >= with >. After fixing this, I have the following findings:

  1. Our results match for the d > p cases (the discrepancies are smaller than 1e-6).
  2. For the d < p cases, our p-values, range test statistics, bootstrap estimates of the range component, and critical values do not match.
  3. For the case with d = 16 and p = 18, many of the QP computations for finding the bootstrap x-star return a status code of ITERATION_LIMIT.

Regarding the second point, I am wondering whether we used a different method for the norm. I tried extracting the quantities inside the norm directly to compute the range test statistic, but it does not match your results:

# Read betahat.obs, its var-cov matrix, betahat.obs.star, and the sample size from the debug kit
bobs <- as.matrix(read.delim("betahatobs.out", header = FALSE))
bvar <- as.matrix(read.delim("betahatobs_vc.out", header = FALSE), dimnames = NULL)
weightmat <- solve(bvar)  # weight matrix as the inverse of the var-cov matrix
bobs.star <- as.matrix(read.delim("betahatobs_star.out", header = FALSE))
samplesize <- as.numeric(read.delim("samplesize.out", header = FALSE))

# Range test statistic: sup norm of sqrt(n) * sqrtm(weightmat) %*% (bobs - bobs.star)
base::norm(expm::sqrtm(weightmat) %*% (bobs - bobs.star) * sqrt(samplesize), type = "I")

For instance, in the example where d = 4 and p = 6, I get 0.1939348 for the range test statistic instead of the 0.1275637 in the debug kit.

Regarding the third point, I am still trying to figure out the problem behind it.

a-torgovitsky commented 4 years ago
  1. I think your version is right and there is a bug in my code: I neglected to take the inverse of bvar. Here's a new set of results: debugkit.zip. Do these get closer?

  2. Yes, that's right. I ran into this too and spent a long time debugging it. What I found is that for this problem Gurobi sometimes has a better shot solving it with primal or dual simplex. I also found that occasionally changing the NumericFocus or ScaleFlag options helps. What I would suggest is the following (a sketch appears below):

     a) Try to solve normally. If it's ok, then no problem.
     b) If the solve fails, then try looping through Gurobi's Method option 0, 1, 2. If one of these is ok, then move on.
     c) If b) still fails, then try looping through the NumericFocus option 1, 2, 3 for each choice of Method. If one of these is ok, then move on.
     d) If c) still fails, then try looping through the ScaleFlag option -1, 0, 1, 3 for each choice of both Method and NumericFocus. Hopefully one of these is ok!

If the solve still fails after trying all these, then just give up and try another bootstrap draw, the same way that we had discussed for trying to evaluate betaobs.
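A minimal sketch of this fallback strategy, assuming the gurobi R interface (gurobi(model, params)); the helper name solve.with.fallback is hypothetical, not part of lpinfer, and the base params list is a placeholder:

library(gurobi)

# Hypothetical helper: retry a solve with progressively more options set,
# following steps a) to d) above, and stop at the first OPTIMAL status.
solve.with.fallback <- function(model, params = list(OutputFlag = 0)) {
  # a) Try to solve normally first.
  result <- gurobi(model, params)
  if (result$status == "OPTIMAL") return(result)

  # b)-d) Loop through Method, then NumericFocus, then ScaleFlag
  # (NA means "leave this option at its default"; ScaleFlag = 2 is added later in the thread).
  for (scaleflag in c(NA, -1, 0, 1, 3)) {
    for (numericfocus in c(NA, 1, 2, 3)) {
      for (method in c(0, 1, 2)) {
        p <- params
        p$Method <- method
        if (!is.na(numericfocus)) p$NumericFocus <- numericfocus
        if (!is.na(scaleflag)) p$ScaleFlag <- scaleflag
        result <- gurobi(model, p)
        if (result$status == "OPTIMAL") return(result)
      }
    }
  }
  result  # still not OPTIMAL: give up and discard this bootstrap draw
}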

conroylau commented 4 years ago
  1. Yes, our results are much closer now!

    • For the p-values and the data-driven lambda, our results match.
    • For the other quantities, the differences are generally smaller than 1e-5.
  2. For this procedure, may I know whether it is enough to stop when one of the combinations works?

Thanks!

a-torgovitsky commented 4 years ago

Yes that's what I meant. As soon as there is a successful solve you can stop. It's very hard to get a false positive in these algorithms.


conroylau commented 4 years ago

I see. No problem, I will code that up. Thanks!

conroylau commented 4 years ago

I have coded up the part that loops through the different options in Gurobi. The loop stops once the status is OPTIMAL.

However, our results do not match completely yet for the d = 16 and p = 18 case.

If I set lambda to NA in the fsst function, I only get 98 successful bootstrap draws instead of 100. Two of the draws have a status of UNBOUNDED from Gurobi. The p-value is also 0.3469 instead of 0.38. Except for the data-driven lambda and the range test statistic, the other numbers also have some discrepancies.

On the other hand, if I set lambda to 1 (which equals the data-driven lambda), the p-value that I get is 0.34. Our Omega matrices, range test statistics, and their bootstrap estimates are very close. However, our critical values, cone test statistics, and their bootstrap estimates have larger discrepancies. For instance, the cone test statistic I get is 1.588 instead of 1.62906 (the cone test statistic from the debug kit).

If I instead solve the cone problem using the numbers from the debug kit, I get 1.601931, which is still not the same as the value in the debug kit. The replication code that I used to solve the cone problem directly with the numbers in the debug kit can be found here.

Thanks!

a-torgovitsky commented 4 years ago

Ok great. Since we match on all other designs, my guess is that this is coming from mild discrepancies in the options we use to solve the cone problem.

Let's focus on the cone test statistic. Is your 1.588 coming from solving with the defaults? Here's the output I get when solving with the defaults:

Gurobi Optimizer version 9.0.2 build v9.0.2rc0 (linux64)
Optimize a model with 53 rows, 70 columns and 916 nonzeros
Model fingerprint: 0x76af1f66
Coefficient statistics:
  Matrix range     [5e-04, 2e+00]
  Objective range  [3e-02, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 1e+00]
Presolve removed 1 rows and 1 columns
Presolve time: 0.00s
Presolved: 52 rows, 69 columns, 926 nonzeros

Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0      handle free variables                          0s
Warning: Markowitz tolerance tightened to 0.0625
Warning: Markowitz tolerance tightened to 0.125
Warning: Markowitz tolerance tightened to 0.5
      75    2.3038380e-02   0.000000e+00   0.000000e+00      0s

Solved in 75 iterations and 0.00 seconds
Optimal objective  2.303838046e-02

Then I add the scaling by root n afterwards to get:

1.6290595048836116
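
(As a consistency check: the sample size is not stated in the thread, but n = 5000 reproduces the objective-to-statistic conversion on both sides of this exchange, so the arithmetic below assumes n = 5000.)

n <- 5000                    # inferred from the reported numbers, not stated in the thread
sqrt(n) * 2.303838046e-02    # = 1.6290595..., the value above
sqrt(n) * 2.246454067e-02    # = 1.5884832..., the value reported in the next comment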
conroylau commented 4 years ago

Sure, this is what I get in the cone problem:

Gurobi Optimizer version 9.0.2 build v9.0.2rc0 (mac64)
Optimize a model with 53 rows, 70 columns and 972 nonzeros
Model fingerprint: 0x3c6b39dd
Coefficient statistics:
  Matrix range     [1e-13, 2e+00]
  Objective range  [3e-02, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 1e+00]
Presolve time: 0.00s
Presolved: 53 rows, 70 columns, 972 nonzeros

Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0      handle free variables                          0s
Warning: Markowitz tolerance tightened to 0.0625
      50    2.2464541e-02   0.000000e+00   0.000000e+00      0s

Solved in 50 iterations and 0.00 seconds
Optimal objective  2.246454067e-02

Then I multiply it by root n to get:

1.588483

It looks like the numbers of nonzeros we have are not the same, based on the output messages. I am wondering if this is because of the differences in our Omega matrices.

a-torgovitsky commented 4 years ago

Could be, but then why would we match on everything else? What happens if you use my Omega matrix?

conroylau commented 4 years ago

Here is what I get when I use your Omega matrix instead of mine:

Gurobi Optimizer version 9.0.2 build v9.0.2rc0 (mac64)
Optimize a model with 53 rows, 70 columns and 916 nonzeros
Model fingerprint: 0x285e6882
Coefficient statistics:
  Matrix range     [5e-04, 2e+00]
  Objective range  [3e-02, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [1e+00, 1e+00]
Presolve removed 1 rows and 1 columns
Presolve time: 0.00s
Presolved: 52 rows, 69 columns, 926 nonzeros

Iteration    Objective       Primal Inf.    Dual Inf.      Time
       0      handle free variables                          0s
Warning: Markowitz tolerance tightened to 0.0625
      59    2.3038381e-02   0.000000e+00   0.000000e+00      0s

Solved in 59 iterations and 0.00 seconds
Optimal objective  2.303838060e-02

What we get is almost the same, except for the warnings and the number of iterations.

a-torgovitsky commented 4 years ago

Ok, good. The Omega matrix is constructed from betahatstar bootstrap draws. So presumably we don't match on some of those. Can you find 1 or 2 draws where we don't match? Then we can compare solver results.

conroylau commented 4 years ago

I find that several of our betahatstar draws have larger discrepancies, such as the 18th and the 35th bootstrap replications.

I tried solving the QP for obtaining the betahatstar to compare our results (problem (3) of your notes). I find that if I set the Method option to 0, our betahatstar are much closer, whereas if I leave it at the default (-1), our answers have larger differences (some of them are larger than 0.001 componentwise).

Hence, I guess the discrepancies come from the Method option when I try to replicate it in RStudio. The replication code can be found here.

I checked the Gurobi manual: when solving a QP, leaving the option at the default means using the barrier method (i.e., it automatically sets Method to 2), whereas setting Method to 0 means choosing the primal simplex method. Thus, may I know whether I should stick with the default in solving this QP or set the Method option to 0? Thanks!

a-torgovitsky commented 4 years ago

Ok, there was a bug in my code whereby I was not correctly resetting Method = -1 for the betahatstar problem. So I was using Method = 0 as you guessed.

However, when looking into this I noticed that Method = 0 just seems to work a lot better for the betahatstar problem. So let's make that the default just for this portion of the procedure.

Here is the updated debug kit. Are we matching now? I suppose there may still be some discrepancy in the betahatstar solutions if we iterate through the Gurobi options in a different order. No point in tracking those down, I think, since the discrepancies are so small now. debug-kit.zip
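
(A toy illustration of forcing primal simplex for a QP in the gurobi R interface, i.e. Method = 0 instead of the barrier default; the model data below are placeholders, not the actual betahatstar problem from lpinfer.)

library(gurobi)

# Minimal QP: minimize x'x subject to x1 + x2 = 1, x >= 0
model <- list(
  Q          = diag(2),
  obj        = c(0, 0),
  A          = matrix(c(1, 1), nrow = 1),
  rhs        = 1,
  sense      = "=",
  modelsense = "min"
)

# Method = 0 (primal simplex) rather than the QP default of barrier (Method = 2)
result <- gurobi(model, params = list(Method = 0, OutputFlag = 0))
result$x  # c(0.5, 0.5)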

conroylau commented 4 years ago

Thanks for sending the updated debug kits! I have checked over all the cases again, setting Method = 0 in the beta.star problem. Except for the d = 16, p = 18 case, our other results match as closely as before.

For the d = 16, p = 18 case, our beta.star and its bootstrap estimates now match closely (the average discrepancy is less than 1e-9).

The only parts that do not match are the p-value (I get 0.24 instead of the 0.27 in the debug kit), the critical value, and the cone components. I guess the discrepancies come from the differences in the cone components of our sample and bootstrap test statistics. The sample cone test statistic that I have is 1.445233 instead of 1.475782. I suspect this is because we might have used different Gurobi options when solving the cone problems.

To check this, I tried replicating the cone components using the other objects in the debug kit to find out which options were used. I find that setting Method = 3 and ScaleFlag = 2 gives the closest match for the sample cone component. For the bootstrap estimates, setting Method = 2, NumericFocus = 1, and ScaleFlag = 2 gives the closest match for the 3rd bootstrap estimate of the cone component; setting Method = 0, NumericFocus = 3, and ScaleFlag = 2 gives the closest match for the 5th bootstrap estimate of the cone component.

On the other hand, in your earlier comment you mentioned that I should loop through -1, 0, 1, 3 for ScaleFlag. May I know whether ScaleFlag = 2 should not be used? I ask because setting ScaleFlag = 2 seems to match the results most closely when I replicate the cone components using the other objects in the debug kit.

Thanks!

a-torgovitsky commented 4 years ago

Oops that was a typo. I meant ScaleFlag -1, 0, 1, 2, and 3.

These are small discrepancies and as you say, they are probably caused by different sequences of changing options when we run into problems with Cone. So I think we can call this closed now.

conroylau commented 4 years ago

I see. Sure! I have just updated the Gurobi part by adding the option 2 for ScaleFlag. Thanks!