conroylau / lpinfer

lpinfer: An R Package for Inference in Linear Programs
GNU General Public License v3.0

New test method: fsst #24

Closed. a-torgovitsky closed this issue 4 years ago.

a-torgovitsky commented 4 years ago

This is the long-promised new method that I have been working on with coauthors.

This note describes how to implement it: fsst-module.pdf. Let me know if you have any questions!

I have also included 6 example DGPs that you can try to replicate: fsst-examples.zip. (And let me know if I forgot anything in these that might be helpful.)

conroylau commented 4 years ago

Thank you for sending me the FSST method!

I have just read through this new procedure and I have a few questions related to page 6 of the document.

  1. In problem (5), the term q/||q|| in the objective function (where I write q = \hat{\beta}_n - \hat{\beta}_n^\ast for simplicity) looks like the normalized version of q. May I ask whether the norm is the 2-norm?
  2. Since you mentioned that there is some scaling in this problem (I assume you are referring to the norm ||q||), could ||q|| be another norm as well?
  3. In addition, if ||q|| is the 2-norm, should I square the term, i.e. take ||.||^2_2, as in the estbound function?

I will start coding this up once I have finished updating the module with the lpmodel object. Thank you!

a-torgovitsky commented 4 years ago
  1. Yes, the 2-norm.
  2. I think it could be other norms, but the 2-norm has worked fine for me, so let's stick with that. The idea is just to make the objective vector have approximately unit magnitude, so that it is similar to the coefficients in the constraint matrix. Linear programs can be difficult to solve reliably if the magnitudes of the objective and constraint coefficients are very different. This document by Gurobi gives a good description of the issues that can arise.
  3. No square. The reason we do that in the estbound function is that otherwise the objective function would not be quadratic. In this case, the norm only involves the constant term and not the optimization variables.
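
A minimal sketch of the kind of rescaling described in point 2, using the gurobi R interface; the names `q`, `A`, and `rhs` below are hypothetical placeholders, not the actual objects of problem (5):

```r
library(gurobi)

# Hypothetical example: q stands in for the objective vector and A, rhs for the constraints.
q   <- c(5000, -2000, 10)         # raw objective coefficients with large magnitudes
A   <- matrix(c(1, 1, 1), nrow = 1)
rhs <- 1

model <- list(
  obj        = q / sqrt(sum(q^2)),   # divide by the 2-norm so the objective has unit magnitude
  A          = A,
  rhs        = rhs,
  sense      = "=",
  modelsense = "min",
  lb         = rep(-Inf, length(q))  # free variables in this toy example
)
result <- gurobi(model, params = list(OutputFlag = 0))
```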
conroylau commented 4 years ago

I have three additional questions while I am writing the code for the fsst procedure:

  1. Do we always assume that the vectors s, x and phi_p are nonnegative in all of the optimization problems? I saw that there is no "+" subscript for these parameters in problems (3), (4), (6)-(8) and (10)-(11). May I ask whether we are relaxing the non-negativity constraints for these variables?
  2. For the vector b in problems (9)-(11), may I confirm that it can be positive or nonpositive, because we do not restrict the sign of beta and so we are not restricting the sign of b either?
  3. In this testing procedure, do we assume that only beta.obs can be stochastic, and restrict the other objects in the lpmodel for the fsst method to be non-stochastic?

Thank you!

a-torgovitsky commented 4 years ago
  1. No, sometimes they are restricted to be non-negative and sometimes not. It should be indicated by the constraints. (Sometimes, as in (10), I wrote the constraints out after "s.t." instead of putting the + subscript on R.) I double checked all of those expressions and don't see any typos in which ones are required to be non-negative. However, I did find an error in my code for (3) -- my code had a non-negativity constraint, but there should not be one. This only affects DGPs 1 and 2. Here are the new numbers attached. fsst-examples-2.zip

  2. Yes that's correct, b is not restricted here.

  3. Yes, for this method the only one that can be stochastic is beta.obs.
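
Related to points 1 and 2, a small note on how this is typically encoded with the gurobi R interface: Gurobi's default lower bound is 0, so sign-restricted variables need nothing extra, while unrestricted ones (like b) must be given an explicit lower bound of -Inf. The dimensions below are hypothetical:

```r
library(gurobi)

n_free <- 2  # hypothetical number of unrestricted variables (e.g. b)
n_pos  <- 3  # hypothetical number of non-negative variables (e.g. those with a "+" subscript)

model <- list(
  A          = matrix(1, nrow = 1, ncol = n_free + n_pos),
  obj        = rep(1, n_free + n_pos),
  rhs        = 0,
  sense      = "=",
  modelsense = "min",
  # Gurobi's default lower bound is 0; unrestricted variables need -Inf explicitly.
  lb         = c(rep(-Inf, n_free), rep(0, n_pos))
)
result <- gurobi(model, params = list(OutputFlag = 0))
```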

conroylau commented 4 years ago

I just finished coding the fsst procedure and started testing the code using the DGPs. I have a question about the interpretation of the files Ashp.out and betashp.out in the zip folder. For instance, in DGP1, am I correct that A.shp is a 1x4 matrix (equal to [1, 1, 1, 1]) and beta.shp is also a 1x4 matrix (equal to [1, 0, 1, 0])? Then it seems that the dimensions do not match, because we need A.shp %*% x = beta.shp, where x needs to be a vector with 4 elements to be consistent with A.obs and A.tgt. So it seems that beta.shp has to be a scalar instead of a 1x4 matrix. Have I interpreted the file incorrectly? It also appears that the betashp.out file is the same as the Atgt.out file. Thank you!

a-torgovitsky commented 4 years ago

Sorry, that is the result of a bug in my code. I wrote out A_tgt as betashp.out, as you guessed. For all of these examples, beta.shp should just be a 1x1 vector with element 1.
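
Concretely, with the corrected beta.shp the shape constraint for DGP 1 conforms as follows (a quick R check using the values discussed above):

```r
A_shp    <- matrix(c(1, 1, 1, 1), nrow = 1)  # 1 x 4, as in Ashp.out
beta_shp <- 1                                # 1 x 1 after the fix

x <- rep(0.25, 4)   # any candidate x consistent with A.obs and A.tgt has 4 elements
A_shp %*% x         # 1 x 1, so A.shp %*% x = beta.shp is well-defined
```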

conroylau commented 4 years ago

Sorry, I have some more questions while trying to replicate the DGPs. I am getting p-values that are close to yours, but not exactly the same, so I am trying to find the problematic parts of my code.

My questions are the following:

  1. May I ask whether weight.matrix is set to diag in the DGPs?

  2. In the DGP zip files, is betahatobs_vc.out the variance matrix for beta.obs? Should I treat this as something given, or something that I should be able to replicate? (I have been unable to replicate it.)

  3. My last question is related to the two Omega matrices -- CodeCogsEqn (6) and CodeCogsEqn (7) on page 5 of the document and in the replication zip files. I am not sure whether my interpretation of the two matrices is correct.

    • From the definition, I think the last two rows and columns of the two Sigma matrices (on page 4) should be zeros, because they correspond to the covariance terms of beta.shp or beta.tgt with the other elements of the vector beta, and beta.shp and beta.tgt are both nonstochastic.
    • Since the two Sigma matrices are regularized by the same matrix (the identity matrix multiplied by CodeCogsEqn (4)), the last two diagonal entries of the squares of the Omega matrices should be the same and non-zero, because CodeCogsEqn (5) is not a zero matrix. However, I find that they are not the same in omegahati.out and omegahate.out for the DGPs, so I am wondering whether my interpretation is correct. In addition, the matrices in omegahate.out for DGPs 3 to 6 are zero matrices. If omegahate.out refers to the matrix CodeCogsEqn (6) in the document, should it be a nonzero matrix in DGPs 3 to 6, given that CodeCogsEqn (4) is nonzero?
    • In addition, should I be replicating the two Omega matrices, or should I treat them as inputs in my code?

Thank you!

a-torgovitsky commented 4 years ago

> Sorry, I have some more questions while trying to replicate the DGPs. I am getting p-values that are close to yours, but not exactly the same, so I am trying to find the problematic parts of my code.
>
> My questions are the following:
>
> 1. May I ask whether weight.matrix is set to diag in the DGPs?

I used the second option in the note (the inverse of the entire variance-covariance matrix).

> 2. In the DGP zip files, is betahatobs_vc.out the variance matrix for beta.obs? Should I treat this as something given, or something that I should be able to replicate? (I have been unable to replicate it.)

Yes it is, and yes you can take it as given for current purposes. In practice, it is something the user will have to give you enough information to construct: either they tell you what it is directly, or they give you the function beta.obs and you bootstrap an estimate of it from that.
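
A rough sketch of the second case; `beta_obs_fn` and the toy data frame are hypothetical stand-ins for whatever the user supplies through the lpmodel object:

```r
set.seed(1)

# Hypothetical stand-ins for the user's data and beta.obs function.
data        <- data.frame(y = rnorm(500), d = rbinom(500, 1, 0.5))
beta_obs_fn <- function(df) c(mean(df$y[df$d == 1]), mean(df$y[df$d == 0]))

B     <- 100
draws <- replicate(B, {
  boot_data <- data[sample(nrow(data), replace = TRUE), , drop = FALSE]
  beta_obs_fn(boot_data)
})

# Each column of `draws` is one bootstrap replicate of beta.obs; the sample covariance
# across replicates is the bootstrap estimate of its variance matrix.
sigma_beta_obs <- cov(t(draws))
```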

> 3. My last question is related to the two Omega matrices -- CodeCogsEqn (6) and CodeCogsEqn (7) on page 5 of the document and in the replication zip files. I am not sure whether my interpretation of the two matrices is correct.
>
>    • From the definition, I think the last two rows and columns of the two Sigma matrices (on page 4) should be zeros, because they correspond to the covariance terms of beta.shp or beta.tgt with the other elements of the vector beta, and beta.shp and beta.tgt are both nonstochastic.

Yes, that's correct.

> • Since the two Sigma matrices are regularized by the same matrix (the identity matrix multiplied by CodeCogsEqn (4)), the last two diagonal entries of the squares of the Omega matrices should be the same and non-zero, because CodeCogsEqn (5) is not a zero matrix. However, I find that they are not the same in omegahati.out and omegahate.out for the DGPs, so I am wondering whether my interpretation is correct.

Sorry -- I used a different $\bar{\rho}$ for the "i" and "e" matrices, and neglected to mention that in the document. The $\bar{\rho}$ I listed in the note is for the "i" matrix. The $\bar{\rho}$ for the "e" matrix should be the same thing, but with the sup-norm of $\hat{\Omega}_{n}^{\beta - \beta^{\star}}$.

> In addition, the matrices in omegahate.out for DGPs 3 to 6 are zero matrices. If omegahate.out refers to the matrix CodeCogsEqn (6) in the document, should it be a nonzero matrix in DGPs 3 to 6, given that CodeCogsEqn (4) is nonzero?

These DGPs are all of the $d \geq p$ case. In this case, the range component is zero, so $\hat{\Omega}_{n, \bar{\rho}}^{e}$ is irrelevant. Also, in this case $\hat{\Sigma}_{n}^{\beta - \beta^{\star}} = 0$. Since $\hat{\Omega}_{n, \bar{\rho}}^{e}$ is irrelevant here, I just set it to zero without regularizing.

> • In addition, should I be replicating the two Omega matrices, or should I treat them as inputs in my code?

You should replicate them -- they need to be constructed from the user's input.

Thank you!

conroylau commented 4 years ago

Thanks! After doing more checks, I would like to follow up on two more items regarding the two Omega matrices.

  1. I tried to use the two Omega matrices in the zip file to reproduce the parameter CodeCogsEqn (8) that is used in the DGPs. For the "i" matrix, may I check with you whether the Frobenius norm of CodeCogsEqn (4) is used in calculating the regularization parameter, instead of the norm mentioned in the notes? It also seems that the DGPs use the Frobenius norm of the matrix CodeCogsEqn (7) instead of the sup-norm for the "e" matrix.

  2. In DGPs 1 and 2, the test statistics and the corresponding bootstrap test statistics that I obtain from my code are the same as the ones in the zip files, except that they are all smaller by a factor of 10. I found that this discrepancy disappears when I replace the two Omega matrices in my code with the Omega matrices in the zip files. After using the Frobenius norm to regularize the Sigma matrices in my code (using the norm function with type = "F"), I find that my Omega matrices are smaller than the Omega matrices in the zip files by a factor of 100, which seems to coincide with the number of bootstraps. Thus, can I check with you whether I should multiply my answer by R (or equivalently, by B, based on the notation in the document)? [I am focusing on DGPs 1 and 2 here only, because in DGPs 3 to 6, CodeCogsEqn (4) uses the variance matrix CodeCogsEqn (6) provided by the user (after adding the zero columns and rows), and CodeCogsEqn (7) is zero. Thus, there are no order-of-magnitude problems in my results for DGPs 3 to 6 because I am using the variance matrix obtained from betahatobs_vc.out.]

The replication code for the above two observations can be found here. The code produces two outputs. The first output is the difference between the norm that I tried to reproduce from the DGPs (i.e. CodeCogsEqn (8) multiplied by CodeCogsEqn (9)) and the Frobenius norm obtained from the norm function. The second output is the elementwise division of the Sigma matrix obtained from the DGP by the Sigma matrix obtained from the formula in the document. To run the code, I think the only thing that has to be changed is to set the working directory to the folder that contains the .out files for the DGP.

Thank you!

a-torgovitsky commented 4 years ago
  1. Yes that's right, the norm being used in my code is the standard Euclidean norm treating the matrix as a vector. This is a bug in my code (in the sense that I meant to use what I had in the notes); however, it seems to work fine in my simulations, so let's keep it that way.

  2. That's a bug in my code -- I had an additional scaling term that popped up due to a bug in my matrix multiplication. I fixed it, re-verified with your code, and now our Omega matrices match.

Here are new examples attached after I fixed the bug in 2. (They have changed since the last time, but that should be fine, right? Let me know if not.)

fsst-examples3.zip
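
For reference on point 1, R's `norm(., type = "F")` is exactly that Euclidean norm of the vectorized matrix, and the regularization itself just adds a multiple of the identity. The toy matrix and the scaling of rho-bar below are placeholders; the exact construction of $\bar{\rho}$ follows the note:

```r
set.seed(1)
Sigma <- crossprod(matrix(rnorm(16), 4, 4))  # toy symmetric PSD matrix, purely illustrative

# Frobenius norm = Euclidean norm of the matrix treated as a vector.
all.equal(norm(Sigma, type = "F"), sqrt(sum(Sigma^2)))  # TRUE

# Regularization of the kind discussed above: add rho-bar times the identity.
# How rho-bar is built from this norm follows the note; the factor here is only a placeholder.
rho_bar   <- 1e-4 * norm(Sigma, type = "F")
Sigma_reg <- Sigma + rho_bar * diag(nrow(Sigma))
```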

conroylau commented 4 years ago

Thanks! Our test statistics now agree to 7 decimal places for DGP 1 and to 10 decimal places for DGPs 3 and 4.

  1. For DGP2, I find that the reason our test statistics do not match is that we have different beta.star values and bootstrap counterparts. For DGPs 1, 3 and 4, we have the same beta.star.

  2. In calculating the p-values, may I check with you whether this is done by computing the fraction of bootstrap test statistics that are larger than the point estimate? Using this method and these test statistics, the p-value I get is 0.01 smaller than your p-values for DGPs 1 to 3 and 0.01 larger than your p-value for DGP 4.
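
For concreteness, this is the one-line computation I mean (the numbers below are hypothetical placeholders, not values from the DGPs):

```r
ts_n    <- 1.8                         # sample test statistic (placeholder)
ts_boot <- c(0.4, 2.1, 1.2, 3.0, 0.9)  # bootstrap test statistics (placeholders)

# p-value = fraction of bootstrap test statistics larger than the sample statistic
pval <- mean(ts_boot > ts_n)
```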

The replication code can be found here. There are two folders in the zip file. The first folder contains the R code and a short note on writing problem (3) in standard form. The second folder contains the R code to replicate the second observation on the p-value. The working directory has to be updated in both scripts. In addition, you would need to update dgp.id in the p-value replication file as well.

Thank you!

a-torgovitsky commented 4 years ago
  1. There is no reason to write (3) in standard form. We are assuming that the user has specified their problem in standard form, and that the A matrices and \beta vectors they give us reflect that. However, we do not have to convert the problems we solve inside the package into standard form. You should try to implement (3) directly as it is and see whether our beta.star values are still different. Then it will be easier to debug.

  2. This was a bug in my code. Once I fixed it, my p-values match yours.

conroylau commented 4 years ago

The reason I wrote problem (3) in standard form is that I have previously encountered an error when specifying the lower bound as -Inf while solving some other linear programs with Gurobi. Therefore, I thought it was safer to rewrite everything in standard form to prevent any surprising results.

Please find the updated code here for solving (3) without writing it in standard form. In this version, the beta.star that I get is the same as in the last version, where I wrote the code by transforming problem (3) into standard form. The difference between our beta.star is between 1.7e-5 and 9.6e-5. (For DGP 1, the difference between our beta.star is between 1.1e-13 and 4.4e-13.) I am not sure if this is due to numerical inaccuracy, and I am reading through my code to see if there is anything wrong. The p-value that I get with my beta.star and the corresponding bootstrap components is 0.32, which is 0.08 smaller than the p-value obtained using the beta.star and the corresponding bootstrap components from the zip file.

Thanks!

a-torgovitsky commented 4 years ago

> The reason I wrote problem (3) in standard form is that I have previously encountered an error when specifying the lower bound as -Inf while solving some other linear programs with Gurobi. Therefore, I thought it was safer to rewrite everything in standard form to prevent any surprising results.

There shouldn't be any reason to do that, as Gurobi will do the translation itself. If you ever run into something strange with Gurobi, you should try to isolate it as an MWE and send it to them.

> Please find the updated code here for solving (3) without writing it in standard form. In this version, the beta.star that I get is the same as in the last version, where I wrote the code by transforming problem (3) into standard form. The difference between our beta.star is between 1.7e-5 and 9.6e-5. (For DGP 1, the difference between our beta.star is between 1.1e-13 and 4.4e-13.) I am not sure if this is due to numerical inaccuracy, and I am reading through my code to see if there is anything wrong. The p-value that I get with my beta.star and the corresponding bootstrap components is 0.32, which is 0.08 smaller than the p-value obtained using the beta.star and the corresponding bootstrap components from the zip file.

Normally, I wouldn't be too concerned about a 1e-5 difference. But since it is having a noticeable impact on the p--value, that is concerning.

I briefly looked at both your code and mine, and checked for example the objective vectors. They look very similar, and our xstar outputs are also very similar (but not exact). My code is using Julia --> JuMP --> Gurobi, whereas yours is using R --> Gurobi, so perhaps there is something going on in the symbolic translation JuMP is doing that is affecting things.

Some things that would help debugging:

  1. Are our TSs similar or different? How about the bootstrap values?
  2. If you replace your betahatstar with mine (both for sample and bootstrap) throughout, how do the TS and p-values compare?

conroylau commented 4 years ago

> There shouldn't be any reason to do that, as Gurobi will do the translation itself. If you ever run into something strange with Gurobi, you should try to isolate it as an MWE and send it to them.

I see. I will do so when I encounter something strange next time. In light of this, I have changed everything in the fsst code from standard form back to the original forms in the document (i.e. without splitting the variables into their positive and negative parts), and I re-ran all my analysis. The p-values and the test statistics that I get for the four DGPs are as follows:

| DGP | p-value | cone.n | range.n |
| --- | --- | --- | --- |
| 1 | 0.9 | 0.6135432539 | 0.2387698777 |
| 2 | 0.4 | 0.4979030430 | 1.9828827434 |
| 3 | 0.83 | 44.5322864124 | 0 |
| 4 | 0.66 | 49.1959752546 | 0 |

The updated p-values all match the p-values in the replication file. I think the previous discrepancy in our p-value for DGP2 might be due to writing the other LPs/QPs in standard form, which somehow introduced a mistake. I am trying to locate which part of the code causes this, and why it affects the p-value for DGP2 only and not the other DGPs.

Although the p-values for DGP2 finally match, our cone.n and range.n for DGP2 do not. So I tried substituting the beta.star and its bootstrap components from the replication file directly into the code. The updated table becomes:

| DGP | p-value | cone.n | range.n |
| --- | --- | --- | --- |
| 1 | 0.9 | 0.6135432523 | 0.2387698914 |
| 2 | 0.4 | 0.4985108463 | 2.0031053046 |
| 3 | 0.83 | 44.5322864124 | 0 |
| 4 | 0.66 | 49.1959752545 | 0 |

The differences in the test statistics for DGPs 1, 3 and 4 are smaller than 1e-7; for DGP 2 the change in the test statistic is larger.

Using the beta.star and its bootstrap components from the replication file, the p-values and the test statistics now match the values from the replication file.

For DGP2, if I use the beta.star and its bootstrap components from the replication file, the differences between our bootstrap test statistics are between 0 and 0.006 for cone, and smaller than 1.01e-13 for range.

If I compute the beta.star and its bootstrap components via R and Gurobi instead, the differences between our bootstrap test statistics are larger: between 2.29e-5 and 7.71e-3 for cone, and between 0.009 and 0.255 for range.

a-torgovitsky commented 4 years ago

Ok good, this is progress.

So, just to clarify:

  1. The only DGP we are not agreeing on is DGP2.
  2. We have narrowed down the source of the discrepancy in DGP2 to the $\hat{\beta}^{\star}$ computation.

Is that a correct summary?

conroylau commented 4 years ago

I just found another discrepancy in DGP1, in its bootstrap cone test statistics, although it does not affect the p-value. This discrepancy exists regardless of whether I use my beta.star or yours. Among the 100 bootstrap cone test statistics, the differences for 62 of them are smaller than 1e-9. For the remaining 38, the differences are between 0.002 and 0.24. The differences between our range bootstrap statistics are between 3.2e-10 and 1.05e-5.

I also checked the bootstrap test statistics for DGP3 and DGP4. The differences are smaller than 1e-13 and 3.94e-5 for the cone bootstrap test statistics in the two DGPs respectively. Our range components are the same because they are all zero.

For DGP1, the discrepancy in the bootstrap components of the cone test statistics drops below 1.65e-7 when I use the beta.r from the replication file. For DGP2, the discrepancy is between 9.42e-6 and 7.73e-3 if I use the beta.r from the replication file. If I use the beta.r, beta.star and its bootstrap components from the replication file, the discrepancies between the bootstrap cone test statistics are between 1.613e-13 and 1.019e-10.

Therefore, based on this additional information, my answers to your two questions are:

  1. We are not agreeing on DGP1 and DGP2.
    • For DGP1, we are not agreeing on the bootstrap cone test statistics.
    • For DGP2, we are not agreeing on the point estimate of the test statistics and the cone bootstrap statistics.
  2. The source of the discrepancy in DGP1 is beta.r. The sources of the discrepancy in DGP2 are beta.r and beta.star.

I am looking through the code for beta.r to find the source of the problem.

Thanks!

a-torgovitsky commented 4 years ago

Ok, very good work debugging. You are using the right strategy! Let me know what you find.

conroylau commented 4 years ago

After debugging, I found that the discrepancy in beta.r is due to a bug in my code, where I used beta.n instead of beta.star in the case d < p. After fixing this, we have the same p-values, the same test statistics (matching to 10 decimal places), and the same bootstrap test statistics (differences all smaller than 1e-10 for DGPs 1 to 3 and smaller than 1e-5 for DGP4) when I use the beta.star and its bootstrap components from your replication file.

If I calculate the beta.star and its bootstrap components in R using Gurobi instead, we still get the same p-values, although the discrepancies in the test statistics and their bootstrap versions for DGPs 1 and 2 become larger:

Therefore, after fixing my bug, my modified answer to your earlier question is that we are not agreeing on DGP2, and the source of the discrepancy is beta.star and its bootstrap components.

Thanks!

a-torgovitsky commented 4 years ago

Ok, great work!

The discrepancy on beta.star is perplexing, but we should try to get to the bottom of it. Could you send your updated code for computing it?

All of our inputs to the program are the same, right? (The matrices, vectors, etc. in the program definition are just taken from my output.)

So that means the discrepancy must be caused by:

  1. A difference in how we are setting up the problem, although this seems somewhat unlikely given the similarity for DGP1.
  2. A difference in how Gurobi is solving it. This could in turn be because
    • We are using different parameters in Gurobi
    • The problem is getting translated slightly differently via JuMP

It should be possible for me to pin this down fairly easily.

conroylau commented 4 years ago

Here is the latest code used to compute beta.star. Do you want only the code for beta.star, or the full module for the fsst procedure? I am now updating the comments and annotations of the entire fsst code after the recent changes.

I think our inputs are the same. For this code that computes beta.star, I am using the following .out files as input:

I am using the avar option to compute the weighting matrix, i.e. the inverse of the matrix in betahatobs_vc.out.

Thanks for your help!

a-torgovitsky commented 4 years ago

Just the beta.star function is fine. I will look into this more closely tomorrow and let you know what I find.

conroylau commented 4 years ago

Sure, thanks a lot!

a-torgovitsky commented 4 years ago

I think both of our codes are correct, but this optimization problem is (numerically) not strictly convex, so the solution is not unique. Some evidence for this can be seen by passing Method = 1 to Gurobi as an option: the solution (xstar) changes considerably.
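
For reference, switching the solution algorithm in the gurobi R interface only requires a parameter list; the toy LP below is just a stand-in, not the actual beta.star problem:

```r
library(gurobi)

# Toy LP purely for illustration.
model <- list(
  A          = matrix(c(1, 1), nrow = 1),
  obj        = c(1, 2),
  rhs        = 1,
  sense      = ">",   # ">" means ">=" in the gurobi R interface
  modelsense = "min"
)

# Method: -1 = automatic (default), 0 = primal simplex, 1 = dual simplex, 2 = barrier.
res_dual <- gurobi(model, params = list(Method = 1, OutputFlag = 0))
res_auto <- gurobi(model, params = list(Method = -1, OutputFlag = 0))

# When the problem is only weakly convex numerically, result$x can differ noticeably
# across methods even though the optimal objective values essentially agree.
```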

I am talking with Andres Santos about this to see if this is a problem for the theory, and if so any ways to fix it. For now, let's just go with your current code. We may have to make some (small) adjustments later on.

conroylau commented 4 years ago

I see. I will go ahead to finalize the current version of the code and push it to GitHub when it's completed. Thank you!

a-torgovitsky commented 4 years ago

Ok, it looks like this is just a consequence of the quadratic program being essentially weakly convex for numerical purposes. This is not a problem in theory, since what we care about is the uniqueness of $\hat{\beta}^{\star}$, not the solution $\hat{x}^{\star}$. As we saw, we still have some sensitivity in $\hat{\beta}^{\star}$, but it's not that much, and it doesn't appear to be empirically relevant (i.e., it doesn't affect the p-value), at least in these examples.

So the plan is to leave this for now and revisit it if it becomes a problem in the future. The underlying problem here is that the $A_{\text{obs}}' A_{\text{obs}}$ matrix is numerically singular (even though we know it is not actually singular), which means what we are dealing with is very similar to multicollinearity in an OLS framework. Following that logic, we could start removing redundant rows (as in OLS) and potentially stabilize the problem. This seems like much more work than appropriate for now, especially given that we are not sure this is a practical problem.
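
As an aside, one way to see this kind of near-singularity numerically is the reciprocal condition number; the matrix below is a toy stand-in with two nearly collinear columns, not the actual A.obs from the DGPs:

```r
# Toy matrix with two nearly collinear columns, standing in for A_obs.
A_obs <- cbind(1, seq(0, 1, length.out = 5), seq(0, 1, length.out = 5) + 1e-10)

M <- crossprod(A_obs)  # A_obs' A_obs
rcond(M)               # close to zero: numerically singular for practical purposes
det(M)                 # the determinant is correspondingly tiny, as in DGPs 1 and 2
```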

@conroylau I think our code agrees on everything now except for this issue that we are going to put aside. Please close this when done pushing the new code for the method. Thanks!

conroylau commented 4 years ago

Thank you for your detailed explanations! I see now why our answers differ slightly in DGP2. I am wondering whether you want me to include a warning message when $A_{\text{obs}}' (\text{weight.matrix}) A_{\text{obs}}$ is numerically singular, say when the determinant is smaller than a certain value? (In DGP1, it is 4.87e-22. In DGP2, it is -1.56e-32.)

I have finished coding the fsst module (the current version can be found here), but I would like to confirm a few items about the fsst module with you:

  1. For the parameter $\bar{\rho}$ used to regularize the Omega matrices, should I use the sup-norm as in the document, or the Frobenius norm as discussed above?

  2. Should I use the same $\bar{\rho}$ parameter to regularize the two Omega matrices, or calculate $\bar{\rho}$ for each Omega matrix?

Thank you very much!

a-torgovitsky commented 4 years ago

I don't think it is necessary to have the warning.

  1. Let's use the Frobenius norm as above.

  2. Calculate for each matrix.

Under 1 and 2, our results for these matrices matched, right?

conroylau commented 4 years ago

Yes, our results for the matrices match under 1 and 2!

a-torgovitsky commented 4 years ago

Ok good, let's stick with those choices then.

conroylau commented 4 years ago

Sure, I have pushed the code for the new fsst testing procedure to GitHub. Thank you!