Thank you for sending me the FSST method!
I have just read through this new procedure and I have a few questions related to page 6 of the document.
… the `estbound` function? I will start coding this up once I finish updating the module with the `lpmodel` object. Thank you!
… the `estbound` function is because otherwise the objective function would not be quadratic. In this case, the norm only involves the constant term, not the variables of optimization.

I have three additional questions while I am writing the code for the `fsst` procedure:

… Can I make `beta.obs` stochastic and restrict the other objects in the `lpmodel` for the `fsst` method to be non-stochastic?

Thank you!
No, sometimes they are restricted to be non-negative and sometimes not. It should be indicated by the constraints. (Sometimes, as in (10), I wrote the constraints out after "s.t." instead of putting the + subscript on R.) I double checked all of those expressions and don't see any typos in which ones are required to be non-negative. However, I did find an error in my code for (3) -- my code had a non-negativity constraint, but there should not be one. This only affects DGPs 1 and 2. Here are the new numbers attached. fsst-examples-2.zip
Yes that's correct, b is not restricted here.
Yes, for this method the only one that can be stochastic is `beta.obs`.
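To make this concrete, here is a rough sketch of what an `lpmodel`-style list could look like for the `fsst` procedure, using the component names from this thread. This is only an illustration under the assumption that the object is a plain R list with `beta.obs` supplied as a function of the data (since it is the only stochastic component); it is not the package's actual constructor, and all numeric values are hypothetical.

```r
# Sketch of an lpmodel-style list for the fsst procedure (not the package's
# actual interface). Only beta.obs is stochastic, so it is supplied as a
# function of the data; the other components are fixed. Values are hypothetical.
lpm <- list(
  A.obs    = matrix(c(1, 0, 1, 0,
                      0, 1, 0, 1), nrow = 2, byrow = TRUE),
  A.shp    = matrix(1, nrow = 1, ncol = 4),
  A.tgt    = matrix(c(1, 0, 0, 1), nrow = 1),
  beta.obs = function(data) colMeans(data),   # estimated from the data
  beta.shp = 1,
  beta.tgt = 0.5
)
```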
I just finished coding the `fsst` procedure and started testing the code using the DGPs.

I have a question about the interpretation of the files `Ashp.out` and `betashp.out` in the zip folder. For instance, in DGP1, am I correct that `A.shp` is a 1x4 matrix (equal to [1, 1, 1, 1]) and `beta.shp` is also a 1x4 matrix (equal to [1, 0, 1, 0])? Then the dimensions do not seem to match, because we need `A.shp %*% x = beta.shp`, where `x` has to be a vector with 4 elements to be consistent with `A.obs` and `A.tgt`. So it seems like `beta.shp` has to be a scalar instead of a 1x4 matrix. May I know whether I have interpreted the file incorrectly? It also appears that the `betashp.out` file is the same as the `Atgt.out` file.

Thank you!
Sorry, that is the result of a bug in my code. I wrote out `A_tgt` as `betashp.out`, as you guessed. For all of these examples, `beta.shp` should just be a 1x1 vector with element 1.
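To make the dimension bookkeeping above concrete, here is a small R check of the DGP1 shape constraint; the value of `x` is a hypothetical feasible point, not one from the DGPs.

```r
# Dimension check for the DGP1 shape constraint A.shp %*% x = beta.shp, with
# A.shp a 1x4 matrix of ones and beta.shp a 1x1 vector equal to 1.
A.shp    <- matrix(1, nrow = 1, ncol = 4)
beta.shp <- 1
x        <- c(0.25, 0.25, 0.25, 0.25)   # hypothetical feasible point of length 4

drop(A.shp %*% x) == beta.shp   # TRUE: (1x4) times a length-4 vector gives a 1x1 result
```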
Sorry, I have some more questions while I am trying to replicate the DGPs. I am getting p-values that are close to the ones you have, but not exactly the same, so I am trying to find the problematic parts of my code.

My questions are the following:

- May I know whether `weight.matrix` is set as `diag` in the DGPs?
- In the DGP zip files, is `betahatobs_vc.out` the variance matrix for `beta.obs`? Should I treat this as something given, or something that I should be able to replicate (I am unable to replicate it, though)?

My last question is related to the two Omega matrices, $\hat{\Omega}_{n,\bar{\rho}}^{i}$ and $\hat{\Omega}_{n,\bar{\rho}}^{e}$, on page 5 of the document and in the replication zip files. I am not sure if my interpretation of the two matrices is correct.

- From the definition, I think the last two rows and columns of the two Sigma matrices (on page 4) should be zeros because they correspond to the covariance terms of `beta.shp` or `beta.tgt` with the other elements of the vector `beta`, but `beta.shp` and `beta.tgt` are both nonstochastic.
- Since the two Sigma matrices are regularized by the same matrix (the identity matrix multiplied by $\bar{\rho}$), the last two diagonal entries of the square of the Omega matrices should be the same and non-zero, because $\bar{\rho} I$ is not a zero matrix. However, I find that they are not the same in `omegahati.out` and `omegahate.out` for the DGPs, so I am wondering if I have the correct interpretation.
- In addition, the matrices `omegahate.out` for DGPs 3 to 6 are zero matrices. If `omegahate.out` is referring to $\hat{\Omega}_{n,\bar{\rho}}^{e}$ in the document, should it be a nonzero matrix in DGPs 3 to 6 because $\bar{\rho}$ is nonzero?
- In addition, may I know whether I should be replicating the two Omega matrices, or should I treat them as input in my code?

Thank you!
> Sorry, I have some more questions while I am trying to replicate the DGPs. I am getting p-values that are close to the ones you have, but not exactly the same, so I am trying to find the problematic parts of my code.
>
> My questions are the following:
>
> - May I know whether `weight.matrix` is set as `diag` in the DGPs?
I used the second option in the note (the inverse of the entire variance-covariance matrix).
> - In the DGP zip files, is `betahatobs_vc.out` the variance matrix for `beta.obs`? Should I treat this as something given, or something that I should be able to replicate (I am unable to replicate it, though)?
Yes it is, and yes you can take it as given for current purposes. In practice, it is something the user will have to pass you enough information to be able to replicate: either they tell you what it is directly, or they give you the function `beta.obs` and you bootstrap an estimate of it from that (see the sketch at the end of this reply).
> My last question is related to the two Omega matrices, $\hat{\Omega}_{n,\bar{\rho}}^{i}$ and $\hat{\Omega}_{n,\bar{\rho}}^{e}$, on page 5 of the document and in the replication zip files. I am not sure if my interpretation of the two matrices is correct.
>
> - From the definition, I think the last two rows and columns of the two Sigma matrices (on page 4) should be zeros because they correspond to the covariance terms of `beta.shp` or `beta.tgt` with the other elements of the vector `beta`, but `beta.shp` and `beta.tgt` are both nonstochastic.
Yes, that's correct.
> - Since the two Sigma matrices are regularized by the same matrix (the identity matrix multiplied by $\bar{\rho}$), the last two diagonal entries of the square of the Omega matrices should be the same and non-zero, because $\bar{\rho} I$ is not a zero matrix. However, I find that they are not the same in `omegahati.out` and `omegahate.out` for the DGPs, so I am wondering if I have the correct interpretation.
Sorry -- I used a different $\bar{\rho}$ for the "i" and "e" matrices, and neglected to mention that in the document. The $\bar{\rho}$ I listed in the note is for the "i" matrix. The $\bar{\rho}$ for the "e" matrix should be the same thing, but instead with the sup-norm of $\hat{\Omega}_{n}^{\beta - \beta^{\star}}$.
> - In addition, the matrices `omegahate.out` for DGPs 3 to 6 are zero matrices. If `omegahate.out` is referring to $\hat{\Omega}_{n,\bar{\rho}}^{e}$ in the document, should it be a nonzero matrix in DGPs 3 to 6 because $\bar{\rho}$ is nonzero?
These DGPs are all of the $d \geq p$ case. In this case, the range component is zero, so $\hat{\Omega}_{n, \bar{\rho}}^{e}$ is irrelevant. Also, in this case $\hat{\Sigma}_{n}^{\beta - \beta^{\star}} = 0$. Since $\hat{\Omega}_{n, \bar{\rho}}^{e}$ is irrelevant here, I just set it to zero without regularizing.
> - In addition, may I know whether I should be replicating the two Omega matrices, or should I treat them as input in my code?
>
> Thank you!
You should replicate them -- they need to be constructed from the user's input.
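As a sketch of the bootstrap route mentioned above (estimating the variance matrix of `beta.obs` when the user supplies it as a function, then forming the weighting matrix as the inverse of the full variance-covariance matrix, as in the second option from the note): `beta_obs_fn`, `data`, and `B` are placeholder names for this illustration, not the package's actual arguments.

```r
# Sketch (not the package code): bootstrap the variance matrix of beta.obs when
# the user passes a function rather than the matrix itself, then take the
# weighting matrix as the inverse of the full variance-covariance matrix.
set.seed(1)
bootstrap_vcov <- function(beta_obs_fn, data, B = 100) {
  n     <- nrow(data)
  draws <- t(replicate(B, {
    idx <- sample.int(n, n, replace = TRUE)   # resample rows with replacement
    beta_obs_fn(data[idx, , drop = FALSE])
  }))
  cov(draws)                                  # bootstrap variance-covariance matrix
}

# Toy example with a beta.obs that returns the sample means of two columns
data         <- matrix(rnorm(200), ncol = 2)
beta_obs_fn  <- function(d) colMeans(d)
vcov_betaobs <- bootstrap_vcov(beta_obs_fn, data, B = 100)
weight       <- solve(vcov_betaobs)           # inverse of the full vcov matrix
```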
Thanks! After doing more checks, I would like to follow up on two more items regarding the two Omega matrices.
I tried to use the two Omega matrices in the zip file to reproduce the regularization parameter $\bar{\rho}$ that is used in the DGPs. For the "i" matrix, may I check with you whether it is the Frobenius norm that is used in calculating the regularization parameter, instead of the norm mentioned in the notes? It also seems that the DGPs used the Frobenius norm of the matrix instead of the sup-norm for the "e" matrix.
In DGPs 1 and 2, the test statistics and the corresponding bootstrap test statistics that I obtain from my code are the same as the ones in the zip files, except that all of them are smaller by a factor of 10. I figured out that this discrepancy disappears when I replace the two Omega matrices in my code with the Omega matrices in the zip files. After using the Frobenius norm to regularize the Sigma matrices in my code (using the `norm` function with `type = "f"`), I find that my Omega matrices are smaller than the Omega matrices in the zip files by a factor of 100, which seems to coincide with the number of bootstraps. Thus, may I check with you whether I should multiply my answer by R (or equivalently, by B, based on the notation in the document)?
[I am focusing on DGPs 1 and 2 here only because in DGPs 3 to 6, the "i" matrix uses the variance matrix provided by the user (after adding the zero columns and rows), and the "e" matrix is zero. Thus, there are no order-of-magnitude problems in my results for DGPs 3 to 6, because I am using the variance matrix obtained from `betahatobs_vc.out`.]
The replication code for the above two observations can be found here. The code produces two outputs. The first output is the difference between the norm that I tried to reproduce from the DGPs and the Frobenius norm obtained from the `norm` function. The second output is the elementwise division of the Sigma matrix obtained from the DGP by the Sigma matrix obtained from the formula in the document. To run the code, I think the only thing that has to be changed is to update the working directory to the folder that contains the `.out` files for the DGP.
Thank you!
Yes that's right, the norm being used in my code is the standard Euclidean norm treating the matrix as a vector. This is a bug in my code (in the sense that I meant to use what I had in the notes); however, it seems to work fine in my simulations, so let's keep it as is.
That's a bug in my code -- I had an additional scaling term that popped up due to a bug in my matrix multiplication. I fixed it, re-verified with your code, and now our Omega matrices match.
Here are new examples attached after I fixed the bug in 2. (They have changed since the last time, but that should be fine, right? Let me know if not.)
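Regarding the norm in the first point above, a quick R check confirms that the Frobenius norm returned by `norm(., type = "F")` is exactly the Euclidean norm of the matrix treated as a vector; the matrix used here is arbitrary.

```r
# The Frobenius norm equals the Euclidean norm of the vectorized matrix.
M <- matrix(c(1, -2, 3, 4, -5, 6), nrow = 2)   # arbitrary example matrix
all.equal(norm(M, type = "F"), sqrt(sum(M^2))) # TRUE
```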
Thanks, our test statistics now agree to 7 decimal places for DGP 1 and to 10 decimal places for DGPs 3 and 4.
For DGP2, I find that the reason our test statistics do not match is that we have different `beta.star` and bootstrap counterparts. For DGPs 1, 3 and 4, we have the same `beta.star`.
In calculating the p-values, may I check with you whether this is done by computing the proportion of bootstrap test statistics that are larger than the point estimate (sketched below)? Using this method and the test statistics, the p-value that I get is 0.01 smaller than your p-values for DGPs 1 to 3 and 0.01 larger than your p-value for DGP 4.
The replication code can be found here. There are two folders in the zip file. The first folder contains the R code and a short note on writing problem (3) in standard form. The second folder contains the R code to replicate the second observation on the p-value. The working directory has to be updated for both codes. In addition, you would need to update the `dgp.id` in the replication file for the p-value as well.
Thank you!
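The p-value rule described above (the share of bootstrap test statistics that exceed the sample test statistic) can be sketched in a couple of lines; `ts_sample` and `ts_boot` are placeholders for the statistic and its bootstrap replications, and the name `fsst_pvalue` is hypothetical rather than the package's function.

```r
# Sketch: p-value as the proportion of bootstrap test statistics that are
# larger than the sample test statistic.
fsst_pvalue <- function(ts_sample, ts_boot) {
  mean(ts_boot > ts_sample)
}

# e.g. fsst_pvalue(ts_sample = 0.61, ts_boot = c(0.2, 0.7, 1.3, 0.5))  # 0.5
```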
There is no reason to write (3) in standard form. We are assuming that the user has specified their problem in standard form, and that the A matrices and $\beta$ vectors they give us reflect that. However, we do not have to work with standard-form problems inside the package. You should try to implement (3) directly as it is, and see if our `beta.star`s are still different. Then it will be easier to debug.
This was a bug in my code. Once I fix it, my p-values adjust to match yours.
The reason that I wrote problem (3) in standard form is that I have previously run into errors when specifying the lower bound as `-Inf` in solving some other linear programs with `gurobi`. Therefore, I thought it was safer to rewrite everything in standard form to prevent any surprising results.
Please find the updated code here for solving (3) without writing it in standard form. In this version, the `beta.star` that I get is the same as in the last version, where I wrote the code by transforming problem (3) into standard form. The difference between our `beta.star` is between 1.7e-5 and 9.6e-5. (For DGP 1, the difference between our `beta.star` is between 1.1e-13 and 4.4e-13.) I am not sure if this is due to numerical inaccuracy, and I am reading through my code to see if there is anything wrong. The p-value that I get with my `beta.star` and the corresponding bootstrap components is 0.32, which is 0.08 smaller than the p-value using the `beta.star` and the corresponding bootstrap components from the zip file.

Thanks!
> The reason that I wrote problem (3) in standard form is that I have previously run into errors when specifying the lower bound as `-Inf` in solving some other linear programs with `gurobi`. Therefore, I thought it was safer to rewrite everything in standard form to prevent any surprising results.
There shouldn't be any reason to do that, as Gurobi will do the translation itself. If you ever find a strange thing with Gurobi, you should try to isolate it as a MWE and send it to them.
> Please find the updated code here for solving (3) without writing it in standard form. In this version, the `beta.star` that I get is the same as in the last version, where I wrote the code by transforming problem (3) into standard form. The difference between our `beta.star` is between 1.7e-5 and 9.6e-5. (For DGP 1, the difference between our `beta.star` is between 1.1e-13 and 4.4e-13.) I am not sure if this is due to numerical inaccuracy, and I am reading through my code to see if there is anything wrong. The p-value that I get with my `beta.star` and the corresponding bootstrap components is 0.32, which is 0.08 smaller than the p-value using the `beta.star` and the corresponding bootstrap components from the zip file.
>
> Thanks!
Normally, I wouldn't be too concerned about a 1e-5 difference. But since it is having a noticeable impact on the p-value, that is concerning.
I briefly looked at both your code and mine, and checked, for example, the objective vectors. They look very similar, and our `xstar` outputs are also very similar (but not exactly equal). My code goes Julia --> JuMP --> Gurobi, whereas yours goes R --> Gurobi, so perhaps something in the symbolic translation JuMP is doing is affecting things.
Some things that would help debugging: 1) Are our test statistics similar or different? How about the bootstrap values? 2) If you replace your `beta.star` with mine (both for the sample and the bootstrap) throughout, how do the test statistics and p-values compare?
> There shouldn't be any reason to do that, as Gurobi will do the translation itself. If you ever find a strange thing with Gurobi, you should try to isolate it as a MWE and send it to them.
I see. I will do so when I encounter something strange next time.
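The point above (that Gurobi handles unbounded variables directly, so there is no need to split variables into positive and negative parts) can be illustrated with a toy LP in the Gurobi R interface; this is a sketch of a made-up problem, not problem (3), and it requires a Gurobi installation and its R package.

```r
# Toy LP (not problem (3)) showing that the Gurobi R interface accepts -Inf
# lower bounds directly.
library(gurobi)

model <- list(
  obj        = c(1, 1),                     # minimize x1 + x2
  A          = matrix(c(1, 1), nrow = 1),   # subject to x1 + x2 = 1
  sense      = "=",
  rhs        = 1,
  lb         = c(-Inf, -Inf),               # free variables (Gurobi's default lb is 0)
  modelsense = "min"
)

result <- gurobi(model, params = list(OutputFlag = 0))
result$objval   # 1
```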
In light of this, I have changed everything in the `fsst` code from standard form back to the original form in the document (i.e., without splitting the variables into their positive and negative components), and I re-ran all my analysis. The p-values and the test statistics that I get in the four DGPs are as follows:
| DGP | p-value | `cone.n` | `range.n` |
|---|---|---|---|
| 1 | 0.9 | 0.6135432539 | 0.2387698777 |
| 2 | 0.4 | 0.4979030430 | 1.9828827434 |
| 3 | 0.83 | 44.5322864124 | 0 |
| 4 | 0.66 | 49.1959752546 | 0 |
The updated p-values all match the p-values in the replication file. I think the previous discrepancy in our p-value for DGP2 might be due to writing the other LP/QP problems in standard form, which caused a mistake somehow. I am trying to locate which part of the code is causing this and why it affects the p-value for DGP2 only and not the other DGPs.
Although the p-values finally match in DGP2, our `cone.n` and `range.n` for DGP2 do not. So I tried substituting the `beta.star` and its bootstrap components from the replication file directly into the code. The updated table becomes:
| DGP | p-value | `cone.n` | `range.n` |
|---|---|---|---|
| 1 | 0.9 | 0.6135432523 | 0.2387698914 |
| 2 | 0.4 | 0.4985108463 | 2.0031053046 |
| 3 | 0.83 | 44.5322864124 | 0 |
| 4 | 0.66 | 49.1959752545 | 0 |
The differences in the test statistics for DGPs 1, 3 and 4 are smaller than 1e-7, while for DGP 2 the change in the test statistics is larger.
Using the `beta.star` and its bootstrap components from the replication file, the p-value and the test statistics now match the values from the replication file.

For DGP2, if I use the `beta.star` and its bootstrap components from the replication file, the differences between our bootstrap test statistics are between 0 and 0.006 for `cone`, and smaller than 1.01e-13 for `range`.

If I compute the `beta.star` and its bootstrap components via R and Gurobi, the differences between our bootstrap test statistics are larger: between 2.29e-5 and 7.71e-3 for `cone`, and between 0.009 and 0.255 for `range`.
Ok good, this is progress.
So, just to clarify:
Is that a correct summary?
I just found another discrepancy in DGP1, in its bootstrap cone test statistics, although it does not affect the p-value. This discrepancy exists regardless of whether I use my `beta.star` or yours. Among the 100 bootstrap cone test statistics, the differences for 62 of them are smaller than 1e-9. For the remaining 38, the differences are between 0.002 and 0.24. The differences between our range bootstrap statistics are between 3.2e-10 and 1.05e-5.
I also checked the bootstrap test statistics for DGP3 and DGP4. The differences are smaller than 1e-13 and 3.94e-5 for the cone bootstrap test statistics in the two DGPs respectively. Our range components are the same because they are all zero.
For DGP1, the discrepancy in the bootstrap components of the cone test statistics drops below 1.65e-7 when I use the `beta.r` from the replication file. For DGP2, the discrepancy is between 9.42e-6 and 7.73e-3 if I use the `beta.r` from the replication file. If I use the `beta.r`, `beta.star` and its bootstrap components from the replication file, the discrepancy between the bootstrap cone test statistics is between 1.613e-13 and 1.019e-10.
Therefore, based on this additional information, my answers to your two questions are:

- The source of discrepancy in DGP1 is `beta.r`.
- The source of discrepancy in DGP2 is `beta.r` and `beta.star`.

I am looking through the code for `beta.r` to see what the source of the problem is.

Thanks!
Ok, very good work debugging. You are using the right strategy! Let me know what you find.
After debugging, I find that the discrepancy in `beta.r` is due to a bug in my code, where I put `beta.n` instead of `beta.star` in the case of d < p. After fixing this, we have the same p-values, the same test statistics (they match up to 10 decimal places) and the same bootstrap test statistics (differences are all smaller than 1e-10 for DGPs 1 to 3 and smaller than 1e-5 for DGP4) when I use the `beta.star` and its bootstrap components from your replication file.
If I instead calculate the `beta.star` and its bootstrap components in R using Gurobi, we still have the same p-values, although the discrepancies in the test statistics and their bootstrap versions for DGPs 1 and 2 become larger. (In the $d \geq p$ case, the `fsst` method sets `beta.star` and its bootstrap components to be the same as `beta.n` and its bootstrap components.)

Therefore, after fixing my bug, my modified answer to your earlier question is that we are not agreeing on DGP2, and the source of error is `beta.star` and its bootstrap components.

Thanks!
Ok, great work!
The discrepancy in `beta.star` is perplexing, but we should try to get to the bottom of it.
Could you send your updated code for computing it?
All of our inputs to the program are the same, right? (The matrices, vectors, etc. in the program definition are just taken from my output.)
So that means the discrepancy must be caused by:
It should be possible for me to pin this down fairly easily.
Here is the latest code that is used to compute `beta.star`. Do you only want the code for `beta.star`, or the full module for the `fsst` procedure? I am now updating the comments and annotations of the entire `fsst` code after the recent changes.

I think our inputs are the same. For this code that computes `beta.star`, I am using the following `.out` files as input:
- `Aobs.out`
- `Ashp.out`
- `Atgt.out`
- `betahatobs.out`
- `betashp.out`
- `betatgt.out`
- `betahatobs_vc.out`
I am using the `avar` option to compute the weighting matrix, i.e. the inverse of the matrix in `betahatobs_vc.out`.
Thanks for your help!
Just the `beta.star` function is fine. I will look into this more closely tomorrow and let you know what I find.
Sure, thanks a lot!
I think both of our codes are correct, but this optimization problem is (numerically) not strictly convex, so the solution is not unique. Some evidence for this can be seen by passing `Method = 1` to Gurobi as an option: the solution (`xstar`) changes considerably.
I am talking with Andres Santos about this to see if it is a problem for the theory, and, if so, whether there are ways to fix it. For now, let's just go with your current code. We may have to make some (small) adjustments later on.
I see. I will go ahead and finalize the current version of the code and push it to GitHub when it's completed. Thank you!
Ok, it looks like this is just a consequence of the quadratic program being essentially weakly convex for numerical purposes. This is not a problem in theory, since what we care about is the uniqueness of $\hat{\beta}^{\star}$, not of the solution $\hat{x}^{\star}$. As we saw, we still have some sensitivity in $\hat{\beta}^{\star}$, but it's not that much, and it doesn't appear to be empirically relevant (i.e., to affect the p-value), at least in these examples.
So the plan is to leave this for now and revisit it if it becomes a problem in the future. The underlying problem here is that the $A_{obs}'A_{obs}$ matrix is numerically singular (even though we know it is not actually singular), which means what we are dealing with is very similar to multicollinearity in an OLS framework. Following that logic, we could start removing redundant rows (as in OLS) and potentially stabilize the problem. This seems like much more work than is appropriate for now, especially given that we are not sure this is a practical problem.
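The two diagnostics discussed above (sensitivity of the solution to the Gurobi `Method` option, and near-singularity of the $A_{obs}'A_{obs}$ matrix) can be sketched as follows. This assumes a Gurobi model object `qp` for the `beta.star` quadratic program and objects `A.obs` and `weight` already exist in the workspace; all names here are placeholders, not the package's internals.

```r
# Sketch of the checks discussed above; `qp`, `A.obs`, and `weight` are
# placeholders assumed to be available.
library(gurobi)

sol_default <- gurobi(qp, params = list(OutputFlag = 0))
sol_method1 <- gurobi(qp, params = list(OutputFlag = 0, Method = 1))

# xstar can change considerably across methods when the QP is only weakly
# convex for numerical purposes
max(abs(sol_default$x - sol_method1$x))

# Near-singularity of A.obs' A.obs (and of A.obs' W A.obs), the analogue of
# multicollinearity mentioned above; values near zero indicate trouble
rcond(crossprod(A.obs))
rcond(t(A.obs) %*% weight %*% A.obs)
```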
@conroylau I think our code agrees on everything now except for this issue that we are going to put aside. Please close this when done pushing the new code for the method. Thanks!
Thank you for your detailed explanations! I see now why our answers differ slightly in DGP2. I am wondering whether you want me to include a warning message when $A_{obs}'\,W\,A_{obs}$ (where $W$ is `weight.matrix`) is numerically singular, say when the determinant is smaller than a certain value. (In DGP1, it is 4.87e-22. In DGP2, it is -1.56e-32.)
I have finished coding the `fsst` module (the current version can be found here). But I would like to confirm a few items about the `fsst` module:

1. For the parameter $\bar{\rho}$ used to normalize the Omega matrices, should I use the sup-norm as in the document or the Frobenius norm as discussed above?
2. Should I use the same $\bar{\rho}$ parameter to normalize the two Omega matrices, or calculate $\bar{\rho}$ for each Omega matrix?

Thank you very much!
I don't think it is necessary to have the warning.

1. Let's use the Frobenius norm as above.
2. Calculate $\bar{\rho}$ for each matrix.

Under 1. and 2., our results for these matrices matched, right?
Yes, our results for the matrices match under 1 and 2!
Ok good, let's stick with those choices then.
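To record the two choices just agreed on, here is a minimal sketch: a separate $\bar{\rho}$ for each matrix, computed from the Frobenius norm, with each Sigma matrix regularized by $\bar{\rho}$ times the identity as described earlier in this thread. The scaling factor `reg` is a placeholder; the actual formula for $\bar{\rho}$ is the one in the note, not this sketch.

```r
# Sketch of the agreed choices: per-matrix rho.bar via the Frobenius norm, and
# regularization by rho.bar times the identity. `reg` is a placeholder scaling.
regularize <- function(Sigma, reg) {
  rho.bar <- reg * norm(Sigma, type = "F")   # Frobenius norm, computed per matrix
  Sigma + rho.bar * diag(nrow(Sigma))
}

# e.g. Sigma.i.reg <- regularize(Sigma.i, reg)
#      Sigma.e.reg <- regularize(Sigma.e, reg)
```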
Sure, I have pushed the code for the new `fsst` testing procedure to GitHub. Thank you!
This is the long-promised new method that I have been working on w/ coauthors.
This note describes how to implement it. fsst-module.pdf Let me know if you have any questions!
And I have included 6 example DGPs that you can try to replicate. fsst-examples.zip (And let me know if I forgot anything in these that might be helpful.)