conroylau / lpinfer

lpinfer: An R Package for Inference in Linear Programs
GNU General Public License v3.0

Print and summary for fsst #26

Closed a-torgovitsky closed 4 years ago

a-torgovitsky commented 4 years ago

Let's make print shorter, and summary longer.

Specifically, for print, just report the p-value, or list of p-values if there are multiple lambdas.

For summary, report everything that is there now, as well as:

1) Quantiles (across bootstraps) of the range, cone, and bootstrapped test statistic at the standard levels .90, .95, and .99. Note that the cone and test statistic will vary with lambda, but the range will not. This can come after the test statistic and before the p-values.
2) The input parameter rho, as well as the rhobar values used for the studentization matrices. If d >= p, then report NA for the "E" matrix, since it is not used in this case.
3) How the betaobs variance matrix was estimated (from a list passed, from bootstrapping, or from a function).
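As a rough illustration of the quantile request, here is a minimal Python sketch (not the package's R code). The names `quantile`, `bootstrap_quantiles`, `range_draws`, and `cone_draws` are hypothetical, the draws are made-up numbers, and the linear-interpolation rule is an assumption; the thread does not say which quantile rule fsst uses.

```python
def quantile(values, q):
    """Linear-interpolation quantile (assumed rule; matches R's default 'type 7')."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)           # fractional index into the sorted draws
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def bootstrap_quantiles(draws, levels=(0.90, 0.95, 0.99)):
    """Quantiles across bootstrap replicates at the standard levels."""
    return {level: quantile(draws, level) for level in levels}


# Hypothetical bootstrap draws: one list for the range (it does not depend
# on lambda) and one list per lambda for the cone.
range_draws = [0.2, 0.5, 0.9, 1.1, 1.6]
cone_draws = {0.001: [0.4, 0.8, 1.0, 1.3, 2.0], 0.5: [0.1, 0.3, 0.6, 0.9, 1.2]}

range_q = bootstrap_quantiles(range_draws)
cone_q = {lam: bootstrap_quantiles(draws) for lam, draws in cone_draws.items()}
```

The range quantiles form a single column, while the cone quantiles are keyed by lambda, which is the asymmetry the summary table has to accommodate.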

conroylau commented 4 years ago

I have just updated the module. Sample output follows. For print, it shows:

p-values:
     lambda p-value
     0.5    0.9
     0.1    0.92
     0.05   0.93
     0.01   0.95
     0.001  0.95

For summary, it shows:

Test statistic: 0.61354.
   - Range component: 0.23877
   - Cone component: 0.61354

Quantiles of bootstrap test statistics: 
     lambda      0.5     0.1    0.05    0.01   0.001
     90%     1.62574 1.62574 1.62574 1.62574 1.62574
     95%     1.92395 1.92395 1.92395 1.92395 1.92395
     99%      2.5875  2.5875  2.5875  2.5875  2.5875

Quantiles of bootstrap cone and range components: 
                 cone   range   range   range   range   range
     (lambda)           (0.5)   (0.1)  (0.05)  (0.01) (0.001)
     90%      1.62574 0.95209 1.05593 1.07169 1.11015  1.1134
     95%      1.92395 1.37134   1.532 1.58983 1.59973 1.60427
     99%       2.5875 1.78371 2.10209 2.19344 2.26651 2.28371

p-values:
     lambda  0.5  0.1 0.05 0.01 0.001
     p-value 0.9 0.92 0.93 0.95  0.95

Solver used: gurobi

Number of cores used: 1

Regularization parameters: 
   - Input value of rho: 1e-04
   - Regaularization parameter for omega.e: 0.00015
   - Regaularization parameter for omega.i: 0.00014

The asymtotic variance of observed component of the 'beta.obs' vector is approximated from the list.

Do you think the current output for the summary is fine? I am wondering whether I should merge the two tables for the test statistics and the bootstrap components, but I have not thought of a clean way to do that yet.

Thank you!

a-torgovitsky commented 4 years ago

It should be the cone component that changes with lambda, right? Also, let's make it ordered in lambda from left to right.

Anyway, here's an idea for the table:

                TS     Range  Cone (.001)  Cone (.01)  Cone (.05)  ...
     Sample     .613   .238   .613         .613        .613
     BS 99% CV  .2578  fill   fill         fill        fill        fill
     BS 95% CV  fill   fill   fill         fill        fill        fill

Other than that, it looks fine for now (we may revisit it later). There are a few typos at the end, though (Regaularization, asymtotic), and you should change "omega.e" to "Range studentization matrix" and "omega.i" to "Cone studentization matrix".

conroylau commented 4 years ago

Yes, I have mixed up Cone and Range in the last version. I have updated the message and fixed the typos. The updated summary message is as follows:

Sample and quantiles of bootstrap test statistics: 
     lambda                 0.001    0.01    0.05     0.1     0.5
     Sample               0.61354 0.61354 0.61354 0.61354 0.61354
     Bootstrap 99% CV     2.58750 2.58750 2.58750 2.58750 2.58750
     Bootstrap 95% CV     1.92395 1.92395 1.92395 1.92395 1.92395
     Bootstrap 90% CV     1.62574 1.62574 1.62574 1.62574 1.62574

Sample and quantiles of bootstrap cone and range components: 
                            Range Cone (0.001) Cone (0.01) Cone (0.05) Cone (0.1) Cone (0.5)
     Sample               0.23877      0.61354     0.61354     0.61354    0.61354    0.61354
     Bootstrap 99% CV     2.58750      2.28371     2.26651     2.19344    2.10209    1.78371
     Bootstrap 95% CV     1.92395      1.60427     1.59973     1.58983    1.53200    1.37134
     Bootstrap 90% CV     1.62574      1.11340     1.11015     1.07169    1.05593    0.95209

p-values:
     lambda     0.001 0.01 0.05  0.1 0.5
     p-value     0.95 0.95 0.93 0.92 0.9

Solver used: gurobi

Number of cores used: 1

Regularization parameters: 
   - Input value of rho: 1e-04
   - Regularization parameter for the Range studentization matrix: 0.00015
   - Regularization parameter for the Cone studentization matrix: 0.00014

The asymptotic variance of the observed component of the beta vector is approximated from the list.

Regarding the test statistic column, should I keep it as a separate table (as above) or put it in the first column as in your sketch? Because the test statistic is defined as the maximum of the Cone and Range components, and the bootstrap Cone component depends on lambda, the bootstrap test statistic might differ across lambdas. Would it therefore be better to keep it separate?
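To make the dependence on lambda concrete, here is a minimal Python sketch (not the package's R code). It assumes each bootstrap test statistic is the maximum of the Range and Cone draws from the same replicate; the function name and the numbers are hypothetical.

```python
def bootstrap_test_statistics(range_draws, cone_draws_by_lambda):
    """For each lambda, take max(range, cone) replicate by replicate."""
    return {
        lam: [max(r, c) for r, c in zip(range_draws, cone_draws)]
        for lam, cone_draws in cone_draws_by_lambda.items()
    }


# Hypothetical draws: the Range draws are shared across lambdas,
# while the Cone draws differ for each lambda.
range_draws = [0.3, 1.2, 0.8]
cone_draws_by_lambda = {0.001: [0.9, 1.0, 1.5], 0.5: [0.2, 0.4, 0.6]}

ts = bootstrap_test_statistics(range_draws, cone_draws_by_lambda)
# Here the Range dominates every replicate at lambda = 0.5, so those test
# statistics equal the Range draws; at lambda = 0.001 the Cone sometimes
# wins, so the two lists of test statistics differ.
```

When the Range dominates in every replicate for every lambda, the test statistic columns coincide, which would explain identical columns in a test statistic table.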

Thank you!

a-torgovitsky commented 4 years ago

Hmm yes, that's very annoying isn't it! It would be great to get this information into a nice compact form without too much duplication. Let me think about it a bit more...

conroylau commented 4 years ago

I have two ideas on how to present the table, but I am not sure whether they are suitable.

My first idea is to print the sample and bootstrap Cone and Range components, and to place a star next to the Cone component whenever it is the one used as the test statistic for a particular lambda and quantile (or for the sample). If there is no star beside the Cone component, the Range component is used as the test statistic. The Range component never gets a star because the same Range is used across all lambdas. Under this convention, the output would be the same as the sample table above for the Cone and Range components, because the Range components are larger than all the Cone components in that particular example.
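The star convention could be sketched in Python as follows (not the package's R code). The helper `mark_cone` is hypothetical, the numbers are illustrative, and leaving ties unstarred is an assumption, since the convention above does not say how ties are handled.

```python
def mark_cone(cone, range_value, digits=5):
    """Format a Cone value, appending '*' when it exceeds the Range value,
    i.e. when the Cone is the component used as the test statistic.
    Ties are left unstarred (an assumption)."""
    text = f"{cone:.{digits}f}"
    return text + "*" if cone > range_value else text


# Illustrative values only.
starred = mark_cone(0.7, 0.2)    # Cone is larger, so it carries the star
plain = mark_cone(0.2, 0.7)      # Range is larger, so no star
```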

My second idea is to have a table like the following, which prints out the information in one table.

                       lambda  0.001  0.01  0.1  0.5  ...
     Sample            Cone    ...    ...   ...  ...  ...
                       Range   ...    ...   ...  ...  ...
                       TS      ...    ...   ...  ...  ...
     Bootstrap 99% CV  Cone    ...    ...   ...  ...  ...
                       Range   ...    ...   ...  ...  ...
                       TS      ...    ...   ...  ...  ...
     Bootstrap 95% CV  Cone    ...    ...   ...  ...  ...
                       Range   ...    ...   ...  ...  ...
                       TS      ...    ...   ...  ...  ...
     Bootstrap 90% CV  Cone    ...    ...   ...  ...  ...
                       Range   ...    ...   ...  ...  ...
                       TS      ...    ...   ...  ...  ...

I am still looking for a better way to present the information, because the table from the first idea may not be clear enough and the second one duplicates values. What do you think of the two options above? Thanks!

a-torgovitsky commented 4 years ago

I think the second one is good, but to reduce duplication we can group by TS/Cone/Range first, then by Sample/Bootstrap 99%/Bootstrap 95%/Bootstrap 90% second. Here's a sketch

[Image: IMG_20200509_154402]

conroylau commented 4 years ago

Done! The updated summary message is as follows:

Sample and quantiles of bootstrap test statistics: 
                              lambda     0.001    0.01    0.05     0.1     0.5
    Test statistic            Sample   0.61354 0.61354 0.61354 0.61354 0.61354
                    Bootstrap 99% CV   2.58750  2.5875  2.5875  2.5875  2.5875
                    Bootstrap 95% CV   1.92395 1.92395 1.92395 1.92395 1.92395
                    Bootstrap 90% CV   1.62574 1.62574 1.62574 1.62574 1.62574
              Cone            Sample   0.61354 0.61354 0.61354 0.61354 0.61354
                    Bootstrap 99% CV   2.28371 2.26651 2.19344 2.10209 1.78371
                    Bootstrap 95% CV   1.60427 1.59973 1.58983   1.532 1.37134
                    Bootstrap 90% CV   1.11340 1.11015 1.07169 1.05593 0.95209
             Range            Sample   0.23877                                
                    Bootstrap 99% CV   2.58750                                
                    Bootstrap 95% CV   1.92395                                
                    Bootstrap 90% CV   1.62574                                

p-values:
     lambda     0.001 0.01 0.05  0.1 0.5
     p-value     0.95 0.95 0.93 0.92 0.9

Solver used: gurobi

Number of cores used: 1

Regularization parameters: 
   - Input value of rho: 1e-04
   - Regularization parameter for the Range studentization matrix: 0.00015
   - Regularization parameter for the Cone studentization matrix: 0.00014

The asymptotic variance of the observed component of the beta vector is approximated from the list.

Thanks!