alexanderquispe / 14.38_Causal_ML

Jupyter Notebook for the MIT course
MIT License

PM2-B #30

alexanderquispe opened this issue 3 years ago

alexanderquispe commented 3 years ago
  1. Inference on Predictive and Causal Effects in High Dimensional Linear Regression Models
alexanderquispe commented 3 years ago

• `python-notebook-experiment-on-orthogonal-learning`: I found that the iteration process may have an error. [screenshot]

I think that this line:

```python
if sum(SX_IDs) == 0: Naive[0] = sm.OLS(Y, sm.add_constant(D)).fit().summary2().tables[1].round(3).iloc[1, 0]
```

should be replaced by

```python
if sum(SX_IDs) == 0: Naive[i] = sm.OLS(Y, sm.add_constant(D)).fit().summary2().tables[1].round(3).iloc[1, 0]
```

Otherwise, the iteration does not save the results for that regression. Also, when I compared the naive matrix you got against the naive matrix from the R notebook, the means were totally different, but after this change the results seem to converge. @anzonyqr, let me know if this makes sense to you. Thanks!
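For context, here is a minimal sketch of the kind of simulation loop being discussed. The DGP, sample sizes, and lasso penalty below are placeholder assumptions; only the `Naive[i]` indexing fix comes from this thread:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import Lasso

B = 100                          # number of Monte Carlo repetitions (assumed)
n, p = 100, 10                   # sample size and number of controls (assumed)
Naive = np.zeros(B)              # storage for the naive estimates
rng = np.random.default_rng(42)

for i in range(B):
    # Placeholder DGP; the notebook's actual design differs.
    X = rng.normal(size=(n, p))
    D = X[:, 0] + rng.normal(size=n)
    Y = 0.5 * D + rng.normal(size=n)

    # Lasso selection step: SX_IDs flags which controls were selected.
    SX_IDs = Lasso(alpha=0.1).fit(X, Y).coef_ != 0

    if sum(SX_IDs) == 0:
        # The fix: store at index i, not 0, so every repetition is kept.
        Naive[i] = (sm.OLS(Y, sm.add_constant(D)).fit()
                    .summary2().tables[1].round(3).iloc[1, 0])
```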

alexanderquispe commented 3 years ago

This is what I found in the R code: [screenshot]

This is what we have in the Python code: [screenshot]

I will divide the error term by 4, and also modify the iterator as I did in the code above.
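To make the discrepancy concrete, here is a minimal sketch of the error-term change, assuming a simple linear DGP (the exact design in the notebook may differ):

```python
import numpy as np

n = 100                                  # sample size (assumed)
theta = 0.5                              # true treatment effect (assumed)
rng = np.random.default_rng(0)
D = rng.normal(size=n)

# Python version before the fix: unit-variance error term.
Y_old = theta * D + rng.normal(size=n)

# Matching the R code: the error term is divided by 4.
Y_new = theta * D + rng.normal(size=n) / 4
```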

anzonyquispe commented 3 years ago

> I found that the iteration process may have an error. I think that `Naive[ 0 ]` should be replaced by `Naive[ i ]`; otherwise, the iteration is not saving the results for that regression. [...]

You are right, I made a mistake. Zero should be changed to `i`; otherwise `Naive` will not save the results.

anzonyquispe commented 3 years ago

> This is what I found in the R code [...] I will divide the error term by 4, and also modify the iterator as I did in the code above.

You are right. I have already made the corrections; the Python script now matches the R code.

SandraMartinezGutierrez commented 2 years ago

[JULIA SCRIPT] When running the OLS regression, there is a problem with the intercepts due to the Cholesky factorization. [screenshot]

Found a solution here:

1) https://github.com/JuliaStats/StatsModels.jl/issues/31
2) https://github.com/JuliaStats/GLM.jl/issues/426

Final output: [screenshot]

Notice that to iterate over the columns of a specific dataframe, I used:

```julia
term(:y) ~ sum(term.(names(data[!, Not(["y", "intercept"])])))
```
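For comparison, a minimal sketch of the analogous pattern on the Python side, building the regression formula programmatically from the column names (the dataframe and its columns here are placeholders, not the notebook's):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder data; in the notebook `data` would come from the dataset.
data = pd.DataFrame({"y":  [1.0, 2.0, 3.0, 4.0],
                     "x1": [0.1, 0.2, 0.3, 0.4],
                     "x2": [1.0, 0.0, 1.0, 0.0]})

# Build the right-hand side from every column except the outcome, mirroring
# the term(:y) ~ sum(term.(...)) trick; statsmodels adds the intercept itself.
rhs = " + ".join(c for c in data.columns if c != "y")
model = smf.ols(f"y ~ {rhs}", data=data).fit()
print(model.params)
```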

SandraMartinezGutierrez commented 2 years ago

[JULIA SCRIPT]

In this section of the Python script, I was looking for a Julia DataFrame equivalent of Python's `.set_index()`. [screenshot]

I found that `NamedArray` can give results similar to Python's `.set_index()`, as shown below:

[screenshot]

However, `NamedArray` returns a matrix format only, and I could not find an equivalent function for Julia dataframes. For this reason, I recommend adding the "index information" as a column in the dataframe; it is then possible to group the data by that column to get quick lookups.
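For reference, this is the pandas behavior being replicated (a minimal sketch with placeholder column names, not the notebook's):

```python
import pandas as pd

df = pd.DataFrame({"model": ["OLS", "Lasso"], "estimate": [0.51, 0.48]})

# .set_index() promotes a column to the row labels, enabling label lookups:
indexed = df.set_index("model")
print(indexed.loc["Lasso", "estimate"])   # 0.48
```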

Documentation about NamedArrays can be found here: https://github.com/davidavdav/NamedArrays.jl

Final output: [screenshot]

SandraMartinezGutierrez commented 2 years ago

[JULIA SCRIPT]

In the Python script, converting this table to HTML only required adding `.to_html()`:

[screenshot]
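For reference, a minimal pandas example of that call (the table contents here are placeholders):

```python
import pandas as pd

df = pd.DataFrame({"coef": [0.51, 0.12], "std err": [0.04, 0.03]},
                  index=["D", "x1"])

# .to_html() renders the DataFrame as an HTML <table> string.
html = df.to_html()
print(html[:60])
```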

However, in Julia I found an interesting way to convert tables to HTML.

For more information:

1) https://ronisbr.github.io/PrettyTables.jl/stable/
2) https://github.com/ronisbr/PrettyTables.jl

Final output: [screenshot]

SandraMartinezGutierrez commented 2 years ago
  • `pm2_notebook_jannis`: In this Jupyter notebook we use the double Lasso regression. Again we face the problem of finding equivalent "alphas". I used alpha = 0.00077 (manually chosen) as the best proxy for getting results similar to the R notebook; the coefficients are similar but the CIs are not equal. See the sketch below.
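A minimal sketch of the double Lasso step being tuned, assuming the partialling-out variant; the DGP and variable names are placeholder assumptions, and only the manually chosen alpha comes from this thread:

```python
import numpy as np
import statsmodels.api as sm
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.normal(size=(n, p))               # controls (placeholder DGP)
D = X[:, 0] + rng.normal(size=n)          # treatment
y = 0.5 * D + X[:, 1] + rng.normal(size=n)

alpha = 0.00077   # manually chosen to approximate the R notebook's penalty

# Double Lasso via partialling-out: residualize y and D on X with Lasso,
# then regress the residuals on each other with OLS.
res_y = y - Lasso(alpha=alpha).fit(X, y).predict(X)
res_D = D - Lasso(alpha=alpha).fit(X, D).predict(X)
fit = sm.OLS(res_y, sm.add_constant(res_D)).fit()
print(fit.params[1], fit.conf_int()[1])   # point estimate and 95% CI for D
```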

[JULIA SCRIPT]

`Julia_pm2_notebook_jannis`: In this notebook, when using the double Lasso regression, we face the same problem of equivalent "alphas". I used alpha = 0..8 (manually chosen) as the best proxy for getting results similar to the R notebook; the coefficients are similar but the CIs are not equal.

R Notebook: [screenshot]

Julia Notebook: [screenshot]