Closed alexanderquispe closed 1 month ago
The error was in the reg_panel function and has been fixed in the file reg_did.py. The fix was to build the functions on flattened numpy arrays and to use np.newaxis, as numpy recommends. The same adjustment was made for the callbacks reg_panel and reg_rc. Additionally, dp['panel'] was never extracted from the object dp in the file compute_att_gt.py, so the panel flag was always false and the dr_rc and reg_did_rc functions were used instead.
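The shape issue described above can be illustrated with a minimal sketch. This is not the code from reg_did.py: the arrays y_col and w_flat and the helper weighted_mean are hypothetical, and the example only shows why flattening inputs (and adding an axis back explicitly with np.newaxis) avoids accidental broadcasting.

```python
import numpy as np

def weighted_mean(y, weights):
    """Hypothetical helper: weighted mean that accepts (n,) or (n, 1) inputs."""
    # Flatten both inputs so element-wise products stay one-dimensional.
    y = np.asarray(y, dtype=float).flatten()
    weights = np.asarray(weights, dtype=float).flatten()
    return np.sum(weights * y) / np.sum(weights)

y_col = np.array([[1.0], [2.0], [3.0]])  # column vector, shape (3, 1)
w_flat = np.array([1.0, 1.0, 2.0])       # flat vector, shape (3,)

# Without flattening, (3, 1) * (3,) silently broadcasts to a (3, 3) matrix.
broadcast_shape = (y_col * w_flat).shape
print(broadcast_shape)

# With flattening, the result is the intended scalar.
result = weighted_mean(y_col, w_flat)
print(result)

# np.newaxis turns a flat vector back into an explicit column when one is needed.
y_flat = y_col.flatten()
column_shape = y_flat[:, np.newaxis].shape
print(column_shape)
```

Mixing (n, 1) and (n,) arrays raises no error in numpy, which is why this kind of bug can change results without any visible failure.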
To obtain the standard error without bootstrap (bstrap=False), using the formula sd(inf_function) / sqrt(len(inf_function)), in Python you can use:
n_len = list(map(len, inffunc))
np.std(inffunc, axis=1) / np.sqrt(n_len)
However, R and Python do not give identical results here, as this comparison on a 3-element array shows:
> sd(c(1, 3, 4))
# [1] 1.527525
np.std([1, 3, 4])
# 1.247219128924647
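The gap in this small example comes from the denominator: R's sd() divides by n - 1, while np.std defaults to ddof=0 and divides by n. Passing ddof=1 reproduces the R value. The sketch below shows this, together with a hypothetical helper se_from_inffunc that applies the formula above; it assumes inffunc is a 2-D array with one row per estimate, which is an assumption based on the axis=1 call in the snippet and not something confirmed by the package.

```python
import numpy as np

x = np.array([1, 3, 4])

pop_sd = np.std(x)             # population formula, divides by n
sample_sd = np.std(x, ddof=1)  # sample formula, divides by n - 1, like R's sd()
print(pop_sd, sample_sd)

def se_from_inffunc(inffunc, ddof=1):
    """Hypothetical helper: sd(inf_function) / sqrt(n), one value per row."""
    inffunc = np.asarray(inffunc, dtype=float)
    n = inffunc.shape[1]  # number of observations in each row
    return np.std(inffunc, axis=1, ddof=ddof) / np.sqrt(n)

se = se_from_inffunc([[1, 3, 4]])
print(se)
```

With thousands of observations the n versus n - 1 choice changes the result only marginally, so it is unlikely to explain a large gap in standard errors on its own.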
After correcting the errors, the following output is achieved:
!pip uninstall csdid DRDID -y
!pip install git+https://github.com/d2cml-ai/DRDID
!pip install git+https://github.com/d2cml-ai/CSDID
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import statsmodels.api as sm
lm = sm.WLS
from csdid.att_gt import ATTgt
df = pd.read_csv('../data/r_cohort.csv')
from drdid.reg_did import reg_did_panel
out = ATTgt(yname="ln_gms", gname="first.treat", idname="customer_id_num",
tname="time_period", data=df).fit(est_method=reg_did_panel, bstrap=False)
# Display the results
out.summ_attgt().summary2
Hi @alexanderquispe
I noticed the same when cross-checking the Python website and the R package website; in the introduction example the differences are quite substantial.
Is the current Python documentation using the fixed version (according to this issue), or will it be updated?
I was a bit confused when I replicated the Python example... I think the example is based on a much smaller sample (the dta data frame has 320 rows) than the R example, which has 15916 rows... So maybe the differences only come from this, but it's hard to tell... Thanks!
Thanks for the great work!
Hi @PhilippBach Thanks a lot for this issue. I realized that the dataset used in the Python tutorial was only a sample, so I uploaded the correct dataset:
https://raw.githubusercontent.com/d2cml-ai/csdid/main/data/sim_data.csv
I was a bit concerned that the package was not working :)
R package
Python package - updated dataset
If you could rerun our package and confirm that everything is OK, we would appreciate it. @pedrohcgs
Hi @alexanderquispe
thanks for your response! The results look virtually identical 👍
From @pedrohcgs :
But I think some of the codes are not matching R.
I am attaching the code in R above, which gives me the following results:

Group  Time  ATT(g,t)  Std. Error  [95% Pointwise Conf. Band]
    2     2   -0.0746      0.0293    -0.132   -0.0173  *

When I run the equivalent code in Python, I get this:

   Group  Time  ATT(g, t)  Post  Std. Error  [95% Pointwise Conf. Band]
0      2     2    -0.0746     1      0.0564   -0.1909    0.0416
The difference in std errors is pretty huge!
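One way to quantify the gap is to compare the two reported rows directly. The sketch below is only arithmetic on the numbers quoted above and does not identify the cause of the difference; the helper implied_critical_value is hypothetical and assumes each band is symmetric around the point estimate.

```python
def implied_critical_value(std_error, lower, upper):
    """Hypothetical helper: back out the critical value from a symmetric band."""
    return (upper - lower) / (2.0 * std_error)

# Values copied from the R and Python rows for group 2, time 2.
r_se, r_lower, r_upper = 0.0293, -0.132, -0.0173
py_se, py_lower, py_upper = 0.0564, -0.1909, 0.0416

se_ratio = py_se / r_se
r_crit = implied_critical_value(r_se, r_lower, r_upper)
py_crit = implied_critical_value(py_se, py_lower, py_upper)

print(f"Python SE / R SE:          {se_ratio:.3f}")
print(f"Implied critical value, R: {r_crit:.3f}")
print(f"Implied critical value, Python: {py_crit:.3f}")
```

Printing these ratios for every (group, time) cell makes it easier to see whether the standard errors differ by a roughly constant factor or vary cell by cell.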
This is the Python code I used: