achinmay17 closed this issue 10 months ago.
Hi,

I can't reproduce the error (see the working examples below). Could you share a bit more information about the `time` and `cohort` columns? After initializing `ATTgt`, you can inspect the processed data through the instance's `data` attribute (in your case `att_gt.data`) and make sure the columns are correctly specified.
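As a quick version of that check, here is a minimal pandas sketch. It assumes the frame is indexed by `(entity, time)` and has a `cohort` column, as in the examples in this thread; `report_time_cohort` and `is_integer_valued` are hypothetical helpers, not part of `differences`, and I have not verified that `att_gt.data` has exactly this layout. It reports whether the non-missing `time` and `cohort` values are whole numbers, so a float column with `NaN` for never-treated units still passes, while datetime values fail.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def is_integer_valued(values) -> bool:
    """True if every non-missing value is a whole number; datetimes return False."""
    non_null = pd.Series(values).dropna()
    if not is_numeric_dtype(non_null):
        return False
    return bool((non_null % 1 == 0).all())


def report_time_cohort(
    data: pd.DataFrame, time_level: str = "time", cohort_col: str = "cohort"
) -> dict:
    """Hypothetical helper: check the time index level and the cohort column."""
    return {
        "time": is_integer_valued(data.index.get_level_values(time_level)),
        "cohort": is_integer_valued(data[cohort_col]),
    }
```

If the layout matches, `report_time_cohort(att_gt.data)` should return `True` for both keys.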
It seems you are using monthly data and relying on the `freq` argument, which casts the datetimes to integers, so the `time` and `cohort` columns should have integer data types in your instance's `data` attribute.
If the error arises during that cast via the `freq` argument, you could cast the `time` and `cohort` columns to integers yourself and pass integer columns instead.
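A minimal sketch of doing that cast yourself, assuming `time` and `cohort` are pandas datetime columns: `to_month_number` is a hypothetical helper (not the library's internal conversion) that maps each date to a running month count, so consecutive months differ by exactly 1. Because the day of the month is ignored, month-end dates such as `course_month_end_date` map the same way as month starts. Missing dates stay missing via pandas' nullable `Int64` dtype; whether `ATTgt` accepts that dtype for `cohort` is an assumption you would need to check.

```python
import pandas as pd


def to_month_number(dates: pd.Series) -> pd.Series:
    """Hypothetical helper: map datetimes to a running month count.

    Computes year * 12 + month - 1, so consecutive months differ by exactly 1
    and the day of the month is ignored. NaT becomes <NA> (nullable Int64).
    """
    months = dates.dt.year * 12 + dates.dt.month - 1
    return months.astype("Int64")
```

It can be applied inside the same `reset_index()` / `assign()` / `set_index()` chain used in the monthly example below, replacing the datetime columns with integers before they reach `ATTgt`.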
One way to see how the functionality works with datetime columns is to try the following code, which runs fine for me:
```python
from differences import simulate_data, ATTgt

panel_data = (
    simulate_data(datetime=True)  # yearly data where the date and cohort are datetimes
    .sample(frac=0.9)  # drop 10% of rows to make the panel unbalanced
)

att_gt = ATTgt(data=panel_data, cohort_name='cohort', base_period="varying", freq="YS")
print(att_gt.is_balanced_panel)

att_gt.fit("y ~ x0", est_method='dr', control_group="not_yet_treated", progress_bar=True)
```
The same works with monthly data, here built by mapping the simulated yearly periods to consecutive month starts:

```python
import pandas as pd
from differences import simulate_data, ATTgt

# map each simulated year (1900-1910) to the first day of a month in 1900
to_month = {
    1900: "1900-01-01",
    1901: "1900-02-01",
    1902: "1900-03-01",
    1903: "1900-04-01",
    1904: "1900-05-01",
    1905: "1900-06-01",
    1906: "1900-07-01",
    1907: "1900-08-01",
    1908: "1900-09-01",
    1909: "1900-10-01",
    1910: "1900-11-01",
}

panel_data = (
    simulate_data()
    .sample(frac=0.9)  # unbalanced panel
    .reset_index()
    .assign(
        time=lambda x: pd.to_datetime(x["time"].map(to_month)),
        cohort=lambda x: pd.to_datetime(x["cohort"].map(to_month)),
    )
    .set_index(["entity", "time"])
)

# freq="MS" (month start): ATTgt casts the monthly datetimes to integers
att_gt = ATTgt(data=panel_data, cohort_name='cohort', base_period="varying", freq="MS")
print(att_gt.is_balanced_panel)

att_gt.fit("y ~ x0", est_method='dr', control_group="not_yet_treated", progress_bar=True)
```
I was able to find the error using the suggested debugging approach: there were duplicate rows at the id level. Thanks for the prompt response!
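Since the problem turned out to be duplicated ids, here is a minimal pandas sketch for finding such rows, assuming the panel is indexed by `(entity, time)` as in the examples above. Both helpers are hypothetical; keeping the first occurrence is only one option, and aggregating the duplicates or fixing the source data may be more appropriate depending on why they exist.

```python
import pandas as pd


def duplicated_panel_rows(panel: pd.DataFrame) -> pd.DataFrame:
    """Return every row whose (entity, time) index pair occurs more than once."""
    return panel[panel.index.duplicated(keep=False)]


def drop_duplicate_panel_rows(panel: pd.DataFrame) -> pd.DataFrame:
    """Keep only the first row for each (entity, time) pair."""
    return panel[~panel.index.duplicated(keep="first")]
```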
Hi, I am trying to run doubly robust S-DID with an unbalanced panel and a varying base period, using 'not_yet_treated' as the control group. My code is as follows:
However, I am getting the following error, which I do not understand:
The 'course_month_end_date' column does exist in the dataframe. I would really appreciate your help in debugging this.