Open shalakawani opened 1 month ago
Hi,
I am trying to understand the updated package by running it against simulated data, as follows:
    out = ATTgt(yname='outcome',
                gname='treatment_month',
                idname='seller_id',
                tname='month',
                allow_unbalanced_panel=True,
                xformla=f'outcome~1',
                control_group='never_treated',
                data=panel_data.reset_index()
               ).fit(est_method='dr')
I am getting the following error:
    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    Cell In[8], line 9
          1 out = ATTgt(yname='outcome',
          2             gname='treatment_month',
          3             idname='seller_id',
          4             tname='month',
          5             allow_unbalanced_panel=True,
          6             xformla=f'outcome~1',
          7             control_group='never_treated',
          8             data=panel_data.reset_index()
    ----> 9            ).fit(est_method='dr')

    File ~\AppData\Roaming\Python\Python311\site-packages\csdid\att_gt.py:39, in ATTgt.fit(self, est_method, base_period, bstrap)
         36 def fit(self, est_method = 'dr', base_period = 'varying', bstrap = True):
         37     # print(self.dp)
         38     dp = self.dp
    ---> 39     result, inffunc = compute_att_gt(dp)
         40     att = result['att']
         41     crit_val, se, V = np.zeros(len(att)), np.zeros(len(att)), np.zeros(len(att))

    File ~\AppData\Roaming\Python\Python311\site-packages\csdid\attgt_fnc\compute_att_gt.py:83, in compute_att_gt(dp, est_method, base_period)
         81 n1 = data[gname] == 0
         82 n2 = (data[gname] > (tlist[np.max([t_i, pret]) + tfac]) + anticipation)
    ---> 83 n3 = np.where(data[gname] != glist[g], True, False)
         84 row_eval = n1 | n2 & n3
         85 data = data.assign(C = 1 * row_eval)

    IndexError: index 15 is out of bounds for axis 0 with size 4
I tried changing the control group definition and other parameters, but still got the same error.
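For reference, the message itself can be reproduced with plain NumPy. The sketch below is only an illustration of the error, not a diagnosis of csdid: it assumes that `glist` holds the four treated cohort values from my simulation (15, 18, 21, 24) and that `g` arrives as a cohort value rather than a position. I have not confirmed either assumption against the package source. The `lookup` helper is hypothetical and exists only for this example.

```python
import numpy as np

# Assumption: glist holds the sorted treated-cohort values from the simulation.
glist = np.array([15, 18, 21, 24])


def lookup(groups, g):
    """Return groups[g], or the IndexError message if g is out of range."""
    try:
        return groups[g]
    except IndexError as exc:
        return str(exc)


# Indexing by position works: position 0 is the first cohort.
by_position = lookup(glist, 0)

# Indexing by cohort value fails, because 15 is not a valid position
# in an array of length 4.
by_value = lookup(glist, 15)

print(by_position)
print(by_value)
```

The second lookup yields the same text as the last line of my traceback, which is why I suspect a value is being used where a position is expected.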
Here is the code that generates the simulated data:
    import numpy as np
    import pandas as pd

    # Set seed for reproducibility
    np.random.seed(51)

    # Define parameters
    num_months = 36
    num_sellers = 100
    mean_outcome = 10
    noise_level = 0.5
    treatment_effect = 2
    treatment_noise_level = 0.2
    treatment_months = [15, 18, 21, 24]
    cohort_sizes = [15, 15, 15, 15]

    # Initialize data structure
    data = []

    # Generate seller data
    for seller_id in range(1, num_sellers + 1):
        # Randomly assign join month between 1 and 12
        join_month = np.random.randint(1, 13)

        # Randomly decide if seller leaves between month 24 and 36 (0 = never leaves)
        leave_month = np.random.choice([0] + list(range(24, 37)),
                                       p=[0.75] + [0.25 / 13] * 13)

        # Default outcome with noise
        outcome = mean_outcome + np.random.normal(0, noise_level, num_months)

        # Treatment assignment: sellers 1-40 are never treated
        if seller_id <= 40:
            treatment_status = 0
            treatment_month = np.nan
        else:
            cumulative_sizes = np.cumsum(cohort_sizes)
            if seller_id <= 40 + cumulative_sizes[0]:
                cohort = 0
            elif seller_id <= 40 + cumulative_sizes[1]:
                cohort = 1
            elif seller_id <= 40 + cumulative_sizes[2]:
                cohort = 2
            else:
                cohort = 3
            treatment_month = treatment_months[cohort]
            treatment_status = 1

            # Apply treatment effect after treatment month
            outcome[treatment_month - 1:] += treatment_effect + np.random.uniform(
                -treatment_noise_level, treatment_noise_level,
                num_months - treatment_month + 1)

        # Create panel data for each month the seller is active
        for month in range(1, num_months + 1):
            if month >= join_month and (leave_month == 0 or month < leave_month):
                data.append([seller_id, month, outcome[month - 1],
                             treatment_status, treatment_month])

    # Create DataFrame
    columns = ['seller_id', 'month', 'outcome', 'treatment_status', 'treatment_month']
    panel_data = pd.DataFrame(data, columns=columns)
    panel_data['seller_id'] = panel_data['seller_id'].astype(str)
    panel_data = panel_data.set_index(keys=['seller_id', 'month'])

    panel_data = panel_data.reset_index()
    panel_data = panel_data.sort_values(by=['seller_id', 'month'], ascending=False)
    panel_data['outcome12'] = panel_data.groupby('seller_id')['outcome'].transform(
        lambda x: x.rolling(12).sum())
    panel_data['outcome12'] = panel_data['outcome12'].fillna(
        panel_data.groupby('seller_id')['outcome12'].transform('mean'))
    panel_data = panel_data.set_index(keys=['seller_id', 'month'])

    # Specifically for CSDID: integer IDs, and 0 marks never-treated sellers
    panel_data = panel_data.reset_index()
    panel_data['seller_id'] = panel_data['seller_id'].astype('int64')
    panel_data['treatment_month'] = panel_data['treatment_month'].fillna(0)
    panel_data['treatment_month'] = panel_data['treatment_month'].astype('int64')
I also tried the code with the data provided in the repository example, and it works perfectly fine. As far as I can tell, my simulated data has the same structure as the data you provided. I would really appreciate your help debugging this issue.
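In case it helps with comparing the two datasets, here is a rough sketch of the structural checks I have in mind. The `describe_panel` helper is hypothetical and written just for this post; it is not part of csdid. The toy frame stands in for either dataset, and the column names are the ones from my simulation.

```python
import pandas as pd


def describe_panel(df, idname, tname, gname):
    """Summarise the panel features an estimator is likely to care about."""
    n_periods = df[tname].nunique()
    periods_per_unit = df.groupby(idname)[tname].nunique()
    cohorts_per_unit = df.groupby(idname)[gname].nunique()
    return {
        "n_units": int(df[idname].nunique()),
        "n_periods": int(n_periods),
        # Distinct cohort values, with 0 marking never-treated units.
        "cohorts": sorted(int(g) for g in df[gname].unique()),
        # Balanced means every unit is observed in every period.
        "balanced": bool((periods_per_unit == n_periods).all()),
        # Each unit should belong to exactly one cohort.
        "cohort_constant_within_unit": bool((cohorts_per_unit == 1).all()),
    }


# Toy stand-in: seller 1 is never treated and misses month 3.
toy = pd.DataFrame({
    "seller_id": [1, 1, 2, 2, 2],
    "month": [1, 2, 1, 2, 3],
    "treatment_month": [0, 0, 2, 2, 2],
})

summary = describe_panel(toy, "seller_id", "month", "treatment_month")
print(summary)
```

Running the same helper on `panel_data` and on the repository example data should show where the two differ, for example in the cohort values or in whether the panel is balanced.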
Thank you.