Code error when using not yet treated as controls

dadepro commented 12 months ago

When I estimate the aggregated ATE using employment rate data and change the control group to notyettreated, the code gives an error:

```python
import pandas as pd
from csdid.att_gt import ATTgt

data = pd.read_csv("https://raw.githubusercontent.com/d2cml-ai/csdid/function-aggte/data/mpdta.csv")
out = ATTgt(yname = "lemp",
            gname = "first.treat",
            idname = "countyreal",
            tname = "year",
            xformla = f"lemp~1",
            control_group = "notyettreated",
            data = data,
            ).fit(est_method = 'dr')
out.aggte(typec='simple')
```

Error:

```
compute_att_gt.py", line 83, in compute_att_gt
    n3 = np.where(data[gname] != glist[g], True, False)
IndexError: index 2004 is out of bounds for axis 0 with size 3
```

alexanderquispe commented 2 weeks ago

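Hi @dadepro! Thanks a lot for posting this. There was an indexing bug in how treatment groups were looked up when `control_group = "notyettreated"`: the loop variable `g` holds a group's first-treatment year (e.g. 2004), not its position in `glist`, so `glist[g]` tried to read element 2004 of a three-element array. The fix is to keep the loop index and look the group up with `glist[g_index]`.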
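A minimal, self-contained reproduction of the indexing problem, assuming `glist` holds the three mpdta cohorts (2004, 2006, 2007) and using a hypothetical toy `first_treat` array in place of `data[gname]`; the helper names `n3_old` and `n3_new` are illustrative only and are not part of csdid:

```python
import numpy as np

# Treated cohorts in mpdta (first.treat values other than 0).
glist = np.array([2004, 2006, 2007])

# Hypothetical toy units standing in for data[gname] (0 = never treated).
first_treat = np.array([0, 2004, 2006, 2007, 2007])

def n3_old(g):
    # Bug: g is a cohort year (e.g. 2004) but is used as a position in glist.
    return np.where(first_treat != glist[g], True, False)

def n3_new(g_index):
    # Fix: look the cohort up by its position in glist.
    return np.where(first_treat != glist[g_index], True, False)

for g_index, g in enumerate(glist):
    try:
        n3_old(g)
    except IndexError as err:
        print(f"old, g={g}: IndexError: {err}")
    print(f"new, g={g}:", n3_new(g_index))
```

Because `glist[g_index] == g` inside the loop, comparing against `g` directly (`data[gname] != g`) would produce the same mask.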

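```python
# Excerpt from compute_att_gt(); tlist, tlist_len, tfac, base_period,
# anticipation, never_treated and add_att_data are not shown in this excerpt.

# for _, g, in enumerate(glist):            # old: the group's position was discarded
for g_index, g in enumerate(glist):         # new: keep the position of g in glist
    # g = glist[1]  # single-group debugging line; it would override the loop variable
    G_main = (data[gname] == g)
    data = data.assign(G_m = 1 * G_main)

    for t_i in range(tlist_len):
        pret = t_i
        tn = tlist[t_i + tfac]
        if base_period == 'universal' or g < tn:
            try:
                pret = np.where(tlist + anticipation < g)[0][-1]
            except IndexError:
                # Raising a plain string is itself a TypeError in Python 3.
                raise ValueError(f"There are no pre-treatment periods for the group first treated at {g}\nUnits from this group are dropped")
                # break

        if base_period == 'universal':
            if pret == tn:
                add_att_data()

        if not never_treated:
            n1 = data[gname] == 0
            n2 = (data[gname] > (tlist[np.max([t_i, pret]) + tfac]) + anticipation)
            # n3 = np.where(data[gname] != glist[g], True, False)      # old: g is a year, not an index
            n3 = np.where(data[gname] != glist[g_index], True, False)  # new
```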
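The excerpt above builds three masks but does not show how they are combined. The sketch below assumes they form the not-yet-treated comparison group the way the R `did` package does it, `n1 | (n2 & n3)`: never-treated units, plus units first treated after the cutoff (plus anticipation) that are not group `g` itself. The helper `not_yet_treated_mask` and the toy data are hypothetical, and `t_cutoff` stands in for `tlist[np.max([t_i, pret]) + tfac]`; this is not the csdid source.

```python
import numpy as np

def not_yet_treated_mask(first_treat, g, t_cutoff, anticipation=0):
    """Hypothetical helper: units usable as controls for cohort g at t_cutoff.

    Assumes the R did convention: never-treated units, plus units first
    treated after t_cutoff + anticipation, excluding cohort g itself.
    """
    n1 = first_treat == 0                          # never treated
    n2 = first_treat > t_cutoff + anticipation     # not yet treated at the cutoff
    n3 = first_treat != g                          # not the cohort being evaluated
    return n1 | (n2 & n3)

# Hypothetical toy cohorts (0 = never treated), coded like first.treat in mpdta.
first_treat = np.array([0, 2004, 2006, 2007, 2007])

# Controls for the 2006 cohort when the cutoff year is 2005.
mask = not_yet_treated_mask(first_treat, g=2006, t_cutoff=2005)
print(mask)
```

The `n3` term matters when the cutoff falls before `g`'s own first-treatment year (pre-treatment periods): without it, units of group `g` would satisfy `n2` and end up in their own comparison group.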

I have rerun the estimation with the updated package:

```python
import pandas as pd
from csdid.att_gt import ATTgt

data = pd.read_csv("https://raw.githubusercontent.com/d2cml-ai/csdid/function-aggte/data/mpdta.csv")
out = ATTgt(yname = "lemp",
            gname = "first.treat",
            idname = "countyreal",
            tname = "year",
            xformla = f"lemp~1",
            control_group = "notyettreated",
            data = data,
            ).fit(est_method = 'dr')
out.summ_attgt().summary2
```

The results are exactly the same as the R output. Please install the GitHub version to test this out :)

alexanderquispe commented 2 weeks ago

@pedrohcgs I think we can close this issue :)

d2cml-ai / csdid, issue #18