bcallaway11 / did

Difference in Differences with Multiple Periods, website: https://bcallaway11.github.io/did

Possible to compute average ATT(g,t) for a custom list of g-t pairs? #151

Open jtorcasso opened 1 year ago

jtorcasso commented 1 year ago

Sometimes I'd like to break out impacts by collections of groups (multiple, but not all, values of g) for a fixed range of event time. Is it possible to do this in the did package? For instance, suppose I wanted the average ATT(g,t) for g,t in [(g1,t1),(g1,t2),(g2,t2),(g2,t3)].

bcallaway11 commented 1 year ago

We can do this directly for some cases (e.g., you can check the balance_e argument of aggte). If you want to include fully custom groups, that might take some more work but could be possible, I think.

Brant

jtorcasso commented 1 year ago

I see. That makes a lot of sense. It may be useful to add an example like this somewhere in the docs. Let me walk through the logic. Suppose I have monthly data from 2019-01 through 2022-12, with staggered treatment taking place over the last three years (2020, 2021 and 2022). I want the average effect for months 7 through 12 in event time for only those units treated in 2021. Suppose MP is the output from att_gt run on a dataset that includes only never-treated units and units with a launch year >= 2021. I think I would then specify

aggte(MP, type="dynamic", balance_e=13, min_e=7, max_e=12)

The sample restriction removes launches earlier than 2021. The balance_e=13 argument keeps only groups observed through event time 13, which excludes the units treated in 2022 (they have at most 12 months of post-treatment data, i.e. event times 0 through 11).

jtorcasso commented 1 year ago

Actually, this doesn't work. If I have to set balance_e=9 and still want the average from 7 through 12, aggte caps the event time at 9, not 12. So something is restricting max_e <= balance_e.

jtorcasso commented 1 year ago

I think the problem is on line 342 of compute.aggte (last line within if statement below):

    # if we balance the sample with respect to event time
    if (!is.null(balance_e)) {
      include.balanced.gt <- (t2orig(maxT) - originalgroup >= balance_e)

      eseq <- unique(originalt[include.balanced.gt] - originalgroup[include.balanced.gt])
      eseq <- eseq[order(eseq)]

      eseq <- eseq[ (eseq <= balance_e) & (eseq >= balance_e - t2orig(maxT) + t2orig(1))]

    }

If I'm interpreting this right, the event times are restricted to be no greater than balance_e. Is this necessary? The docs for balance_e say: "For example, if balance.e=2, aggte will drop groups that are not exposed to treatment for at least three periods." They don't say that it also restricts the estimated impacts to event times no greater than balance_e.
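To make the effect of that filter concrete, here is a minimal sketch that mimics the logic of the snippet above on plain vectors. It does not call the did package. The function name balanced_event_times, the 1..48 period index (2019-01 = 1, 2022-12 = 48), and the two groups 25 (2021-01) and 37 (2022-01) are all assumptions made up for illustration.

```r
# Hypothetical re-implementation of the balance_e filter on toy data.
# Not the package's code: periods are assumed to be already on the original scale.
balanced_event_times <- function(groups, periods, balance_e) {
  max_t <- max(periods)
  min_t <- min(periods)
  # Every (g, t) cell in the toy panel
  gt <- expand.grid(g = groups, t = periods)
  # Keep only groups observed through event time balance_e
  keep <- (max_t - gt$g) >= balance_e
  eseq <- sort(unique(gt$t[keep] - gt$g[keep]))
  # Same truncation as the quoted line: event times above balance_e are dropped
  eseq[eseq <= balance_e & eseq >= balance_e - max_t + min_t]
}

periods <- 1:48        # 2019-01 .. 2022-12 as an integer index (assumed)
groups  <- c(25, 37)   # first treated in 2021-01 and 2022-01 (assumed)

e9  <- balanced_event_times(groups, periods, balance_e = 9)
e13 <- balanced_event_times(groups, periods, balance_e = 13)

max(e9)             # largest event time kept when balance_e = 9
all(7:12 %in% e13)  # whether event times 7..12 survive when balance_e = 13
```

Under these assumptions, balance_e = 9 keeps both groups but leaves no event time above 9, while balance_e = 13 drops the 2022 group and keeps event times 7 through 12.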

bcallaway11 commented 1 year ago

Yes, what you are saying here makes sense to me. The intended use of balance_e is to keep the composition of groups the same across event times. This is why it truncates the event times to those $\leq$ the value of balance_e; otherwise, the composition would change.

Does this line: aggte(MP, type="dynamic", balance_e=13, min_e=7, max_e=12) work as expected?

I think the second case you are talking about will probably require some kind of custom wrapper on top of the results from att_gt.
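As a rough idea of what such a wrapper could look like, here is a minimal sketch that averages ATT(g,t) over a hand-picked list of (g, t) pairs. The function average_att_pairs and the toy numbers are hypothetical. With real output one would presumably pass MP$group, MP$t and MP$att, which is an assumption about the att_gt return object that is worth verifying. The sketch returns a point estimate only. Standard errors would require combining the influence functions, which is not attempted here.

```r
# Hypothetical helper: average ATT(g,t) over a custom set of (g, t) pairs.
# `pairs` is a data.frame with columns g and t; `weights` defaults to equal weights.
average_att_pairs <- function(group, t, att, pairs, weights = NULL) {
  stopifnot(length(group) == length(t), length(t) == length(att))
  # Locate each requested pair among the estimated (g, t) cells
  idx <- match(paste(pairs$g, pairs$t), paste(group, t))
  if (anyNA(idx)) stop("some requested (g, t) pairs were not estimated")
  if (is.null(weights)) weights <- rep(1, length(idx))
  stopifnot(length(weights) == length(idx))
  sum(weights * att[idx]) / sum(weights)
}

# Toy values standing in for MP$group, MP$t and MP$att (made up for illustration)
group <- c(3, 3, 3, 4, 4, 4)
t     <- c(3, 4, 5, 3, 4, 5)
att   <- c(1, 2, 3, 0, 4, 6)

# The pattern from the question: (g1,t1), (g1,t2), (g2,t2), (g2,t3)
pairs <- data.frame(g = c(3, 3, 4, 4), t = c(3, 4, 4, 5))

simple_avg   <- average_att_pairs(group, t, att, pairs)
weighted_avg <- average_att_pairs(group, t, att, pairs, weights = c(1, 1, 2, 2))
```

Equal weights are used by default only to keep the sketch simple. Weighting each pair by its group's size, passed through the weights argument, may be closer to what an aggregation like aggte would do.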