bcallaway11 / did

Difference in Differences with Multiple Periods, website: https://bcallaway11.github.io/did

Are there assumptions for outcomes used in DiD package? #206

Closed 49470952 closed 4 weeks ago

49470952 commented 2 months ago

Hi,

I have a question about the general requirements for the outcome variable used in att_gt(). I noticed that the did package uses a regression model with the IPTW method for ATT estimation. Does it require the outcome variable specified in 'yname' to be normal or to follow some other parametric distribution?

Thanks very much!

Lulu

bcallaway11 commented 1 month ago

Hi Lulu, no, the outcome does not need to be normally distributed or follow any parametric distribution.

Brant

49470952 commented 1 month ago

Will the ATT estimate from the did package be more accurate for a normally distributed outcome than for a non-normal one? I ask because regression model fit is typically much better when the outcome variable is normal. A typical DiD model (Y = β0 + β1[Time] + β2[Intervention] + β3[Time × Intervention] + β4[Covariates] + ε) would also require the outcome variable to be parametric. Could you please explain why att_gt() doesn't need the outcome to be normal, and how non-normality is accounted for in this model? Thanks in advance.

bcallaway11 commented 1 month ago

Ah, you don't need parametric assumptions on the outcome in our case or in the regression that you mentioned, as long as you have enough data. Though, like you mention, you may get more "accurate" estimates / better finite sample properties of inference procedures, etc. if the outcome is normally distributed compared to other cases when it follows a more complicated distribution.
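As a rough illustration of that point, here is a toy two-group, two-period simulation. It is only a sketch: it uses a plain difference of group mean changes, not the estimator implemented in att_gt(), and every name and parameter value in it (simulate_did, the noise distributions, the true effect of 2.0) is made up for the example. The outcomes are deliberately skewed, yet with a large sample the estimate lands close to the true effect because parallel trends holds.

```python
import random
import statistics


def simulate_did(n_per_group=20000, true_att=2.0, seed=0):
    """Toy 2x2 difference-in-differences with clearly non-normal outcomes.

    All values here are illustrative assumptions, not taken from the did package.
    """
    rng = random.Random(seed)
    changes = {"treated": [], "control": []}
    for group in ("treated", "control"):
        # Groups may differ in levels; DiD only needs parallel trends.
        group_shift = 1.0 if group == "treated" else 0.0
        for _ in range(n_per_group):
            unit_effect = rng.expovariate(1.0)  # skewed, time-invariant heterogeneity
            # Pre-period outcome with exponential (skewed) noise.
            y_pre = group_shift + unit_effect + rng.expovariate(1.0)
            # Post-period outcome: common trend of 0.5 and log-normal (skewed) noise.
            y_post = group_shift + unit_effect + 0.5 + rng.lognormvariate(0.0, 1.0)
            if group == "treated":
                y_post += true_att
            changes[group].append(y_post - y_pre)
    # DiD estimate: mean change among treated minus mean change among controls.
    return statistics.fmean(changes["treated"]) - statistics.fmean(changes["control"])


if __name__ == "__main__":
    print(f"DiD estimate: {simulate_did():.3f} (true ATT is 2.0)")
```

Shrinking n_per_group makes the estimate noisier, which is the finite-sample caveat: the skewness does not bias the estimate, but heavy tails can make small-sample inference less reliable.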

Hope this helps!

Brant

pedrohcgs commented 1 month ago

Our model is semiparametric and does not rely on distributional assumptions in the outcome model.
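For context, a sketch of the kind of estimand involved in the simplest case. This assumes no covariates, unconditional parallel trends, no anticipation, and a never-treated comparison group; the notation is an illustrative assumption and is not quoted from this thread. For a group first treated in period g and a period t ≥ g:

$$
ATT(g,t) = E[\,Y_t - Y_{g-1} \mid G_g = 1\,] - E[\,Y_t - Y_{g-1} \mid C = 1\,]
$$

Here G_g = 1 marks units first treated in period g and C = 1 marks never-treated units. Only means of outcome changes appear, so no distribution for the outcome has to be specified.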

If you impose a parametric linear model, and add the very strong assumption that the error terms are homoskedastic and have zero autocorrelation, you can do better, because the effective sample size increases when you impose much stronger conditions. We do not recommend and do not consider these setups.


Pedro H. C. Sant'Anna https://psantanna.com

