friosavila / stpackages

Repository for all my Stata packages
MIT License
e(sample) not implemented in csdid2? #4

Closed avila closed 2 weeks ago

avila commented 5 months ago

I was moving my analysis to csdid2 and was very happy with the gain in speed. What was taking half a minute is now done in 3 or 5 seconds. Quite impressive.

One thing that I am missing is the e(sample) functionality. Currently, in csdid2 (*! v1.21), e(sample) returns only 0. In csdid (*! v1.72 by FRA. Drops always treated) it works as expected.

Is that something that cannot be implemented due to the use of Mata, or is it just a bug? If I can be helpful in debugging it, please let me know how.

Best, Marcelo

Replication syntax

frause mpdta, clear

replace lemp = . if year==2006 & countyreal==55137 /* just to make sure sample is not full */

qui csdid2 lemp, ivar(countyreal) tvar(year) gvar(first_treat)   
gen smp2 = e(sample)
fre smp2 

qui csdid lemp, ivar(countyreal) time(year) gvar(first_treat)   
gen smp1 = e(sample)
fre smp1 

friosavila commented 5 months ago

Hi Marcelo,

You are quite correct: e(sample) does not store anything after csdid2. This is not a deliberate omission, but a consequence of the design. I have not yet figured out the best way to keep track of the samples and subsamples that are used in CS. For csdid it's easy, because each observation is a row, and thus the influence functions, samples, etc., are easy to keep track of. In csdid2, everything is just separate datasets, and I have not come up with an easy way to keep track of the sample.

Fernando

avila commented 5 months ago

Thank you for your quick answer. So there is currently no (easy) way of knowing the number of observations used in the estimation with csdid2?

By the way, as a side note, I was at first a little confused by the versioning of csdid2, since it is still at version 1.xx.

. which csdid2
/home/avila/ado/plus/c/csdid2.ado
*! v1.21 Allows for treatvar
*! v1.2  Allows for Anticipation
...

It might be more intuitive to bump the version to 2.x.

Again, thank you for your attention to this and maintaining such packages!