bernardodionisi / differences

difference-in-differences in Python
https://bernardodionisi.github.io/differences/latest/
GNU General Public License v3.0
89 stars, 18 forks

Cannot run TWFE example #10

Status: Open. amichuda opened this issue 10 months ago.

amichuda commented 10 months ago

I'm running differences 0.1.2, and the TWFE example from the README doesn't work:

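```python
from differences import TWFE, simulate_data

# Simulated panel dataset
df = simulate_data()

# Two-way fixed effects estimator; 'cohort' names the column with each entity's treatment cohort
twfe = TWFE(data=df, cohort_name='cohort')

# Fit with 'y' as the outcome variable
twfe.fit(formula='y')
```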

It errors with:

ValueError: Only numeric, string  or categorical data permitted

After some debugging, I found that the error occurs when the data is passed to linearmodels.AbsorbingLS: self._y is cast to object dtype because it contains both numbers and booleans.
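The mechanism described above can be reproduced with pandas alone. This is a minimal sketch, not code from differences or linearmodels. It shows that a column mixing floats and booleans falls back to object dtype, and that an explicit cast to float64 restores a numeric column. Whether this is exactly how 'y' ends up as object inside the package is an assumption.

```python
import pandas as pd

# A column mixing floats and Python booleans has no common numeric dtype in pandas,
# so it falls back to the generic `object` dtype.
mixed = pd.Series([1.5, 2.0, True, False])
print(mixed.dtype)  # object

# An explicit cast restores a numeric column (True -> 1.0, False -> 0.0),
# which is the kind of input linearmodels accepts.
fixed = mixed.astype("float64")
print(fixed.dtype)  # float64
```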

bernardodionisi commented 10 months ago

Hi Aleks,

Nice to hear from you!

After a quick look, I can't reproduce the error. Here is the output I get after installing release 0.1.2 in Google Colab. (Note that your estimates will differ from mine: as far as I remember, simulate_data() has no seed argument, so the simulated data change on every run.)

[Screenshot, 2023-10-20 at 7:04:56 AM: output of the README TWFE example in Google Colab]

In the data returned by simulate_data(), 'y' should be float64; I'm not sure when it would become an object.
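Since 'y' should arrive as float64, one way to narrow down where it turns into an object is to list which Python types each object column actually holds. This is a minimal sketch: `describe_object_columns` is a hypothetical helper, not part of differences, and the toy frame stands in for the output of simulate_data().

```python
import pandas as pd


def describe_object_columns(df: pd.DataFrame) -> dict:
    """Hypothetical helper: map each object-dtype column to the sorted names of the Python types it holds."""
    return {
        col: sorted({type(v).__name__ for v in df[col]})
        for col in df.columns
        if df[col].dtype == object
    }


# Toy frame standing in for simulate_data() output, where 'y' has picked up a boolean.
toy = pd.DataFrame({"y": [0.3, 1.2, True], "x": [1.0, 2.0, 3.0]})
print(describe_object_columns(toy))  # {'y': ['bool', 'float']}
```

Run on the frame returned by simulate_data() just before fit(), it should return an empty dict as long as 'y' is still float64.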


I will take a closer look at some point, but it has been a very busy period and will be for a while longer. If I had to guess, I need to manage dependencies more carefully: some dependency behavior may have changed since I published the package.

FYI, I think some parts of the TWFE class may not behave as one would expect, so I will need to change those. However, the benefit of having a TWFE class is the additional control you get when creating the relevant variables; that flexibility may not be apparent because the documentation is not great at the moment.

If you can't get this running, then instead of fitting the OLS with the TWFE class, you could call the TWFE method ._get_relative_periods() (adjusting its arguments as you prefer) to create the variables you need from your data. These can be basic periods relative to treatment, but also binned endpoints, intensity-weighted periods, multiple treatments per entity, or combinations of these. Once you have those variables, you can fit the model with the regression software of your choice.
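The workaround above (build the event-study variables, then fit with any regression tool) can be sketched without the TWFE class. This is a minimal, hypothetical example: `twfe_event_study` is not part of differences, and the relative-period dummies are built by hand with pandas rather than with ._get_relative_periods(), whose arguments aren't shown here. It bins the endpoints of the event window, omits period -1 as the reference, adds entity and time fixed effects as explicit dummies, and fits OLS with NumPy's least squares.

```python
import numpy as np
import pandas as pd


def twfe_event_study(df, y, entity, time, cohort, window=(-2, 2)):
    """Hypothetical helper: OLS of y on relative-period dummies plus entity and time fixed effects."""
    lo, hi = window
    # Periods relative to treatment; never-treated units (cohort is NaN) stay NaN.
    rel = (df[time] - df[cohort]).clip(lower=lo, upper=hi)  # bin the endpoints
    # One dummy per relative period, omitting -1 (the period before treatment) as reference.
    rel_dummies = pd.DataFrame(
        {f"rel_{k}": (rel == k).astype(float) for k in range(lo, hi + 1) if k != -1},
        index=df.index,
    )
    # Entity and time fixed effects as dummies; drop_first avoids collinearity with the constant.
    fixed_effects = pd.get_dummies(df[[entity, time]].astype(str), drop_first=True, dtype=float)
    const = pd.Series(1.0, index=df.index, name="const")
    X = pd.concat([const, rel_dummies, fixed_effects], axis=1)
    # Explicit float casts keep object dtypes out of the least-squares solve.
    beta, *_ = np.linalg.lstsq(X.to_numpy(dtype=float), df[y].to_numpy(dtype=float), rcond=None)
    return pd.Series(beta, index=X.columns)[rel_dummies.columns]


# Toy panel: units 0 and 1 are treated in periods 3 and 4; units 2 and 3 never are.
cohorts = {0: 3, 1: 4, 2: np.nan, 3: np.nan}
rows = []
for unit, c in cohorts.items():
    for t in range(6):
        treated = not np.isnan(c) and t >= c
        # Unit effect + time trend + a constant treatment effect of 2, with no noise.
        rows.append({"unit": unit, "time": t, "cohort": c,
                     "y": 1.0 * unit + 0.5 * t + (2.0 if treated else 0.0)})
panel = pd.DataFrame(rows)
print(twfe_event_study(panel, "y", "unit", "time", "cohort"))
# rel_-2 should be close to 0, and rel_0, rel_1, rel_2 close to 2
```

Encoding the fixed effects as explicit dummies is only practical for small panels; for larger ones, an estimator that absorbs them (as linearmodels.AbsorbingLS does) is the usual choice.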

In any case, thanks for taking the time to report the issue. I will let you know once I'm able to take a closer look.