bashtage / linearmodels

Additional linear models including instrumental variable and panel data models that are missing from statsmodels.
https://bashtage.github.io/linearmodels/
University of Illinois/NCSA Open Source License

Float division by zero when fitting a model #269

Closed madinajapakhova closed 4 years ago

madinajapakhova commented 4 years ago

Greetings!

I am replicating a paper, and need to run a simple OLS regression with a fixed effect. To do that I am running the PanelOLS function from linearmodels. I'm estimating the effect of treatment on child mortality, here is the raw data:
AEJ2018_child_mortality_computation.zip

Here is what I do on the dataset:

    data_2 = pd.read_stata("AEJ2018_child_mortality_computation.dta")

Collapsing to the sum:

    data_2 = data_2.groupby(['villageid', 'branchid', 'treatment'], as_index = True)[['death_under5','count_month_u5', 'death_under1', 'count_month_u1','death_under1m','count_month_u1m']].sum().reset_index()

Generating variable of interest:

    data_2['count_month_u5'] = data_2.apply(lambda row: row.count_month_u5/12, axis = 1)
    data_2['mrate_u5'] = (data_2['death_under5']/data_2['count_month_u5'])*1000

Indexing:

    data_2 = data_2.set_index(['villageid', 'branchid'], drop = False)

Model:

    model = PanelOLS(data_2.mrate_u5, data_2.treatment, entity_effects = True, drop_absorbed=True)
    res = model.fit(cov_type = 'clustered', cluster_entity = True)
    print(res)

When fitting the model Python returns an error:

    ZeroDivisionError                         Traceback (most recent call last)
    in
          1 model = PanelOLS(data_2.mrate_u5, data_2.treatment, entity_effects = True, drop_absorbed=True)
    ----> 2 res = model.fit(cov_type = 'clustered', cluster_entity = True)
          3 #print(res)
          4 res

    ~\anaconda3\lib\site-packages\linearmodels\panel\model.py in fit(self, use_lsdv, use_lsmr, low_memory, cov_type, debiased, auto_df, count_effects, **cov_config)
       1722             mu = 0
       1723         total_ss = float((y - mu).T @ (y - mu))
    -> 1724         r2 = 1 - resid_ss / total_ss
       1725
       1726         root_w = np.sqrt(self.weights.values2d)

    ZeroDivisionError: float division by zero

At the same time, without fitting, i.e.:

    model = PanelOLS(data_2.mrate_u5, data_2.treatment, entity_effects = True, drop_absorbed=True)
    print(res)

everything works properly and produces a valid regression output.

Equivalently, in R:

    library(plm)
    model <- plm(mrate_u5~ treatment, data = df, index = c("branchid"), model = 'within')
    summary(model)

I'm okay without fitting, but I'm simply curious what I've done wrong this time. I've used _PanelOLS_ for other regressions with fitting and it worked nicely.

Thanks!

Edit: for authorship reasons, I'm leaving a link to where I got all the data. It was provided by the authors of the paper I'm replicating: https://www.openicpsr.org/openicpsr/project/116355/version/V1/view

Edit2: after restarting the kernel, even the version without fitting doesn't work.

bashtage commented 4 years ago

Did you figure it out?

bashtage commented 4 years ago

In your model, all of the "x" variables are absorbed by your entity effects. The model cannot be estimated.
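
To make the absorption concrete, here is a minimal sketch in plain Python of the within (entity-demeaning) transformation that entity effects imply. It is not linearmodels' implementation; the helper `within_demean` and the four-row dataset are hypothetical, and it assumes that the collapsed data has one row per village, which is a guess about the dataset rather than something confirmed in the thread. With `villageid` as the first index level, every entity has a single observation, so both `treatment` and `mrate_u5` demean to zero. That would be consistent with `total_ss` being zero in the traceback.

```python
from collections import defaultdict


def within_demean(entities, values):
    """Subtract each entity's mean from its observations (within transformation)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for e, v in zip(entities, values):
        sums[e] += v
        counts[e] += 1
    return [v - sums[e] / counts[e] for e, v in zip(entities, values)]


# Synthetic data, NOT from the paper. Assumption: after the groupby there is
# exactly one row per village, and villages are nested in branches.
village = [1, 2, 3, 4]
branch = [10, 10, 20, 20]
treatment = [1.0, 0.0, 1.0, 0.0]
mrate_u5 = [52.0, 61.0, 48.0, 70.0]

# Entity = village, as in set_index(['villageid', 'branchid']).
x_village = within_demean(village, treatment)
y_village = within_demean(village, mrate_u5)
total_ss_village = sum(v * v for v in y_village)

# Entity = branch, as in the R call with index = c("branchid").
x_branch = within_demean(branch, treatment)
y_branch = within_demean(branch, mrate_u5)
total_ss_branch = sum(v * v for v in y_branch)

print("village effects:", x_village, total_ss_village)
print("branch effects: ", x_branch, total_ss_branch)
```

In this sketch the village-level demeaning leaves no variation in either variable, while the branch-level demeaning does. linearmodels treats the first level of the MultiIndex as the entity and the second as time, so if branch fixed effects are what the R call is estimating, the Python index would likely need `branchid` as its first level.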