bashtage / linearmodels

Additional linear models including instrumental variable and panel data models that are missing from statsmodels.
https://bashtage.github.io/linearmodels/
University of Illinois/NCSA Open Source License

wald_test #523

Open SPDA36 opened 1 year ago

SPDA36 commented 1 year ago

Sorry, I did not know where to place this issue of mine. I have not found a solution online either.

The wald_test does not accept brackets [ ] or a colon :, which is a problem when the formula uses operators such as C( ) and interaction terms. Below is a sample of the code.

fm = 'lwage ~ C(year)*educ + union + EntityEffects'
model2 = plm.PooledOLS.from_formula(formula=fm,data=df).fit(cov_type='clustered',debiased=True, cluster_entity=True)

null = "C(year)[T.1987]:educ = 0, C(year)[T.1986]:educ = 0"
model2.wald_test(formula=null)

FormulaSyntaxError: Unknown operator '['. C(year)⧛[⧚T.1987]:educ

I have tried replacing the term with C(year, Treatment(1987)):educ and with C(year, Treatment(1987))*educ, with no joy. I have tested similar syntax in statsmodels and it works fine with [ ] and :.

bashtage commented 1 year ago

Hi,

Can you provide me with a copy-pastable example that raises the issue?

Thanks, Kevin

bashtage commented 1 year ago

import linearmodels as plm
from linearmodels.datasets import wage_panel

df = wage_panel.load()
df = df.set_index(["nr","year"])
idx = df.index
df = df.reset_index()
df.index=idx
fm = 'lwage ~ C(year)*educ + union + EntityEffects'
model2 = plm.PooledOLS.from_formula(formula=fm,data=df).fit(cov_type='clustered',debiased=True, cluster_entity=True)

SPDA36 commented 1 year ago

Kevin, below is the copy-pastable code.

Python implementation: CPython
Python version : 3.10.9
IPython version : 8.10.0

Compiler : MSC v.1916 64 bit (AMD64)
OS : Windows
Release : 10
Machine : AMD64
Processor : Intel64 Family 6 Model 154 Stepping 3, GenuineIntel
CPU cores : 20
Architecture: 64bit

linearmodels: 4.31

import linearmodels as plm
from linearmodels.datasets import wage_panel

df = wage_panel.load()
df = df.set_index(["nr","year"])
idx = df.index
df = df.reset_index()
df.index=idx
fm = 'lwage ~ C(year)*educ + union + EntityEffects'
model2 = plm.PooledOLS.from_formula(formula=fm,data=df).fit(cov_type='clustered',debiased=True, cluster_entity=True)

null = 'C(year)[T.1984]:educ = 0'
model2.wald_test(formula=null)

FormulaSyntaxError: Unknown operator '['. C(year)⧛[⧚T.1987]:educ

null = 'C(year,Treatment(1984)):educ = 0'
model2.wald_test(formula=null)

FormulaSyntaxError: Unknown operator ':'. C(year,Treatment(1984))⧛:⧚educ = 0

bashtage commented 11 months ago

I think you need to escape terms that are not valid Python, e.g.,

null = '`C(year,Treatment(1984)):educ` = 0'

Note the backticks around the entire variable name. I'll need to check that this works, but it should.

Note: There was a recently fixed regression in formulaic that might affect escaping. This should be fixed in the next release of formulaic.
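
A minimal sketch of the backtick escaping suggested above, assuming (as in the thread, and still unverified there) that a backtick-quoted name is read as a single variable. The helpers quote_term and build_null are hypothetical names made up for illustration, not part of linearmodels or formulaic. They only build the restriction string; the call into wald_test is left as a comment because it needs the fitted model from earlier in the thread.

```python
from typing import Iterable


def quote_term(name: str) -> str:
    """Wrap a coefficient name in backticks so it is read as one identifier."""
    if "`" in name:
        raise ValueError(f"cannot quote a name that contains a backtick: {name!r}")
    return f"`{name}`"


def build_null(names: Iterable[str], value: float = 0) -> str:
    """Join quoted names into a comma-separated restriction string."""
    return ", ".join(f"{quote_term(n)} = {value}" for n in names)


# The two interaction terms from the original report.
null = build_null(["C(year)[T.1986]:educ", "C(year)[T.1987]:educ"])
print(null)

# Hypothetical usage with the fitted model from this thread (not run here;
# assumes the interaction coefficients are named with a ":educ" suffix):
# names = [n for n in model2.params.index if n.endswith(":educ")]
# model2.wald_test(formula=build_null(names))
```

Taking the names from model2.params.index rather than typing them by hand avoids mismatches such as testing a year level that is not among the fitted coefficients.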