bashtage / linearmodels

Additional linear models including instrumental variable and panel data models that are missing from statsmodels.
https://bashtage.github.io/linearmodels/
University of Illinois/NCSA Open Source License

Interaction terms for Categorical Variables in a RE Model #566

Open Arceus opened 1 year ago

Arceus commented 1 year ago

I'm trying to assess the effects on "returnAvg" of the interactions between my "SupplyClass" and "DupeClass" categorical variables, but I'm not sure I'm doing it right given that I always get the following error:

Traceback (most recent call last):
  File "F:\Python Projects\Tesi\Random Effects Regression.py", line 51, in <module>
    model = RandomEffects.from_formula("returnAvg ~ 1 + type + Color + Items + StoreRange + HasGlow + HasCutout + SupplyClass*DupeClass", data=data)
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 2670, in from_formula
    mod = cls(dependent, exog, weights=weights, check_rank=check_rank)
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 2616, in __init__
    super().__init__(dependent, exog, weights=weights, check_rank=check_rank)
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 328, in __init__
    self._validate_data()
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 479, in _validate_data
    rank_of_x = self._check_exog_rank()
  File "F:\Python Projects\Tesi\venv\lib\site-packages\linearmodels\panel\model.py", line 434, in _check_exog_rank
    raise ValueError(
ValueError: exog does not have full column rank. If you wish to proceed with model estimation irrespective of the numerical accuracy of coefficient estimates, you can set check_rank=False.

Setting check_rank=False instead leads to a singular matrix error.

I've looked through the documentation but haven't found how to do this properly. Here's how I tried adding the interaction terms:

model = RandomEffects.from_formula("returnAvg ~ 1 + type + Color + Items + StoreRange + SupplyClass + DupeClass + SupplyClass*DupeClass", data=data)
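In formula syntax, SupplyClass*DupeClass already expands to both main effects plus their interaction, so listing the main effects separately is redundant. One possible cause of the rank error (an assumption, not something confirmed in this thread) is that some SupplyClass/DupeClass combinations never occur in the data, which makes the interaction dummies collinear with the main effects. A minimal sketch of that check on hypothetical toy data, using only pandas and NumPy:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the combination ("high", "rare") never occurs.
df = pd.DataFrame({
    "SupplyClass": ["low", "low", "high", "high", "low", "high"],
    "DupeClass": ["common", "rare", "common", "common", "rare", "common"],
})

# Count observations in every SupplyClass x DupeClass cell.
cells = pd.crosstab(df["SupplyClass"], df["DupeClass"])
empty_cells = [
    (row, col)
    for row in cells.index
    for col in cells.columns
    if cells.loc[row, col] == 0
]
print("Empty cells:", empty_cells)

# Build the design matrix by hand: intercept, main effects, interaction.
supply = pd.get_dummies(df["SupplyClass"], drop_first=True, dtype=float)
dupe = pd.get_dummies(df["DupeClass"], drop_first=True, dtype=float)
X = np.column_stack([
    np.ones(len(df)),
    supply["low"],
    dupe["rare"],
    supply["low"] * dupe["rare"],
])

# Full column rank requires rank == number of columns.
rank = np.linalg.matrix_rank(X)
print(f"rank = {rank}, columns = {X.shape[1]}")
```

Here "rare" only ever appears together with "low", so the "rare" column and the interaction column are identical and the matrix loses one rank. If a cross-tabulation of the real data shows empty cells, merging sparse levels or dropping the interaction may be options worth considering.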

bashtage commented 1 year ago

Are these both pandas categoricals?

bashtage commented 1 year ago

Can you provide some more information on the structure of the data you are modeling?

Arceus commented 1 year ago

I'm pulling the data from a .csv file containing a long-format, unbalanced panel. By categorical (e.g. SupplyClass or DupeClass) I mean a column of strings that assigns a class to each id in the "itemName" column. I don't run into any problems using these as separate explanatory variables without creating dummies beforehand, since linearmodels recognizes them as categoricals and handles them automatically.
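For reference, string columns read from a CSV are not pandas categoricals until they are converted. A small sketch of converting them and confirming the dtype, using hypothetical rows and an assumed time column name ("period"), since the real column names beyond those quoted above are not given:

```python
import pandas as pd

# Hypothetical rows; "period" is an assumed name for the time column.
data = pd.DataFrame({
    "itemName": ["a", "a", "b", "b", "b"],
    "period": [1, 2, 1, 2, 3],
    "SupplyClass": ["low", "low", "high", "high", "high"],
    "DupeClass": ["rare", "rare", "common", "common", "common"],
})

class_columns = ["SupplyClass", "DupeClass"]

# Convert the plain string columns to the pandas categorical dtype.
for col in class_columns:
    data[col] = data[col].astype("category")

# Confirm which columns are now categoricals and list their levels.
is_categorical = {
    col: isinstance(data[col].dtype, pd.CategoricalDtype)
    for col in class_columns
}
levels = {col: list(data[col].cat.categories) for col in class_columns}
print(is_categorical)
print(levels)
```

Converting explicitly makes the set of levels visible up front, which helps when checking which level combinations an interaction term would generate.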

bashtage commented 1 year ago

What size is the array? What are the entity and time indices?
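A short sketch of how this information could be collected, assuming the panel is indexed by entity and then time. The data and the "period" column name are hypothetical; only "itemName" and "returnAvg" come from the thread:

```python
import pandas as pd

# Hypothetical long-format panel; "period" is an assumed time column name.
data = pd.DataFrame({
    "itemName": ["a", "a", "b", "b", "b"],
    "period": [1, 2, 1, 2, 3],
    "returnAvg": [0.10, 0.20, 0.05, 0.07, 0.02],
})

# Put the entity identifier first and the time identifier second.
panel = data.set_index(["itemName", "period"])

n_obs, n_cols = panel.shape
n_entities = panel.index.get_level_values("itemName").nunique()
n_periods = panel.index.get_level_values("period").nunique()

# A panel is balanced when every entity has the same number of observations.
obs_per_entity = panel.groupby(level="itemName").size()
balanced = obs_per_entity.nunique() == 1

print(f"{n_obs} observations, {n_cols} column(s)")
print(f"{n_entities} entities, {n_periods} time periods, balanced: {balanced}")
```

Reporting these numbers alongside the index level names answers both questions at once and shows how unbalanced the panel is.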