bashtage / linearmodels

Additional linear models including instrumental variable and panel data models that are missing from statsmodels.
https://bashtage.github.io/linearmodels/
University of Illinois/NCSA Open Source License

Save *both* estimated effects #392

Open effedp opened 2 years ago

effedp commented 2 years ago

Hey there,

I ran the textbook example of PanelOLS with both "entity" and "time" FEs. I am trying to save the two estimated effects using linearmodels.panel.results.PanelEffectsResults.estimated_effects, but I obtain only one column in the output DataFrame, which does not even seem to resemble either of the two FEs (I am double-checking everything with Stata's reghdfe).

import pandas as pd
import statsmodels.api as sm
from linearmodels.datasets import wage_panel
from linearmodels import PanelOLS

# Load the wage panel and index it by (entity, time).
data = wage_panel.load()
data = data.set_index(["nr", "year"])

exog_vars = ['expersq']
exog = sm.add_constant(data[exog_vars])

# Two-way fixed effects: entity and time effects are both absorbed.
mod = PanelOLS(data.lwage, exog, entity_effects=True, time_effects=True)
result = mod.fit(cov_type='clustered')

result.estimated_effects

This gives me the following output:

nr     year   estimated_effects
13     1980   -0.993351
       1981   -0.835877
       1982   -0.728156
       1983   -0.620804
       1984   -0.479181
...    ...    ...
12548  1983   -0.212544
       1984   -0.070921
       1985    0.059622
       1986    0.212194
       1987    0.382053

4360 rows × 1 columns

How can I save both the estimated effects? Am I missing something, or is this a bug?

Thank you for your help!

bashtage commented 2 years ago

estimated_effects contains the total combined effects included in the model. The method used to remove the FEs does not directly produce a separate estimate of each effect. In balanced panels it should be easy to get them by using the estimated effects as the LHS variable in a model that includes only entity effects. The estimated effects from this auxiliary model will be the entity effects, and the residuals will be the time effects.
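A minimal sketch of the balanced-panel decomposition described above, using only pandas and NumPy on a small synthetic panel. The entity and time effect values, the index names, and the panel itself are hypothetical and only for illustration; nothing here calls linearmodels. In a balanced panel, regressing the combined effect on entity dummies alone gives the entity means as fitted values, so the entity means stand in for the auxiliary model's estimated effects and the deviations from them are the time effects, recovered up to a constant shift.

```python
import numpy as np
import pandas as pd

# Hypothetical balanced panel: 4 entities, each observed in the same 3 periods.
entity_fe = pd.Series([1.0, -0.5, 2.0, 0.3], index=[10, 11, 12, 13])
time_fe = pd.Series([0.0, 0.4, -0.2], index=[1980, 1981, 1982])

idx = pd.MultiIndex.from_product([entity_fe.index, time_fe.index], names=["nr", "year"])

# Combined effect a_i + g_t, i.e. the single column a two-way model reports.
combined = pd.Series(
    entity_fe.reindex(idx.get_level_values("nr")).to_numpy()
    + time_fe.reindex(idx.get_level_values("year")).to_numpy(),
    index=idx,
    name="estimated_effects",
)

# Auxiliary "entity effects only" model: its fitted values are the entity means.
entity_part = combined.groupby(level="nr").transform("mean")
# The residual is the time effect, demeaned across periods.
time_part = combined - entity_part

est_entity = entity_part.groupby(level="nr").first()
est_time = time_part.groupby(level="year").first()
print(est_entity)
print(est_time)
```

Because the panel is balanced, each entity mean equals a_i plus the average time effect, so the recovered entity effects are shifted by that average and the time effects are centred at zero. In an unbalanced panel the entity means average over different sets of periods, which is why this shortcut breaks down there.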

effedp commented 2 years ago

Hello Kevin, and thanks for your reply. I am not sure it is that easy. If I understood correctly, your solution would imply that regressing a dependent variable on individual effects and then taking the residual as the time effect is the same as regressing the dependent variable on both time and individual effects, which is not the case.

Do you think we will see an option for a separate estimate of the effects soon? It would be handy for many applications, and there is no way to do it in Python to the best of my knowledge.

alistaircameron commented 10 months ago

Hello, as stated in his reply, Bashtage's method works for balanced panels, but not for unbalanced panels. Like the OP, I needed to extract both the individual and time FEs in an unbalanced panel. Here's my quick-and-dirty solution; I hope it helps others who stumble across this.

import numpy as np
import pandas as pd
from linearmodels import PanelOLS

# Fit some model with PanelOLS (df is indexed by individual and time).
mod = PanelOLS(
    dependent=df['y'],
    exog=exog,
    entity_effects=True,
    time_effects=True,
)

twfe = mod.fit()

# Get the combined estimated effects as a flat DataFrame.
ees = twfe.estimated_effects.copy()
ees.reset_index(inplace=True)
ees.columns = ['individuals', 'time', 'estimated_effects']
ees = ees.drop_duplicates(subset=['individuals', 'time'])
ees.reset_index(inplace=True, drop=True)

# All periods in order. The first period is the base, so its time FE is 0.
time = np.sort(ees.time.unique())
time_fe, period, running_sum = [0.0], [time[0]], 0.0

for t in range(len(time) - 1):
    # Find an individual with data recorded in the base period b AND in the following period c.
    b, c = time[t], time[t + 1]
    individuals_in_base_period = list(ees.loc[ees.time == b, 'individuals'])
    individuals_in_following_period = set(ees.loc[ees.time == c, 'individuals'])
    ind = next((j for j in individuals_in_base_period if j in individuals_in_following_period), None)
    if ind is None:
        raise ValueError(f"No individual observed in both {b} and {c}. Try another method. Sorry.")

    # Within one individual the individual FE cancels, so the change in the
    # combined effect from b to c is the change in the time FE.
    effects = ees[ees.individuals == ind].set_index('time').estimated_effects
    running_sum += effects.loc[c] - effects.loc[b]
    time_fe.append(running_sum)
    period.append(c)

# Merge the time FEs back in and back out the individual FEs.
df_time = pd.DataFrame({'time': period, 'time_fe': time_fe})
ees = ees.merge(df_time, how='left', on='time')
ees['individual_fe'] = ees['estimated_effects'] - ees['time_fe']

# And you've got time + individual FEs (time FEs are relative to the first period).
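A different route, swapped in here and not part of the comment above: since every combined effect satisfies c_it = a_i + g_t, both sets of effects can be recovered from estimated_effects in one least-squares fit on entity and time dummies, with the first period's time effect fixed at 0. This is a sketch under the assumption that the panel is connected (every individual can be linked to every other through shared periods); the helper name split_effects and the synthetic data are hypothetical. It expects the same flat ees layout as above (columns individuals, time, estimated_effects).

```python
import numpy as np
import pandas as pd


def split_effects(ees: pd.DataFrame) -> tuple[pd.Series, pd.Series]:
    """Hypothetical helper: split c_it = a_i + g_t into entity and time parts.

    Expects columns 'individuals', 'time', 'estimated_effects', one row per pair.
    The first period's time effect is normalized to 0.
    """
    ent = pd.Categorical(ees["individuals"])
    per = pd.Categorical(ees["time"])
    # Widen the category codes so index arithmetic cannot overflow small int dtypes.
    ent_codes = np.asarray(ent.codes, dtype=np.int64)
    per_codes = np.asarray(per.codes, dtype=np.int64)
    n, n_ent, n_per = len(ees), len(ent.categories), len(per.categories)

    # Design matrix: one dummy per entity, one per period except the base period.
    X = np.zeros((n, n_ent + n_per - 1))
    rows = np.arange(n)
    X[rows, ent_codes] = 1.0
    has_dummy = per_codes > 0
    X[rows[has_dummy], n_ent + per_codes[has_dummy] - 1] = 1.0

    coef, *_ = np.linalg.lstsq(X, ees["estimated_effects"].to_numpy(), rcond=None)
    entity_fe = pd.Series(coef[:n_ent], index=ent.categories, name="individual_fe")
    time_fe = pd.Series(np.r_[0.0, coef[n_ent:]], index=per.categories, name="time_fe")
    return entity_fe, time_fe


# Hypothetical unbalanced panel: individual 3 is not observed in 1982.
true_a = {1: 0.5, 2: -1.0, 3: 2.0}
true_g = {1980: 0.0, 1981: 0.3, 1982: -0.4}
pairs = [(i, t) for i in true_a for t in true_g if not (i == 3 and t == 1982)]
demo = pd.DataFrame(pairs, columns=["individuals", "time"])
demo["estimated_effects"] = [true_a[i] + true_g[t] for i, t in pairs]

entity_fe, time_fe = split_effects(demo)
print(entity_fe)
print(time_fe)
```

Using every observation at once, rather than chaining through one individual per pair of periods, means no single individual has to be present in two consecutive periods. If the panel is not connected, though, lstsq silently returns a minimum-norm solution whose split between entity and time effects is not meaningful, so check connectivity first.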