**samsunknight** opened this issue 1 year ago
Would it be correct to say that the current code takes a weighted difference between pre- and post-periods, but without estimating a full two-way fixed effects regression? From what I can tell, the estimation of the weights $\omega$, $\lambda$, and $\zeta$ looks great, but I believe the computation of $\tau$ as a difference of weighted averages does not implement the two-way fixed effects (diff-in-diff) step described in the paper.
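For reference, the step I mean is the weighted least-squares problem in Arkhangelsky et al. (2021):

$$
\left(\hat\tau^{\mathrm{sdid}},\hat\mu,\hat\alpha,\hat\beta\right)
= \arg\min_{\tau,\mu,\alpha,\beta}
\sum_{i=1}^{N}\sum_{t=1}^{T}
\left(Y_{it}-\mu-\alpha_i-\beta_t-W_{it}\tau\right)^{2}\,\hat\omega_i\hat\lambda_t ,
$$

where $W_{it}$ is the treatment indicator, rather than a simple difference of weighted averages.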
I've adjusted my local version of the module with the following functions for estimating $\tau$:
```python
def regression_df(self):
    date_var = self.df.index.name
    melted_df = self.df.reset_index().melt(id_vars=date_var)
    id_var = melted_df.columns[1]
    # Treatment dummy: 1 for treated units in post-treatment periods.
    melted_df['treatment'] = (melted_df[id_var].isin(self.treatment).values
                              * melted_df[date_var].isin(self.Y_post_t.index).values)
    # sdid: omega (unit) and lambda (time) weights; treated units get 1/N_tr and
    # post periods get 1/T_post, so each observation's weight is omega_i * lambda_t.
    omega_weights, lambda_weights = self.estimated_params()
    melted_df = melted_df.merge(
        omega_weights.iloc[:-1].rename(columns={'features': id_var,
                                                'sdid_weight': 'omega_weight'}),
        on=id_var, how='outer')
    melted_df = melted_df.merge(
        lambda_weights.rename(columns={'time': date_var,
                                       'sdid_weight': 'lambda_weight'}),
        on=date_var, how='outer')
    melted_df['sdid_weight'] = (melted_df['omega_weight'].fillna(1 / len(self.treatment))
                                * melted_df['lambda_weight'].fillna(1 / len(self.Y_post_t)))
    # sc: unit weights only (ADH-style synthetic control)
    omega_weights_ADH = self.estimated_params('sc')
    melted_df = melted_df.merge(
        omega_weights_ADH.rename(columns={'features': id_var}),
        on=id_var, how='outer')
    melted_df['sc_weight'] = melted_df['sc_weight'].fillna(1 / len(self.treatment))
    melted_df = melted_df.set_index([id_var, date_var])
    return melted_df
```
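As a standalone sanity check on the weight construction above (toy data and hypothetical weight values, not the package's API): each observation's regression weight is the product $\omega_i \lambda_t$, with treated units receiving $1/N_{tr}$ and post periods $1/T_{post}$, so the weights sum to one within each of the four treated × post cells:

```python
import pandas as pd

# Toy panel: controls A-C, treated unit D; periods 3-4 are post-treatment.
units, periods = ["A", "B", "C", "D"], [1, 2, 3, 4]
treated, post = {"D"}, {3, 4}

# Hypothetical solver output: omega over controls, lambda over pre periods.
omega = {"A": 0.5, "B": 0.3, "C": 0.2}   # sums to 1 over controls
lam = {1: 0.6, 2: 0.4}                   # sums to 1 over pre periods

rows = [{"unit": u, "period": t,
         "sdid_weight": omega.get(u, 1 / len(treated)) * lam.get(t, 1 / len(post))}
        for u in units for t in periods]
df = pd.DataFrame(rows)

# Each treated x post cell carries total weight 1, so the weighted TWFE
# regression reduces to a double difference of four weighted cell means.
cell_totals = df.groupby([df["unit"].isin(treated).rename("treated"),
                          df["period"].isin(post).rename("post")])["sdid_weight"].sum()
print(cell_totals)   # all four cell totals equal 1.0
```

This is why the `fillna` defaults for the treated unit and post periods matter: they make the weights a proper probability weighting within each cell.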
...
```python
def hat_tau(self, model="sdid"):
    """Estimate tau via a weighted two-way fixed effects regression.

    Adjusted from the GitHub version; requires linearmodels
    (from linearmodels.panel import PanelOLS).
    """
    regression_df = self.regression_df()
    if model == "sdid":
        # Weights must be strictly positive for PanelOLS; zero-weight rows
        # contribute nothing to the fit, so it is OK to drop them.
        reg_df = regression_df.loc[regression_df['sdid_weight'] > 0]
        FE = PanelOLS(reg_df['value'], reg_df['treatment'],
                      entity_effects=True,
                      time_effects=True,
                      weights=reg_df['sdid_weight'])
    elif model == "sc":
        reg_df = regression_df.loc[regression_df['sc_weight'] > 0]
        FE = PanelOLS(reg_df['value'], reg_df['treatment'],
                      time_effects=True,
                      weights=reg_df['sc_weight'])
    result = FE.fit()
    tau_est = result.params['treatment']
    return tau_est
```
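As a standalone check of the weighted TWFE mechanics (numpy only, toy noise-free data; not the package's or PanelOLS's internals): regressing $Y$ on a treatment dummy plus unit and time dummies with separable $\omega_i \lambda_t$ weights recovers $\tau$ exactly when the outcome is generated by additive fixed effects plus the treatment effect.

```python
import numpy as np

N, T, T_pre = 5, 6, 4                                 # units, periods, pre periods
tau = -0.3                                            # true effect
alpha = np.array([0.5, -0.2, 0.1, 0.8, 1.0])          # unit fixed effects
beta = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])       # time fixed effects
W = np.zeros((N, T))
W[-1, T_pre:] = 1.0                                   # last unit treated in post
Y = alpha[:, None] + beta[None, :] + tau * W          # noise-free outcomes

# Separable weights: omega over controls (1 = 1/N_tr for the treated unit),
# lambda over pre periods (0.5 = 1/T_post in post periods).
omega = np.array([0.4, 0.3, 0.2, 0.1, 1.0])
lam = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.5])
w = np.outer(omega, lam).ravel()

# Weighted least squares on [treatment dummy | unit dummies | time dummies | 1].
unit_d = np.kron(np.eye(N), np.ones((T, 1)))
time_d = np.kron(np.ones((N, 1)), np.eye(T))
X = np.column_stack([W.ravel(), unit_d[:, 1:], time_d[:, 1:], np.ones(N * T)])
sw = np.sqrt(w)
coef, *_ = np.linalg.lstsq(X * sw[:, None], Y.ravel() * sw, rcond=None)
tau_hat = coef[0]
print(tau_hat)   # → -0.3 (up to floating point)
```

With noisy data the weights of course change the estimate; the noise-free case is just a deterministic check that the coefficient on `treatment` is the $\tau$ of interest.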
I've also made a few smaller adjustments: plotting this object differently, and refactoring the variance code to use these functions directly by making a deepcopy() of the given class instance and re-estimating its attributes. The main changes, though, are the ones above. I'd be interested to hear whether you think this is correct. Thanks again for building this code!
Here's the new version of the above graph under this estimation method, which looks much closer to what I'd expect:
Hi, thanks so much for this package! I'm really grateful that you coded this up. For one of my applications, however, I'm finding apparent discrepancies between the graphed quantities and the reported $\tau$. For example, see the screenshot below:
This $\tau$ seems much larger in magnitude than the visible difference between the graphed lines, which flips from roughly 0.1 above the control to 0.2 below it, suggesting a $\tau$ closer to -0.3.
Can you help me understand what might be causing this discrepancy? I tried to debug by inspecting the fixed effects, but I couldn't see where in the code they are stored, so I had trouble working out what went amiss. I'm not sure whether the plot is wrong, whether $\tau$ is wrong, or whether something about the relatively sparse $\lambda$ weighting is producing the mismatch between the estimate and the visual pattern. I'd be happy to provide replication data if that would help; thanks regardless for your attention to this.
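To illustrate how an estimate and a plot can diverge like this, here is a toy numeric sketch (numpy only, hypothetical numbers, not the package's code): when unit fixed effects are present, a post-period-only weighted difference between treated and synthetic control absorbs the level gap in unit effects, while the double difference that also subtracts the $\lambda$-weighted pre-period means recovers $\tau$.

```python
import numpy as np

N, T, T_pre = 5, 6, 4
tau = -0.3                                            # true effect
alpha = np.array([0.5, -0.2, 0.1, 0.8, 1.0])          # unit fixed effects
beta = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 0.5])       # time fixed effects
Y = alpha[:, None] + beta[None, :]
Y[-1, T_pre:] += tau                                  # last unit treated in post

omega = np.array([0.4, 0.3, 0.2, 0.1])                # control weights, sum to 1
lam = np.array([0.1, 0.2, 0.3, 0.4])                  # pre-period weights, sum to 1

# Post-period weighted difference only (no fixed-effect adjustment):
post_tr = Y[-1, T_pre:].mean()
post_co = omega @ Y[:-1, T_pre:].mean(axis=1)
single_diff = post_tr - post_co       # = tau + (alpha_treated - omega @ alpha_controls)

# Double difference, subtracting lambda-weighted pre-period means:
dd = (post_tr - lam @ Y[-1, :T_pre]) - (post_co - omega @ (Y[:-1, :T_pre] @ lam))
print(single_diff, dd)   # 0.46 vs -0.3: only the double difference recovers tau
```

If the reported $\tau$ were effectively a single difference of this kind, it could be far from the visible pre-to-post movement of the gap in the plot.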