QuickPay-Operational-Performance / Data-and-code

Data and code for econometric analysis

Check the parallel trends assumption in the current sample #19

Closed. vob2 closed this issue 4 years ago.

vob2 commented 4 years ago

Based on Vibhuti's comment in http://github.com/QuickPay-Operational-Performance/Data-and-code/wiki/How-can-readers-challenge-our-main-result%3F-(based-on-the-meeting-2020-04-24)

Assigning to Jie and Vibhuti. Please divide up as needed.

vibhuti6 commented 4 years ago

I posted the analysis for quarterly delays here. The parallel trends assumption seems to hold in the pre-treatment period, and the results so far seem to be consistent with the moral hazard hypothesis.

JNing0 commented 4 years ago

Thank you, Vibhuti, for this nice analysis! I have a question: since we have multiple periods and can estimate the trend within the small firms, why do we need large firms and the parallel trends assumption? Can you take out non-small firms and add "time" as a continuous covariate in the regression? The interaction term of interest would be time x treatment. Does this make sense, or am I missing something?

vibhuti6 commented 4 years ago

Hi Jie, I think we need large firms and parallel trends so we can indeed attribute the change in performance to the implementation of quickpay. If we just look at the change in performance of small business contracts over time, it may be driven by other factors.

For example, suppose there is a change in labor laws applicable to small businesses over time. Let's say the minimum wage for small businesses increased in April 2011. This may have led to layoffs and a labor shortage after April 2011, and increased delays for contracts handled by small businesses. This effect would then be completely independent of the QuickPay reform. So we need both large and small businesses (and parallel trends) to estimate the treatment effect of QuickPay.

Does that make sense? Please let me know if you have any comments or suggestions.

JNing0 commented 4 years ago

In the example you gave, the parallel trends assumption won't hold anyway, and we cannot use the trend of large businesses to estimate the trend of small businesses. In fact, your example argues for taking out the large businesses. If something like that happened, we should be able to capture it in the time trend within the small-business group. Unless it happened at the same time as QuickPay, in which case there is no way of teasing it out. The interaction term between time and treatment gives us what we need.

Of course, this approach has its limitations: we need to actually estimate the time trend, which could be nonlinear or follow some weird pattern. The nice thing about using a control group is that we don't need to estimate the time trend, IF the parallel trends assumption holds. But the control and treatment groups are so different in our analysis that we really need to work hard to justify it.

That's why I suggest that, at least as a robustness check, we see what happens if we take out the large businesses and actually estimate the time trend. We may need a finer time scale than quarterly, maybe bi-monthly or monthly.

We may do the same thing with the large businesses as well, to see what happens when they receive QuickPay, because the current result applies only to small businesses.

vibhuti6 commented 4 years ago

Thanks Jie, this was helpful. I am, of course, more than happy to try alternative specifications. But it is not entirely clear to me how we can treat "time" as a continuous covariate.

My understanding is that each time period would correspond to a week/month/quarter depending on how we resample the data. In this case, to control for time, we would have to include fixed effects for all but one period. Since we are focusing only on small businesses, there is no control group. So the treatment variable would simply be Post = 1 if the time period is after QuickPay, and 0 otherwise. And the regression equation would be: Delay = a + b*Time + c*Post + d*(Post x Time) + e

But in this case, we would run into problems with multicollinearity, because the variable Post is directly determined by the variable Time. I think I might be misunderstanding something here; could you please help clarify?

vob2 commented 4 years ago

I am not sure if this is what Jie had in mind, but time fixed effects would be collinear with Post. Therefore, for this regression we drop the c*Post term. Here is a summary: https://medium.com/eatpredlove/regression-difference-in-differences-208c2e787fd2
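
For concreteness, here is a minimal sketch of that point in Python with statsmodels, on toy data with hypothetical column names:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Toy panel: two firms (one treated/small, one control/large), six quarters.
df = pd.DataFrame({
    "period": list(range(1, 7)) * 2,
    "treat":  [1] * 6 + [0] * 6,
})
df["post"] = (df["period"] >= 4).astype(int)  # QuickPay in effect from period 4
df["delay"] = 1.0 + 0.1 * df["period"] + 0.5 * df["treat"] * df["post"]

# Period fixed effects absorb Post entirely (Post is a linear combination of
# the period dummies), so the c*Post term is dropped; the DiD estimate is the
# coefficient on the interaction treat:post.
res = smf.ols("delay ~ C(period) + treat + treat:post", data=df).fit()
print(res.params["treat:post"])  # ~0.5 in this noiseless toy example
```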

JNing0 commented 4 years ago

Let me explain more clearly.

Let y_{it} be the response variable, namely, the delay of project i measured in time period t, e.g., month t. Let's consider the simple case with only two covariates, x_{it} and I_{it}, where x_{it} is the calendar-time index of period t (the t-th period of the observation horizon) and I_{it} is the treatment indicator, equal to 1 if project i is under QuickPay in period t and 0 otherwise.

We will run the following regression: y_{it} = b0 + b1*x_{it} + b2*I_{it} + b3*(x_{it}*I_{it}) + e_{it}

As you can see, this is not a DiD model. It is a simple regression that estimates the time trend within all small businesses, so that we don't need to assume that the untreated group has the same trend. The coefficient of interest is b3, which captures the change in the slope caused by treatment.

For this to work, we need to have enough time periods to get information along the time dimension.

You are right that x_{it} and I_{it} have a high linear correlation, possibly around 0.7~0.8. But as long as QuickPay does have an effect, we should get a statistically significant estimate for b3 if we have enough data, which I think we do.

A caveat about this approach is that the time trend could be nonlinear or follow some weird pattern. All we can do is estimate/approximate it using some polynomial in time, which introduces error into the estimate of the treatment effect. But I don't think it would affect the qualitative result. As a starting point, let's just use a linear trend.

The nice thing about DiD is that we don't need to approximate the time trend at all, so the estimate of the treatment effect would be more accurate, IF the parallel trends assumption holds. That's a big "if" in our context.

If you treat time periods as a fixed effect, you would still need the untreated group in the regression. This is because when you move from before treatment to after treatment, the treatment effect is always accompanied by the time period going from t to t+1. So we cannot separate the treatment effect from the time trend unless we have an untreated group and make the parallel trends assumption.
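
To make this specification concrete, here is a minimal sketch in Python with statsmodels on made-up small-business data; the column names are hypothetical, with x the calendar-time index and post standing in for the indicator I_{it}:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Small businesses only: x is the calendar-time index (month 1..12 of the
# horizon), post is the treatment indicator I_{it}.
df = pd.DataFrame({"x": range(1, 13)})
df["post"] = (df["x"] >= 7).astype(int)  # QuickPay in effect from month 7
df["y"] = 2.0 + 0.05 * df["x"] - 0.2 * df["x"] * df["post"]  # made-up delays

# y = b0 + b1*x + b2*I + b3*(x*I) + e; b3 is the change in slope at treatment.
res = smf.ols("y ~ x + post + x:post", data=df).fit()
print(res.params["x:post"])  # ~ -0.2 in this synthetic example
```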

vibhuti6 commented 4 years ago

Thanks, Jie, for the detailed explanation. So if I understand correctly, for each project i, we add a covariate x_{it} = t. For example, if the project is in its first month, we will have x_{it} = 1; if it is in its second month, we will have x_{it} = 2. Is that right?

JNing0 commented 4 years ago

The value of t should be the observation time/calendar time, not the intrinsic time of each project. For example, if the data start in Jan. 2008, then project i's delay in that month is associated with x_{it} = 1; in Feb. 2008, x_{it} = 2; etc. The reason is that we want to capture the trend over calendar time so that we can tease it out to get the treatment effect.


vibhuti6 commented 4 years ago

Got it, thanks! To follow up on that, what about the same month in different years? For example, do we assign the value 2 for both Feb 2010 and Feb 2011?

JNing0 commented 4 years ago

It would continue. Time always goes forward, right? =] Essentially, t is the t-th month in the observation horizon.
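
For example, a quick pandas sketch (hypothetical dates) of how the index continues across years:

```python
import pandas as pd

# Count months from the start of the observation horizon, so Feb 2010 and
# Feb 2011 get different values of t.
dates = pd.to_datetime(["2008-01-15", "2008-02-03", "2010-02-10", "2011-02-20"])
start = pd.Timestamp("2008-01-01")
t = (dates.year - start.year) * 12 + (dates.month - start.month) + 1
print(list(t))  # [1, 2, 26, 38]
```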

vibhuti6 commented 4 years ago

I see, thanks for clarifying! :) I will try to run this specification and get back to you!

vibhuti6 commented 4 years ago

Hi Jie, I have added the results from this specification here -- please see the last regression table or click on Section 5 in the index.

JNing0 commented 4 years ago

Thanks, Vibhuti! This is what I expected. I think the result about project delay is robust. We should probably take out the after-QuickPay first-order term in the regression; it is coupled with the time effect, similar to the DiD webpage Vlad pointed to.

Just to clarify, in your data the end of the observation horizon is June 30, 2012, not Sept. 30, 2012, correct?

There is something interesting in your "Parallel Trends" plot here. Before Sept 30, 2012, the average delay has a seasonal pattern with a four-quarter cycle: the third quarter (ending 9/30) always has the longest average delay compared to the 1st, 2nd, and 4th quarters of a year. However, the third quarter of 2012 is completely different: it has the shortest delay, shorter than the first two quarters of 2012.

We know that QuickPay was launched for both small and large businesses in July 2012. I don't know whether that's the reason, but we might want to look into it, as it goes against our story... It might also imply that the QuickPay received by small businesses differed before and after the full-scale launch: before the full-scale launch, did the small businesses have a "lite version" of QuickPay?
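
As a rough way to eyeball the seasonal pattern, something like the following sketch could work (made-up numbers; it assumes a frame with date and delay columns):

```python
import pandas as pd

# Average delay by calendar quarter over a toy three-year series in which
# Q3 always has the longest delay.
df = pd.DataFrame({
    "date":  pd.date_range("2009-03-31", periods=12, freq="Q"),
    "delay": [1.0, 1.1, 1.6, 0.9] * 3,
})
print(df.groupby(df["date"].dt.quarter)["delay"].mean())
```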

vibhuti6 commented 4 years ago

Hi Jie, I will remove the "after-quickpay" term in that regression.

Yes, the end of the observation horizon is July 1. So essentially the Jun 30, 2012 - Sept 30, 2012 window does not include the entire quarter (because the data are truncated at July 1). In other words, on the graph, Sept 30 is actually July 1.

I had resampled the data on a quarterly basis in Python. So the modifications made on July 1 are treated as the last observation in the third quarter of 2012. I should probably fix the last x-axis label on the graphs to accurately represent this.
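
To illustrate why a July 1 observation lands in the 2012 Q3 bin under quarterly resampling (a toy sketch, not the actual data):

```python
import pandas as pd

# Quarterly resampling bins any observation dated 2012-07-01 into the
# Jul-Sep quarter, even though the data stop on that day.
s = pd.Series([5.0, 6.0, 7.0],
              index=pd.to_datetime(["2012-04-15", "2012-06-20", "2012-07-01"]))
print(s.resample("Q").mean())
# 2012-06-30    5.5
# 2012-09-30    7.0  <- the "Q3 2012" point is really just July 1
```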

But yes, I think going forward, we should first see what the average delays look like in the quarters after June 30, 2012 -- when large businesses also started receiving accelerated payments.

vibhuti6 commented 4 years ago

Hi Jie, I have now truncated the data at June 30, 2012 instead of July 1, 2012; and fixed the corresponding regressions and plots. I have also added the regression results without the covariate "after-quickpay" on the wiki page.

vob2 commented 4 years ago

It looks like we have exhausted this issue for now.