Full time horizon analysis

JNing0 commented 4 years ago

Regression model

Dependent variable: Y_it = Delay of contract i observed in time t

Covariates:

Time (continuous): t = 1, 2, ...
Quarter fixed effects (categorical): Q_q(t) = Indicator(time t is the q-th quarter in a year), q=1,2,3,4
Year fixed effects (categorical): X_k(t) = Indicator(time t is the k-th year in the observation horizon), k=1, 2, 3, ...
Treatment: Tr_it = 1 if contract i is affected by quickpay in time t
Small business: SM_i = 1 if contract i is awarded to a small business
Other firm, industry, and PSC code fixed-effect controls

Co-variates t, Q_q(t), and X_k(t) give us a nonlinear trend with seasonality. From the data, seasonality is quite clear, so let's always keep t and Q_q(t) in the time trend part. There are a few model configurations about the time trend that we can consider:

t + Q_1 + ... + Q_4 + Tr x t + Tr x Q_1 + ... + Tr x Q_4 + other controls: Allows different treatment effects on the linear trend and on seasonality, and assumes the same treatment effect on large and small businesses
t + Q_1 + ... + Q_4 + Tr x t + Tr x Q_1 + ... + Tr x Q_4 + SM x Tr x t + SM x Tr x Q_1 + ... + SM x Tr x Q_4 + other controls: Allows different treatment effects on the linear trend and on seasonality, and different treatment effects on large and small businesses
Add year fixed effects to the time trend, etc ...
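Written out, the first two configurations above could look like the following. This is only a sketch: the coefficient symbols (\beta, \alpha_q, \theta, \phi_q and their SM superscripted versions) are labels assumed here for illustration and do not appear in the thread. Because Q_1, ..., Q_4 sum to one, the four \alpha_q act as quarter-specific intercepts, so no separate intercept is shown; once firm, industry, and PSC fixed effects are added, one level from each full set of indicators would have to be dropped.

```latex
% Configuration 1: same treatment effect for large and small businesses
\begin{align}
Y_{it} &= \beta\, t + \sum_{q=1}^{4} \alpha_q\, Q_q(t)
        + Tr_{it}\Big(\theta\, t + \sum_{q=1}^{4} \phi_q\, Q_q(t)\Big)
        + \text{controls}_i + \varepsilon_{it}
\end{align}

% Configuration 2: treatment effect allowed to differ for small businesses
\begin{align}
Y_{it} &= \beta\, t + \sum_{q=1}^{4} \alpha_q\, Q_q(t)
        + Tr_{it}\Big(\theta\, t + \sum_{q=1}^{4} \phi_q\, Q_q(t)\Big) \notag \\
       &\quad + SM_i\, Tr_{it}\Big(\theta^{SM} t + \sum_{q=1}^{4} \phi^{SM}_q\, Q_q(t)\Big)
        + \text{controls}_i + \varepsilon_{it}
\end{align}
```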

Diff-in-Diff

Consider subsamples so that we always have a control group in the data. There are four such subsamples:
<4/27/2011 to <7/11/2012: treatment = quickpay, control = large business (no quickpay)
<7/11/2012 to <2/21/2013: treatment = no quickpay, control = small business (quickpay)
<2/21/2013 to <8/1/2014: treatment = quickpay, control = large business (no quickpay)
<8/1/2014 to 6/30/2017: treatment = no quickpay, control = small business (quickpay)

The parallel trend assumption is somewhat awkward, as once we discover treatment effect in sample 1, we kinda disprove the parallel trend assumption in later periods.

Anyhow, if we ignore that for now and do DiD in four subsamples, then here is the interpretation of the treatment effect in the four subsamples:

Subsamples 1 & 3: Treatment effect = effect of quickpay on small business when large businesses do not receive quickpay
Subsamples 2 & 4. Because the small businesses are the control now, the large businesses go from treated to untreated. Treatment effect = effect of no quickpay on large business when small business receive quickpay
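For reference, the basic two-group, two-period DiD comparison within one subsample can be sketched as below. The function name, the group and period labels, and the delay numbers are all made up for illustration; the actual analysis would use a regression with controls rather than raw cell means.

```python
from statistics import mean


def did_estimate(rows):
    """Difference-in-differences from raw cell means.

    rows: iterable of (group, period, delay) tuples, where group is
    "treated" or "control" and period is "pre" or "post".
    """
    cells = {}
    for group, period, delay in rows:
        cells.setdefault((group, period), []).append(delay)
    m = {key: mean(values) for key, values in cells.items()}
    # Change in the treated group minus change in the control group
    treated_change = m[("treated", "post")] - m[("treated", "pre")]
    control_change = m[("control", "post")] - m[("control", "pre")]
    return treated_change - control_change


# Made-up delays (in days), for illustration only
toy_rows = [
    ("treated", "pre", 10.0), ("treated", "pre", 12.0),
    ("treated", "post", 7.0), ("treated", "post", 9.0),
    ("control", "pre", 8.0), ("control", "pre", 10.0),
    ("control", "post", 9.0), ("control", "post", 11.0),
]
toy_effect = did_estimate(toy_rows)
```

In this toy data the treated group's mean delay falls by 3 days while the control group's rises by 1, so the estimate is the difference between those two changes. It only has a causal reading if the parallel trend assumption holds.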

JNing0 commented 4 years ago

Just updated the issue. Please review and comment.

vibhuti6 commented 4 years ago

Thanks Jie, I will go through this and get back to you.

vob2 commented 4 years ago

Let me propose a model. Why would this not work? Please tell me why this is a bad model

Our sample is indexed as contract-quarter (ct): contracts count from c = 1 to c = N and quarters from t = 1 to t = T

Projected_Delay_ct = \delta Expedited_Pay_ct + \sum_{q=2}^{T} \tau_q Quarter_qt + \sum_{k=2}^{N} \gamma_k Contract_kc + e_ct

In this equation:
Projected_Delay_ct is the delay for contract c projected at time t
Expedited_Pay_ct is the indicator that contract c is subject to expedited payment at time t
Quarter_qt is the indicator that time t equals quarter q
Contract_kc is the indicator that contract c equals contract k

vob2 commented 4 years ago

To add to my comment: is the issue with the lack of control group across the entire horizon, as Jie discusses here: https://github.com/QuickPay-Operational-Performance/Data-and-code/issues/25#issuecomment-627115897?

Then, would running two DiDs over two time subsamples help? Again, as Jie discusses above with regard to DiD?

vibhuti6 commented 4 years ago

Hi Jie and Vlad, yes I think the key issue is that we don't have any group that is always untreated in the full sample.

I tried to explain this graphically as follows. Suppose we consider the time period from Oct 1, 2009 to Feb 21, 2013. Then, in the figures below, the treatment effect for small businesses (TE1) is fine because large businesses are untreated at that point. But TE2 will be underestimating the effect of the treatment.

[Image: Screen Shot 2020-05-12 at 4 55 33 PM]

vibhuti6 commented 4 years ago

Subsamples 1 & 3: Treatment effect = effect of quickpay on small business when large businesses do not receive quickpay

Hi @JNing0 , a quick question about this point. In subsample 3, small businesses are treated at all times and large businesses are always untreated. So I am not sure if we can run a DiD for this time horizon.

JNing0 commented 4 years ago

Hi Vlad, yes, the problem with the model is that we don't have a control group. In the model, the time fixed effect \tau_q in the quarter where all contracts receive quickpay will include the treatment effect. So the estimate for \delta will be off. I hope that we can avoid it by not using the fixed time effects for each period and estimating the treatment via the interaction term between time and treatment.
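A small numerical illustration of this point, using the contract-quarter model with hypothetical parameter values (none of these numbers come from the data): in a quarter where every contract receives quickpay, \delta and \tau_q enter the fitted values only through their sum, so two very different pairs fit that quarter equally well. In a quarter with a mix of treated and untreated contracts they can be told apart.

```python
def fitted_delay(delta, tau_q, gamma_c, expedited):
    """Fitted value: delta * Expedited_Pay + quarter effect + contract effect."""
    return delta * expedited + tau_q + gamma_c


gammas = [0.5, -1.0, 2.0]  # hypothetical contract fixed effects

# Quarter in which every contract receives quickpay (expedited = 1 for all)
fit_a = [fitted_delay(-2.0, 5.0, g, 1) for g in gammas]  # delta = -2, tau_q = 5
fit_b = [fitted_delay(0.0, 3.0, g, 1) for g in gammas]   # delta = 0,  tau_q = 3
same_when_all_treated = fit_a == fit_b

# Quarter in which only some contracts receive quickpay
mixed = [1, 0, 1]
fit_c = [fitted_delay(-2.0, 5.0, g, e) for g, e in zip(gammas, mixed)]
fit_d = [fitted_delay(0.0, 3.0, g, e) for g, e in zip(gammas, mixed)]
same_when_mixed = fit_c == fit_d
```

Here `same_when_all_treated` is true and `same_when_mixed` is false: the untreated contract in the mixed quarter is what pins down \tau_q separately from \delta.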

The regression models I proposed are just a starting point; we will probably need better models than that to take advantage of the long observation horizon.

DiD models on subsamples will give us a control group, but I am a bit concerned about the parallel trend assumption. Once we show there is effect in the first period, wouldn't that invalidate the assumption?

JNing0 commented 4 years ago

Vibhuti, sorry I made a mistake about subsample 3. Big businesses are kicked out of quickpay on 2/21/2013 and are included on 8/1/2014. So the time in between, before 2/21/2013 (and after 7/11/2012) till before 8/1/2014, the control group is small businesses that always receive quickpay. The treatment is no quickpay. The big businesses go from not treated (receiving quickpay) before 2/21/2013 to treated (no quickpay) after 2/21/2013.

I rewrote the four subsamples below:

<4/27/2011 to <7/11/2012: treatment = quickpay, control = large business (no quickpay)
<7/11/2012 to <2/21/2013: treatment = no quickpay, control = small business (quickpay)
<2/21/2013 to <8/1/2014: treatment = no quickpay, control = small business (quickpay)
<8/1/2014 to 6/30/2017: treatment = no quickpay, control = small business (quickpay)
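One way to encode these windows in code is sketched below. It rests on a reading of the notation that should be checked: each subsample is taken to run from the previous policy change up to (but not including) the next one, with the change that defines the DiD in between. The start of subsample 1 is left as `None` because it is not stated here, and treating the change date itself as "post" is also an assumption. The names `SUBSAMPLES` and `post_indicator` are hypothetical.

```python
from datetime import date

# (label, window start, policy-change date, window end, end exclusive)
# None means the start of the window is not specified in this thread.
SUBSAMPLES = [
    (1, None, date(2011, 4, 27), date(2012, 7, 11)),
    (2, date(2011, 4, 27), date(2012, 7, 11), date(2013, 2, 21)),
    (3, date(2012, 7, 11), date(2013, 2, 21), date(2014, 8, 1)),
    # Subsample 4 runs through 6/30/2017 inclusive, hence the 7/1/2017 bound
    (4, date(2013, 2, 21), date(2014, 8, 1), date(2017, 7, 1)),
]


def post_indicator(subsample, day):
    """Return 0 before the policy change, 1 on or after it,
    and None if the day falls outside the subsample window."""
    label, start, change, end = SUBSAMPLES[subsample - 1]
    if (start is not None and day < start) or day >= end:
        return None
    return 1 if day >= change else 0
```

For example, `post_indicator(3, date(2013, 1, 1))` gives 0 and `post_indicator(3, date(2013, 3, 1))` gives 1, while a date in 2015 is outside subsample 3.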

The interpretation of the treatment effect in the four subsamples:

Subsample 1: Treatment effect = effect of quickpay on small business when large businesses do not receive quickpay
Subsamples 2 & 4: The large businesses go from treated to untreated. Treatment effect = effect of no quickpay on large business when small business receive quickpay
Subsample 3: The large businesses go from untreated to treated. Treatment effect = effect of no quickpay on large business when small business receive quickpay.

JNing0 commented 4 years ago

Some further amendments to the regression models:

Let's not include the year fixed effects in the model. There are years where all businesses receive quickpay.
We can add a t^2 term in the model to capture some nonlinearity in the time trend in addition to seasonality. In models that have the t^2 term, the treatment indicator will interact with t^2 as well.
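To keep the variants straight, the model configurations could be generated as formula strings, as in the sketch below. The strings follow the R/Patsy-style syntax accepted by formula interfaces such as `statsmodels.formula.api.ols`; the column names `delay`, `t`, `quarter`, `treated`, and `small` are hypothetical, and the firm, industry, and PSC controls are left out for brevity. Year fixed effects are deliberately not included.

```python
def build_formula(include_t2=False, small_business_split=False):
    """Build a Patsy-style formula for the time-trend models discussed above.

    include_t2: add a t^2 term and its interaction with treatment.
    small_business_split: add SM x Tr interactions so the treatment
        effect may differ between small and large businesses.
    """
    trend = ["t", "C(quarter)"]
    if include_t2:
        trend.insert(1, "I(t**2)")
    terms = list(trend)
    # Treatment interacts with every component of the time trend
    terms += [f"treated:{x}" for x in trend]
    if small_business_split:
        terms += [f"small:treated:{x}" for x in trend]
    return "delay ~ " + " + ".join(terms)


base_formula = build_formula()
quadratic_formula = build_formula(include_t2=True)
```

With the defaults this yields `delay ~ t + C(quarter) + treated:t + treated:C(quarter)`; turning on `include_t2` adds both `I(t**2)` and `treated:I(t**2)`.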

vob2 commented 4 years ago

Thank you for explaining the problem so clearly!

vibhuti6 commented 4 years ago

Hi Jie and Vlad, I think Jie's suggestions are a good place to start with the full sample analysis. I will get back to you on this when I have some updates on the results. Thanks.

And thanks, Jie, for clarifying this point!

3. Subsample 3: The large businesses go from untreated to treated. Treatment effect = effect of no quickpay on large business when small business receive quickpay.

vibhuti6 commented 4 years ago

Just floating another idea here: we could look at past observations as a control group. There can be two ways of doing this. I explain them below using large businesses as an example.

We can use large business contracts in 2009-2012 as a control group for large business contracts in 2014-2017. We can match contracts based on similar characteristics such as the same task, same firm, same subagency, and same industry code. The assumption here is that in the absence of quickpay, pattern of delays will be similar for the two groups.
Suppose a large business contract started on Dec 31, 2013 and ended on Sept 30, 2014. Then, we can predict the delays in the third quarter of 2014 (when payment was accelerated) using delays in previous quarters of the same contract. And use this predicted value as a control group for the realized value (after treatment). There is a stream of literature in economics on “unconfoundedness” that follows this approach under some assumptions. I need to look into it but here are some references:
- https://arxiv.org/pdf/1710.10251.pdf
- https://www.cambridge.org/core/books/causal-inference-for-statistics-social-and-biomedical-sciences/71126BE90C58F1A431FE9B2DD07938AB
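As a rough illustration of the first idea, exact matching on contract characteristics could be sketched as follows. The field names (`task`, `firm`, `subagency`, `industry_code`, `id`) and the toy records are hypothetical, and exact matching is only one of several possible matching rules.

```python
from collections import defaultdict

MATCH_KEYS = ("task", "firm", "subagency", "industry_code")


def match_contracts(treated, controls, keys=MATCH_KEYS):
    """Map each treated contract id to the ids of past contracts
    that share the same values on all matching keys."""
    pool = defaultdict(list)
    for c in controls:
        pool[tuple(c[k] for k in keys)].append(c["id"])
    return {t["id"]: pool.get(tuple(t[k] for k in keys), []) for t in treated}


# Toy records, for illustration only
past = [
    {"id": "p1", "task": "A", "firm": "F1", "subagency": "S1", "industry_code": "N1"},
    {"id": "p2", "task": "A", "firm": "F1", "subagency": "S1", "industry_code": "N1"},
    {"id": "p3", "task": "B", "firm": "F2", "subagency": "S1", "industry_code": "N2"},
]
recent = [
    {"id": "r1", "task": "A", "firm": "F1", "subagency": "S1", "industry_code": "N1"},
    {"id": "r2", "task": "C", "firm": "F3", "subagency": "S2", "industry_code": "N3"},
]
matches = match_contracts(recent, past)
```

Contracts with no match (like `r2` here) get an empty list, which shows how quickly exact matching on four keys can shrink the usable sample.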

vob2 commented 4 years ago

Thank you, Vibhuti.

Using large business contracts in the past as a control for large business contracts in the future is worth a try. There is one issue I see with this idea. If economic conditions in 2009-2012 were different from those in 2014-2017, they could be driving the differences (if there are differences) and not Quickpay. We would have to throw every conceivable control for the economy into the regression, and even then we might miss things.

Need to think more about the second point.

vibhuti6 commented 4 years ago

Thanks Vlad, that's a fair point and I will think more about it.

vob2 commented 4 years ago

Jie,

Do we have the model formally written out for 2014 implementation of QuickPay? We have a couple of issues open, including this one with comments, but I cannot find a unified model. Do you know where it is? If we do not have it, can you write a draft of it?

Let's assume that we will present 2009 and 2014 implementations separately, but 2009 implementation will appear first in the paper.

Thanks,

Vlad

JNing0 commented 4 years ago

@vob2 Hi Vlad, we will use the same formulas as the 2011 QuickPay, only the coefficient interpretation is different. See the model and discussion here for more details. This file explains why we can interpret the coefficients that way.

vob2 commented 4 years ago

Thanks, Jie! We have multiple issues open on this. We need to consolidate.

Yes, the argument in the file convinced me. This is visual, what is the corresponding regression? The model makes sense for the first implementation. But is it the same model for the second implementation? Vibhuti used a different model to derive results. Instead of Post variable, there is Before variable. Reading that table, we are seeing effect of slow payment.

Which version of the model should we use? Can we consolidate the visual representation, regression, and results (existing, or rederived if the model changes) in one place so that there are no ambiguities or gaps?

JNing0 commented 4 years ago

Yes, I will consolidate the models and visual arguments for the 2014 QuickPay.

Just to give a quick answer, the only change in the model for the 2014 QuickPay is to change Post_t to Pre_t. Everything else should be the same.

In the table mentioned above, the coefficient of the interaction term, Pre x Large business (i.e., the "before_aug_2014:business_typeO" term), is positive, meaning that the lack of QuickPay leads to delay. So implementing QuickPay leads to finishing early. In other words, the effect of QuickPay is the negative of the coefficient of Pre x Large business, as stated in this file.
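The sign flip follows from Pre_t = 1 - Post_t. A short sketch, showing only the interaction term and using \beta as an assumed label for the Pre x Large coefficient (Large_i is likewise a label chosen here for the large-business indicator):

```latex
\begin{align}
\beta\,(Pre_t \times Large_i)
  &= \beta\,\big((1 - Post_t) \times Large_i\big) \notag \\
  &= \beta\, Large_i \;-\; \beta\,(Post_t \times Large_i)
\end{align}
% The first piece is absorbed by the Large_i main effect, so the
% coefficient on Post_t x Large_i in the equivalent "Post" model is -\beta.
```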

vob2 commented 4 years ago

I noticed you closed out other issues. Thank you!

In describing the model and results, let's focus on the business insights, which is: when payments to large businesses are delayed, large businesses delay project completion. Or, when payments to large businesses are expedited, large businesses expedite project completion.

JNing0 commented 4 years ago

I have created a wiki page that consolidates the model for the 2014 QuickPay. I am closing this issue since we agreed that we will present the two implementations one after another.

QuickPay-Operational-Performance / Data-and-code

Full time horizon analysis #24
