Look ahead bias and proper usage of cfacshr

chenandrewy commented 1 month ago

Related to https://github.com/OpenSourceAP/CrossSection/issues/95, it seems there is look-ahead bias in cfacshr. The evidence is that cfacshr is almost always 1.0 at the end of the sample, which means earlier values are scaled using splits that had not yet happened. This issue can affect ShareIss1Y, ShareIss5Y, and anything that uses 13F data.

It also seems we're not using cfacshr correctly in ShareIss1Y and ShareIss5Y. The proper usage can be seen in the example of Nvidia's 2021 4:1 split.

So adjusted shares should be shrout multiplied by cfacshr. But in our code, we divide ☹️. Thankfully, splits are relatively rare, but we should fix this.
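To make the multiply-versus-divide point concrete, here is a minimal sketch in Python (not the repo's Stata code). The numbers are illustrative round values, not actual CRSP data for Nvidia, and the sketch assumes cfacshr is normalized to 1.0 at the end of the sample, so months before a 4:1 split carry cfacshr = 4.0.

```python
# Illustrative round numbers, NOT actual CRSP values for Nvidia.
# Assumption: cfacshr equals 1.0 at the end of the sample, so the
# months before a 4:1 split carry cfacshr = 4.0.
months = [
    {"month": "pre-split", "shrout": 100.0, "cfacshr": 4.0},
    {"month": "post-split", "shrout": 400.0, "cfacshr": 1.0},
]


def adjusted_multiply(row):
    """Split-adjusted shares: shrout times cfacshr."""
    return row["shrout"] * row["cfacshr"]


def adjusted_divide(row):
    """The incorrect variant: shrout divided by cfacshr."""
    return row["shrout"] / row["cfacshr"]


multiplied = [adjusted_multiply(r) for r in months]
divided = [adjusted_divide(r) for r in months]

# With no share issuance, adjusted shares should be flat across the split.
print(multiplied)  # [400.0, 400.0]
print(divided)     # [25.0, 400.0]
```

Multiplying keeps adjusted shares flat across the split, while dividing creates a spurious jump that would look like a large share issuance.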

The OP for ShareIss1Y, Pontiff and Woodgate 2008, uses facshr, not cfacshr.

Though, I'm not sure exactly how to map their $f_i$ into CRSP's facshr (my sense is that there is a typo, and $TotalFactor_t$ should multiply $SharesOutstanding_t$). Regardless, it's clear from the Nvidia example how to use cfacshr properly and that one should not divide.

ShareIss5Y comes from Daniel and Titman 2006. They mention that they adjust for splits and that they use facshr (not cfacshr). So we should also adjust for splits as best we can here and avoid the look-ahead bias from cfacshr.

For the 13F signals, it's much harder for me to understand. We may need to put that off for the next annual update.

chenandrewy commented 1 month ago

Long story short, we should update ShareIss1Y and ShareIss5Y to:

- Multiply shrout and cfacshr, not divide
- Lag properly, following Pontiff and Woodgate 2008 Table III (DT-ISSUE is ShareIss5Y)

It turns out that, because both of these signals normalize by split-adjusted shares outstanding (gen ShareIss1Y = (l6.temp - l18.temp)/l18.temp), the signals lack the look-ahead bias from cfacshr: any rescaling of a firm's cfacshr by later splits hits the numerator and denominator equally and cancels in the ratio. And since the code is so simple, it's not worth trying an alternative calculation.
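The cancellation can be checked with a small Python sketch (again an illustration, not the repo's Stata code). The 19 months of data are hypothetical: a 10% issuance followed by a 4:1 split. The helper names `share_iss` and `adjusted` and the `scale` argument are made up for this example; `scale` stands in for the effect of a later split rescaling every earlier cfacshr value.

```python
def share_iss(temp, recent_lag=6, base_lag=18):
    """Growth in adjusted shares, mirroring (l6.temp - l18.temp) / l18.temp.

    The last element of temp is treated as the current month t.
    """
    t = len(temp) - 1
    recent = temp[t - recent_lag]
    base = temp[t - base_lag]
    return (recent - base) / base


def adjusted(shrout, cfacshr, scale=1.0):
    """Split-adjusted shares: shrout * cfacshr, optionally rescaled."""
    return [s * c * scale for s, c in zip(shrout, cfacshr)]


# Hypothetical data: 8 months at 100 shares, a 10% issuance to 110,
# then a 4:1 split to 440. cfacshr is 4.0 before the split, 1.0 after.
shrout = [100.0] * 8 + [110.0] * 2 + [440.0] * 9
cfacshr = [4.0] * 10 + [1.0] * 9

signal = share_iss(adjusted(shrout, cfacshr))
# Rescale every cfacshr by 2, as a later 2:1 split would do.
signal_rescaled = share_iss(adjusted(shrout, cfacshr, scale=2.0))

print(signal, signal_rescaled)  # both reflect the 10% issuance
```

Both calls recover the 10% issuance, so the level of cfacshr (and hence the sample end date) does not matter for the signal in this setup.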

However, we implemented the formulas wrong, and also didn't lag quite the right way. The ShareIss5Y lag is also described in Daniel and Titman 2006 Section A.1. There, returns run from July of year t to June of year t+1 and book values come from year t-1, so there should be roughly a 6-month lag. In our notation, time_avail_m is lagged one month relative to returns, so I think we should be using l5.temp.

tomz23 commented 1 month ago

Thanks for looking into this! FWIW, I replicated Pontiff and Woodgate (2008) following their instructions to compute TotalFactor from facshr and got very similar results to just using cfacshr. (Btw, based on how they construct TotalFactor, it makes sense to divide shrout rather than multiply. Below is a snippet for the Nvidia example.)

[Image: snippet for the Nvidia example]
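The snippet referred to above was posted as an image. What follows is a separate Python sketch, not a reconstruction of it. It assumes one reading of the paper in which $TotalFactor_t$ is the running product of $(1 + f_i)$ up to month t, and it assumes a 4:1 split is recorded as facshr = 3.0 in the split month. The share counts are illustrative round numbers.

```python
from itertools import accumulate
from operator import mul

# Illustrative data around a 4:1 split with no issuance.
shrout = [100.0, 100.0, 400.0, 400.0]
# Assumption: the 4:1 split shows up as facshr = 3.0 in the split month.
facshr = [0.0, 0.0, 3.0, 0.0]
# Assumption: cfacshr is normalized to 1.0 at the end of the sample.
cfacshr = [4.0, 4.0, 1.0, 1.0]

# Running product of (1 + facshr): grows forward through time.
total_factor = list(accumulate((1.0 + f for f in facshr), mul))

# TotalFactor grows at the split, so dividing keeps shares flat.
adj_total_factor = [s / tf for s, tf in zip(shrout, total_factor)]

# cfacshr shrinks at the split, so multiplying keeps shares flat.
adj_cfacshr = [s * c for s, c in zip(shrout, cfacshr)]

print(total_factor)      # [1.0, 1.0, 4.0, 4.0]
print(adj_total_factor)  # [100.0, 100.0, 100.0, 100.0]
print(adj_cfacshr)       # [400.0, 400.0, 400.0, 400.0]
```

Under these assumptions the two adjusted series differ only by a constant factor, which would explain why a ratio-based signal comes out nearly the same either way.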

tomz23 commented 1 month ago

I updated ShareIss1Y here and also included alternative code that is closer to OP as a comment: 4483276b7f121ec7773f33932361247726ef62b5

tomz23 commented 1 month ago

ShareIss5Y.do fix is here: 4295ff99e58ff8a6ceb7acf2718c42cdab77844f

chenandrewy commented 1 month ago

I'm closing this issue because only the 13F predictors remain to be checked.

Look ahead bias and proper usage of cfacshr #152