Look-ahead bias in cfacpr

chenandrewy commented 2 years ago

@pgeertsema and @helenhelu write:

Naive use of CRSP variable cfacpr -- as in "adj_price = abs(prc)/cfacpr" -- leads to non-trivial look-ahead bias.

The problem with this approach is that the CFACPR variable in CRSP is constructed in such a way that the raw split-adjusted price obtained by using it is always equal to the price in the most recent period. In other words, the CFACPR variable is always 1 in the last period in the CRSP database. This necessitates the recalculation of the CFACPR variable for all historical periods every time a new vintage of CRSP data is assembled. This recalculation depends on the future occurrence of stock splits – hence the look-ahead bias.
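A toy illustration of the vintage dependence described above, as a minimal Python sketch. The prices, the split date, and the helper `cfacpr_for_vintage` are hypothetical and are not CRSP code. The only thing carried over from the text is the assumption that the factor is normalized to 1 in the last period of each vintage.

```python
# Hypothetical stock: a 2-for-1 split takes effect at period 3.
raw_price = [100.0, 110.0, 120.0, 60.0, 66.0]   # unadjusted abs(prc)
split_factor = [1.0, 1.0, 1.0, 2.0, 1.0]        # new shares per old share at each period


def cfacpr_for_vintage(last_period):
    """Cumulative adjustment factor, normalized to 1 in the vintage's last period.

    For period t it is the product of all split factors that occur after t
    and up to last_period, which is why it depends on future splits.
    """
    factors = []
    for t in range(last_period + 1):
        f = 1.0
        for s in range(t + 1, last_period + 1):
            f *= split_factor[s]
        factors.append(f)
    return factors


def naive_adj_price(last_period):
    """Naive adjustment abs(prc) / cfacpr using the factor from one vintage."""
    cfacpr = cfacpr_for_vintage(last_period)
    return [raw_price[t] / cfacpr[t] for t in range(last_period + 1)]


early = naive_adj_price(2)  # vintage assembled before the split
late = naive_adj_price(4)   # vintage assembled after the split

print(early[:3])  # [100.0, 110.0, 120.0]
print(late[:3])   # [50.0, 55.0, 60.0]
```

The same three historical periods get different "adjusted" prices depending on whether the vintage already knows about the later split, which is the look-ahead channel.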

This issue affects 6 predictors: high52, trendfactor, DelBreadth, IO_ShortInterest, Activism1, Activism2. The latter 4 are 13f predictors that build on WRDS's replication code.

Before 2000, there were roughly 50 stock splits per year; more recently, stock splits have been less frequent. This could affect the 13f Activism and IO_ShortInterest predictors, which use double sorts and a small number of stocks.

chenandrewy commented 2 months ago

I thought I'd add some info from an earlier draft of @pgeertsema + @helenhelu's paper (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4013958), which explains nicely why this is a problem:

chenandrewy commented 2 months ago

Regarding High52: George and Hwang 2004 does not actually mention doing any split adjustments. The "Data and Method" section simply says

and the motivation does not mention charting or anything that would imply split adjustments. The signal motivation (page 2) is

So, High52.do should just have

replace prc = abs(prc)

and prcadj should be replaced with prc.
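For readers who do not use Stata, here is a rough Python sketch of a 52-week-high ratio built from unadjusted prices only. The monthly price series and the function `high52_ratio` are hypothetical, and the 12-month window is an assumption made for illustration; this is not a translation of High52.do. The `abs()` call mirrors `abs(prc)` above, since CRSP stores bid/ask averages as negative prices.

```python
# Hypothetical month-end CRSP prc values; negative entries are bid/ask averages.
prc = [-20.0, 22.5, 25.0, -24.0, 23.0, 21.0, 26.0,
       27.5, -27.0, 25.5, 24.0, 26.0, 25.0]


def high52_ratio(prc, t, window=12):
    """Unadjusted price at t divided by its highest value over the trailing window."""
    price = [abs(p) for p in prc]          # no cfacpr involved anywhere
    start = max(0, t - window + 1)
    return price[t] / max(price[start:t + 1])


ratio = high52_ratio(prc, t=12)
print(ratio)
```

Because cfacpr never enters the calculation, the value at a given date cannot change when a later CRSP vintage records new splits.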

chenandrewy commented 2 months ago

Regarding TrendFactor: Han, Zhou and Zhu 2016 does say "prices are adjusted for splits and dividends when necessary." But it turns out our TrendFactor.do does not have lookahead bias, because of the way Han et al. normalize the signal.

The signal is $$\tilde{A} = \frac{L^{-1}\sum_{j=0}^{L-1}P_{d-j}^{adj}}{P_{d}^{adj}}$$ where $P_{d-j}^{adj}$ is the closing price, split-adjusted, on day $d-j$.

The proper way to split-adjust is to declare a reference day $d$ (the day on which you want the adjusted and unadjusted prices to be equal) and then define $$P_{d-j}^{adj} \equiv P_{d-j} / cfacpr_{d-j} \times cfacpr_{d}$$ where $P_{d-j}$ is the unadjusted price (just abs(prc) in CRSP) and $cfacpr_{d-j}$ is the cfacpr variable in CRSP. To check this formula, you can look at Apple's (permno 14593) split in 2014 or Nvidia's (permno 86580) split in 2021.

As you can see, $cfacpr_{d}$ multiplies both the numerator and the denominator of $\tilde{A}$ and so cancels out, leaving a ratio built only from abs(prc)/cfacpr terms. So TrendFactor.do is fine.
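The cancellation can be checked numerically. Below is a minimal Python sketch with made-up prices and factors; the function `trend_ratio` and its `reference` argument are hypothetical names for illustration, not code from TrendFactor.do. It compares the naive adjustment, the reference-day adjustment, and a later vintage in which every factor has been rescaled by a subsequent 4-for-1 split.

```python
import math


def trend_ratio(prices, cfacpr, d, L, reference=None):
    """Average adjusted price over days d-L+1..d, divided by the adjusted price on day d.

    reference=None uses the naive adjustment P / cfacpr.
    reference=r multiplies by cfacpr[r], so adjusted equals raw on day r.
    """
    scale = 1.0 if reference is None else cfacpr[reference]
    adj = [p / c * scale for p, c in zip(prices, cfacpr)]
    window = adj[d - L + 1:d + 1]
    return (sum(window) / L) / adj[d]


prices = [100.0, 110.0, 120.0, 60.0, 66.0]   # hypothetical; 2-for-1 split at index 3
cfacpr_late = [2.0, 2.0, 2.0, 1.0, 1.0]      # vintage ending at index 4
cfacpr_later = [8.0, 8.0, 8.0, 4.0, 4.0]     # later vintage, after a further 4-for-1 split

naive = trend_ratio(prices, cfacpr_late, d=3, L=3)
proper = trend_ratio(prices, cfacpr_later, d=3, L=3, reference=3)
revintaged = trend_ratio(prices, cfacpr_later, d=3, L=3)

print(math.isclose(naive, proper), math.isclose(naive, revintaged))
```

All three ratios agree up to floating-point error, because any constant that rescales every factor for a stock (the reference-day factor or a future split) appears in both the numerator and the denominator.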

In my view, the code is transparent enough that we don't need to adjust it to "ensure" there is no lookahead bias. But we should add a comment on how the proper calculation would work.

tomz23 commented 2 months ago

High52.do fixed here 9e85ba0f11aa46fadaa51bc307f0bf5b0fd2d405

tomz23 commented 2 months ago

Comment in TrendFactor.do here be73f4acf59f4e9ac515b8ca54d3abc7fb0157b0

chenandrewy commented 2 months ago

I'm closing this issue because only the 13F predictors remain to be checked.

OpenSourceAP / CrossSection

Look-ahead bias in cfacpr #95