OpenSourceAP / CrossSection

Code to accompany our paper Chen and Zimmermann (2020), "Open source cross-sectional asset pricing"
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3604626
GNU General Public License v2.0

Lookahead bias in `AnnouncementReturn` #158

Open chenandrewy opened 23 hours ago

chenandrewy commented 23 hours ago

Many thanks to Ming Zheng of University of Gothenburg for catching this bug. He writes:

> I am writing regarding a potential error in your code for calculating returns from sorting on firm-level announcement returns. While the code for the signal is right, it uses information from the following month when the announcement date falls at month end. As a result, monthly portfolio sorts based on this signal produce very high returns. (We searched your GitHub page and could only find one do-file, "ZZ2_AnnouncementReturn.do", so we are not sure it is the one you use to generate the announcement return factor.)
>
> We raise this because our recent paper, "Analysts Are Good At Ranking Stocks", compares our analyst-based strategy with all factors in your factor zoo. The long-short portfolio sorted on announcement returns delivers a very high return and Sharpe ratio that we cannot beat, and it is probably the best-performing factor in the zoo, which is why we have some doubts about it. Thanks!

Unfortunately, there is indeed a nontrivial lookahead bias in `ZZ2_AnnouncementReturn.do`. The error is in these lines: [image]

Line 49 means that we use data from 2 days after the announcement date. Tom finds:

> If I replace line 55 with `gcollapse (sum) AnnouncementReturn (firstnm) AnnTime (min) mintime = time_d (max) maxtime = time_d, by(permno time_ann_d)` and then run `br if mofd(maxtime) > mofd(mintime)`, I get 120k out of 950k obs that cross the month.
>
> FWIW, I think the fix is (probably) to increase `time_avail_m` by one month in line 56 if `mofd(maxtime) > mofd(mintime)` (or just use `gen time_avail_m = mofd(maxtime)` in line 56 directly).
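To make the cross-month check concrete, here is a minimal Python sketch of the same idea. It is not the repository's Stata code. The function names (`month_index`, `next_weekdays`, `window_availability`) are hypothetical, and the window is assumed to be the announcement day plus the next two weekdays, ignoring holidays. It flags windows whose last day falls in a later month than their first day, and dates the signal by the month of the last window day, in the spirit of the `mofd(maxtime)` suggestion.

```python
from datetime import date, timedelta


def month_index(d: date) -> int:
    """Count months on a single integer scale (similar in spirit to Stata's mofd())."""
    return d.year * 12 + (d.month - 1)


def next_weekdays(start: date, n: int) -> list:
    """Return the n weekdays strictly after `start`.

    Assumption: weekdays stand in for trading days; holidays are ignored.
    """
    out = []
    d = start
    while len(out) < n:
        d += timedelta(days=1)
        if d.weekday() < 5:  # Monday=0 ... Friday=4
            out.append(d)
    return out


def window_availability(ann_date: date, days_after: int = 2):
    """Return (crosses_month, (year, month)) for one announcement window.

    Assumption: the window is the announcement day plus `days_after`
    following weekdays. The signal is dated by the month of the last
    window day, so it is never used before all its inputs are known.
    """
    window = [ann_date] + next_weekdays(ann_date, days_after)
    mintime, maxtime = min(window), max(window)
    crosses_month = month_index(maxtime) > month_index(mintime)
    return crosses_month, (maxtime.year, maxtime.month)


# Announcement on Thursday 2020-01-30: the window runs through Monday 2020-02-03,
# so the signal should only become available in February.
print(window_availability(date(2020, 1, 30)))
# Announcement on Wednesday 2020-01-15: the window stays inside January.
print(window_availability(date(2020, 1, 15)))
```

Dating the signal by the last window day treats both cases uniformly, whereas adding one month only when the window crosses a month boundary needs an explicit condition; either way the result should match for the windows sketched here.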

Given that this is a clear lookahead bias that affects more than 10% of observations (about 120k of 950k), I think we should patch this soon and create a new data release.