OpenSourceAP / CrossSectionDemos

Example code of simple things one can do with our open-source asset pricing data
GNU General Public License v3.0

Lagging Price, Size, STReversal by one month before merging in dl_signals_add_crsp.R #1

Closed sehoff closed 2 years ago

sehoff commented 2 years ago

Before merging to the other predictors, I think one needs to lag the three characteristics mentioned in the title. Otherwise, one would either create one more lag in all other characteristics when merging to CRSP return data, or introduce look-ahead bias when merging the output of "dl_signals_add_crsp.R" to CRSP returns.

My solution (even though I usually do not program in R) would be the following, starting from line 110:

    # convert ret to pct, other formatting
    crspm2 = crspm2 %>%
      mutate(
        ret = 100*ret
        , date = as.Date(date)
        , me = abs(prc) * shrout
        , yyyymm = year(date) * 100 + month(date)
      )

    # lag date by one
    crspm2$date2 <- crspm2$date %m-% months(1)
    crspm2$yyyymm <- year(crspm2$date2)*100 + month(crspm2$date2)

    # NOTE THESE ARE SIGNED!
    crspmsignal = crspm2 %>%
      transmute(
        permno
        , yyyymm
        , STreversal = -1*if_else(is.na(ret), 0, ret)
        , Price = -1*log(abs(prc))
        , Size = -1*log(me)
      )

chenandrewy commented 2 years ago

Hmmm, I don't think we need the additional lag. @sehoff , would you have a literature reference? I don't mean to push back, I would just like a clear check.

When I read Fama-French 1993, I think they use end-of-June market equity to make trades at the end of June (see below). This would mean that there should be no additional lag for size (me in end of 202206 would be used to form a portfolio in end of 202206).

image

sehoff commented 2 years ago

Hi, sorry for the very late reply. I did not receive any notification of your reply.

Anyways, maybe I formulated my issue/comment imprecisely. But if I want to use your data for regressions of returns on characteristics, I need to avoid look-ahead bias by using returns over month t+1 when regressing on characteristics at the end of month t. In the paper you mention that all accounting variables are already lagged by six or four months for yearly and quarterly accounting data, respectively. So, how should I treat other, e.g., return-based, variables such as idiosyncratic volatility if used as RHS variables in said regressions? Can I just lag the final dataset including all characteristics (signed_predictors_dl_wide.csv) by one month before merging to CRSP returns, or, equivalently, shift the STreversal forward by one month keeping the characteristics fixed?

If I now understand correctly, there is no bug in the code, so I take back my previous comment!

So, sorry for the confusion, but could you clarify how I should use your dataset in said regressions?

chenandrewy commented 2 years ago

If I understand correctly, yes: you should just lag the signal dataset by one month, then merge onto returns by permno-month, then regress by row.
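For readers landing here, a minimal sketch of the lag-then-merge recipe described above. It uses base R only and toy data: the data frames `signals` and `returns`, the helper `shift_yyyymm`, and the pooled `lm` call are all illustrative assumptions, not code from this repo. In practice `signals` would be read from signed_predictors_dl_wide.csv and `returns` from CRSP.

```r
# Toy signal panel (hypothetical values): signal observed at the END of yyyymm.
signals <- data.frame(
  permno = c(10001, 10001, 10001, 10002, 10002, 10002),
  yyyymm = c(202011, 202012, 202101, 202011, 202012, 202101),
  Size   = c(-5.1, -5.2, -5.3, -7.0, -7.1, -7.2)
)

# Toy return panel (hypothetical values): ret is earned DURING yyyymm, in percent.
returns <- data.frame(
  permno = c(10001, 10001, 10001, 10002, 10002, 10002),
  yyyymm = c(202012, 202101, 202102, 202012, 202101, 202102),
  ret    = c(1.5, -0.4, 2.0, 0.3, 0.9, -1.1)
)

# Hypothetical helper: move a yyyymm integer forward by k months,
# rolling over year ends (202012 plus one month gives 202101).
shift_yyyymm <- function(yyyymm, k = 1) {
  months_total <- (yyyymm %/% 100) * 12 + (yyyymm %% 100 - 1) + k
  (months_total %/% 12) * 100 + (months_total %% 12 + 1)
}

# Lag the signals: the value known at the end of month t is stamped t+1,
# so it lines up with the return earned over month t+1.
signals_lag <- signals
signals_lag$yyyymm <- shift_yyyymm(signals$yyyymm, 1)

# Merge onto returns by permno-month (inner join keeps matched rows only).
panel <- merge(returns, signals_lag, by = c("permno", "yyyymm"))

# One simple choice of regression (an assumption): pooled OLS over all rows.
fit <- lm(ret ~ Size, data = panel)
```

Shifting the `yyyymm` key forward, rather than the return backward, leaves the CRSP side untouched, so the same lagged signal file can be reused for any return series keyed by permno-month.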