OpenSourceAP / CrossSection

Code to accompany our paper Chen and Zimmermann (2020), "Open source cross-sectional asset pricing"
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3604626
GNU General Public License v2.0

Clearly define data categories and patch miscategorizations #98

Closed (chenandrewy closed 1 year ago)

chenandrewy commented 1 year ago

Dmitriy Muravyev pointed out that we don't define the data categories in the paper. We should add a definition to the documentation.

We had meant to categorize any variable of the form X/[market value] by the data source for X, but it seems a handful of signals are miscategorized as Price, namely EP, EquityDuration, NetPayoutYield, and PayoutYield.

chenandrewy commented 1 year ago

The miscategorizations are patched here: 77ae5178b7f1392cb1e2be41f92f4dae69587bd3. It seems we should add a documentation page to openassetpricing.com. I like the guidance here on how to use the README, webpage, and wiki: https://stackoverflow.com/questions/32430473/what-are-the-main-functionality-differences-between-github-wiki-and-readme

chenandrewy commented 1 year ago

We should move this to openassetpricing.com, but for guidance, the SignalDoc.csv Cat.Data entries classify each predictor by the type of data involved, using the following categories:

  1. 13F: Based on LSEG / Refinitiv / Thomson Reuters 13F data
  2. Accounting: Based on accounting variables from Compustat. Includes accounting valuations (f(accounting variables) / [market equity])
  3. Analyst: Based on analyst forecast and recommendation data from IBES. Includes analyst-based valuations (f(analyst variables) / [market equity])
  4. Event: Based on discrete firm events like dividend initiation, IPO, exchange switches.
  5. Options: Based on option market data (including option volume) from OptionMetrics.
  6. Other: Based on assorted random data, like BEA IO Tables and Patent Office data.
  7. Price: Based on past and current stock market prices.
  8. Trading: Based on volume, positioning, and microstructure data.
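
As a rough check of the definitions above, the sketch below counts signals per Cat.Data label and flags any label outside the eight categories. It is only an illustration: the `Acronym` column name, the exact spelling and casing of the labels, and the sample rows are assumptions, not copied from SignalDoc.csv.

```python
import csv
import io
from collections import Counter

# The eight Cat.Data categories listed in this thread.
# Spelling/casing in the real file may differ (assumption).
DATA_CATEGORIES = {
    "13F", "Accounting", "Analyst", "Event",
    "Options", "Other", "Price", "Trading",
}


def summarize_categories(csv_text, name_col="Acronym", cat_col="Cat.Data"):
    """Count signals per data category and collect unrecognized labels.

    The column names are assumptions about SignalDoc.csv's layout.
    """
    counts = Counter()
    unknown = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        category = row[cat_col].strip()
        if category in DATA_CATEGORIES:
            counts[category] += 1
        else:
            unknown.append((row[name_col], category))
    return counts, unknown


# Illustrative rows only; the last two signal names are hypothetical.
SAMPLE = """Acronym,Cat.Data
EP,Accounting
PayoutYield,Accounting
HypotheticalMomentum,Price
HypotheticalSignal,Prices
"""

counts, unknown = summarize_categories(SAMPLE)
print(dict(counts))  # signals per recognized category
print(unknown)       # (signal, label) pairs that need review
```

To run this against the real file, read SignalDoc.csv into a string and pass it in place of `SAMPLE`, adjusting the column names if they differ.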

chenandrewy commented 1 year ago

Based on the above definitions, I recategorized a handful of predictors here: 5ab073ee44006c0c24e42749ffad3756a83b53c5

chenandrewy commented 1 year ago

@tomz23 Can you please post these definitions at www.openassetpricing.com and close out this issue?

tomz23 commented 1 year ago

Added here: https://www.openassetpricing.com/faq/#q-datacat