braverock / FactorAnalytics


Market Cap Groupings in factorsSPGMI and stocksCRSP #87

Open spinnj opened 2 years ago

spinnj commented 2 years ago

Market cap groupings (Large Cap, Mid Cap, Small Cap, Micro Cap) were apparently assigned by someone (it is unclear who) and are not official data pulled from CRSP or S&P Global Market Intelligence. Two issues have been identified:

  1. The assignments themselves may not be "correct" in some sense, and the methodology used to produce them is not known to me.
  2. The assignments appear to be constant over time. The stock "AMD" had enough market capitalization volatility over the sample period that it would likely have been classified as small, mid, and large cap at different times, yet it is in the "MidCap" group for the entire sample. This will make future merges of the factorsSPGMI data with other data sets error-prone, since other vendors' cap groupings change over time.
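One quick way to find other stocks in the same situation as AMD is to flag tickers whose cap group never changes even though their market cap moved a lot over the sample. The sketch below is written in Python rather than R and assumes a hypothetical row layout `(ticker, date, market_cap, cap_group)` and a hypothetical threshold `min_cap_ratio` (max cap divided by min cap); the tickers and values are made up for illustration and are not taken from factorsSPGMI.

```python
from collections import defaultdict


def flag_static_groups(rows, min_cap_ratio=5.0):
    """Return tickers whose cap group never changes even though their market
    cap moved by at least `min_cap_ratio` (max / min) over the sample.

    `rows` is an iterable of (ticker, date, market_cap, cap_group) tuples;
    this layout is an assumption for illustration only.
    """
    groups = defaultdict(set)   # ticker -> set of distinct cap groups seen
    caps = defaultdict(list)    # ticker -> list of observed market caps
    for ticker, _date, market_cap, cap_group in rows:
        groups[ticker].add(cap_group)
        caps[ticker].append(market_cap)

    flagged = []
    for ticker in sorted(groups):
        ratio = max(caps[ticker]) / min(caps[ticker])
        # A single group despite a large swing in size is suspicious.
        if len(groups[ticker]) == 1 and ratio >= min_cap_ratio:
            flagged.append(ticker)
    return flagged


# Hypothetical toy panel: "AAA" swings from ~1.5B to 25B but stays "MidCap".
rows = [
    ("AAA", "2000-03-31", 2.0e9, "MidCap"),
    ("AAA", "2005-03-31", 25.0e9, "MidCap"),
    ("AAA", "2010-03-31", 1.5e9, "MidCap"),
    ("BBB", "2000-03-31", 50.0e9, "LargeCap"),
    ("BBB", "2005-03-31", 60.0e9, "LargeCap"),
]
print(flag_static_groups(rows))  # ['AAA']
```

The ratio test is only a heuristic; a stricter check would recompute point-in-time groups and compare them with the stored labels.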

CRSP reconstitutes membership in its cap groupings quarterly, using breakpoints at 70%, 85%, and 98% of cumulative market capitalization coverage, and it also has a method for dealing with names that sit on the border between groups.
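The cumulative-coverage idea can be sketched for a single date as follows: rank stocks from largest to smallest, track the share of total market cap covered so far, and cut at the 70%, 85%, and 98% breakpoints. This is a simplified illustration in Python, not CRSP's actual implementation. It assumes (hypothetically) that the breakpoints map to Large/Mid/Small/Micro in that order, that a stock's group is set by the coverage reached *before* it is added (so the largest stock is always Large), and it does not implement CRSP's rules for names on the border between groups. The function name, labels, and toy caps are all illustrative.

```python
from bisect import bisect_right


def assign_cap_groups(market_caps,
                      breakpoints=(0.70, 0.85, 0.98),
                      labels=("LargeCap", "MidCap", "SmallCap", "MicroCap")):
    """Assign each ticker a cap group by cumulative market-cap coverage.

    `market_caps` maps ticker -> market cap on one reconstitution date.
    Border-name handling (buffers between groups) is deliberately omitted.
    """
    total = sum(market_caps.values())
    # Largest first; ties broken by ticker so the result is deterministic.
    ranked = sorted(market_caps.items(), key=lambda kv: (-kv[1], kv[0]))
    groups = {}
    covered = 0
    for ticker, cap in ranked:
        share_before = covered / total
        # bisect_right puts a share exactly at a breakpoint in the lower group.
        groups[ticker] = labels[bisect_right(breakpoints, share_before)]
        covered += cap
    return groups


# Hypothetical caps summing to 100 so the coverage shares are easy to read.
caps = {"A": 50, "B": 20, "C": 10, "D": 8, "E": 7, "F": 3, "G": 2}
print(assign_cap_groups(caps))
```

Running this once per quarter-end on that date's market caps gives assignments that change over time, which is the property the current data lack.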

I'm highlighting this issue for @JustinMShea to see if there's any desire to clean up the cap groupings (e.g., by applying an industry-standard approach for point-in-time assignment of stocks to groupings, similar to CRSP's) or if the current data are "good enough".
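Point-in-time groupings also change how merges have to work: a monthly observation should pick up the most recent quarterly assignment in effect on that date (an "as-of" lookup), not a single static label. The Python sketch below assumes a hypothetical `history` structure mapping each ticker to a date-sorted list of `(effective_date, group)` pairs, and assumes an assignment takes effect on its effective date; real reconstitution calendars may apply a lag, so that convention is an assumption to check against the vendor's documentation.

```python
from bisect import bisect_right
from datetime import date


def group_as_of(history, ticker, as_of):
    """Return the cap group in effect for `ticker` on `as_of`.

    `history[ticker]` is a list of (effective_date, group) pairs sorted by
    date (hypothetical layout). Returns None if no assignment exists yet.
    """
    entries = history.get(ticker, [])
    dates = [d for d, _ in entries]
    # Number of assignments whose effective date is on or before `as_of`.
    i = bisect_right(dates, as_of)
    return entries[i - 1][1] if i else None


# Hypothetical quarterly assignments for one stock that migrates between groups.
history = {
    "AAA": [(date(2000, 3, 31), "MidCap"),
            (date(2000, 6, 30), "LargeCap"),
            (date(2000, 9, 30), "SmallCap")],
}
monthly = [date(2000, 2, 29), date(2000, 4, 28),
           date(2000, 7, 31), date(2000, 10, 31)]
print([group_as_of(history, "AAA", d) for d in monthly])
# [None, 'MidCap', 'LargeCap', 'SmallCap']
```

In R, the same as-of behavior is what a rolling join provides, so the package data could store quarterly assignments and let users roll them forward onto return dates.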

JustinMShea commented 2 years ago

I think there is a strong desire to clean this up; great catch, @spinnj. The market cap changes over time might make for good questions on assignments, so including them would be helpful. Again, I think @braverock believes the data set should be as close to reality as possible, which I agree with.

We may need to get the original files to do this, and ultimately recreate the stocksCRSP and factorsSPGMI objects to be sure everything is correct. The scripts to do so could also be included in the package as a vignette, which would both document the entire process we used and give those with access to the raw datasets the ability to quickly load and transform them in R.