Open spinnj opened 2 years ago
I think there is a strong desire to clean this up, great catch @spinnj. The market cap changes over time might make for good questions on assignments, so including this would be helpful. Again, I think @braverock believes the data set should be as close to reality as possible, which I agree with.
We may need to get the original files to do this, and ultimately recreate the stocksCRSP and factorsSPGMI objects to be sure everything is correct. The scripts to do so could be included in the package as a vignette as well, which would both document the entire process we used and give those with access to the raw datasets the ability to quickly load & transform them in R.
Market cap groupings (Large Cap, Mid Cap, Small Cap, Micro Cap) were apparently assigned by ??? and were not official data pulled from CRSP or S&P Global Markets. Two issues have been identified:
CRSP reconstitutes membership in its cap groupings quarterly according to a 70%, 85%, 98% set of breakpoints for cumulative market capitalization coverage, and it also has a method for dealing with names that sit on the border between groups.
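To make the breakpoint idea concrete, here is a minimal sketch of a single point-in-time assignment. It is written in Python purely for illustration and is not CRSP's actual methodology: the function name `assign_cap_groups`, the tickers, and the market caps are all hypothetical, and the rule that a stock crossing a breakpoint stays in the larger group is an assumption. CRSP's treatment of border names is not reproduced here.

```python
def assign_cap_groups(
    market_caps,
    breakpoints=(0.70, 0.85, 0.98),
    labels=("Large Cap", "Mid Cap", "Small Cap", "Micro Cap"),
):
    """Map each ticker to a cap group by cumulative market-cap coverage.

    market_caps: dict of ticker -> market cap on one date (hypothetical input).
    breakpoints: cumulative coverage cut-offs, taken from the 70/85/98 figures.
    """
    total = sum(market_caps.values())
    # Rank from largest to smallest market cap.
    ranked = sorted(market_caps.items(), key=lambda kv: kv[1], reverse=True)

    groups = {}
    cumulative = 0.0
    for ticker, cap in ranked:
        # ASSUMPTION: the group is decided by the coverage reached *before*
        # this stock is added, so a stock that straddles a breakpoint stays
        # in the larger group. No border/banding rule is applied.
        coverage_before = cumulative / total
        idx = sum(coverage_before >= b for b in breakpoints)
        groups[ticker] = labels[idx]
        cumulative += cap
    return groups


# Hypothetical universe on a single rebalance date (caps sum to 1000).
caps = {"A": 500, "B": 150, "C": 120, "D": 100,
        "E": 60, "F": 45, "G": 20, "H": 5}
groups = assign_cap_groups(caps)
for ticker, group in groups.items():
    print(ticker, group)
```

For a point-in-time data set, the same function would be re-run on the market caps observed at each quarter-end, so a stock's group can change over time.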
I'm highlighting this issue for @JustinMShea to see whether there's any desire to clean up the cap groupings (e.g. by applying an industry-standard approach for a point-in-time assignment of stocks to groupings, similar to CRSP's) or whether the current data are "good enough".