braverock / FactorAnalytics


issue 86: add script to sandbox/Vestcor Data Cleaning for review #88

Closed spinnj closed 2 years ago

spinnj commented 2 years ago

Trying this again, one piece at a time (rather than all of the changes included in the prior PR), to keep things neater and to make sure I'm not going in a direction that isn't wanted.

This pull request does not modify any source data. It includes only an R script, ./sandbox/Vestcor Data Cleaning/spinnj_issue86.R, that outlines my suggestions for fixing the sector names that do not adhere to official SPGMI naming conventions. I assume these names result from manual data manipulation at some point in the past, while someone was trying to merge the CRSP and SPGMI data.

If, after review, this looks like a good change, the commented-out code at the end of the script will replace factorsSPGMI and stocksCRSP with versions that have official S&P/MSCI GICS sector names for all rows. I have not made any changes to factorsSPGMI or stocksCRSP myself.
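
For illustration only (this is not the contents of spinnj_issue86.R), the kind of fix described above can be sketched as a lookup from non-standard labels to official GICS sector names, followed by a check that nothing is left unmapped. The column name Sector, the toy data frame, the non-standard labels in sector_fixes, and the helper normalize_sectors are all assumptions made for the example; only the eleven official GICS sector names are taken as given.

```r
# Official GICS sector names (the eleven current sectors).
gics_sectors <- c(
  "Energy", "Materials", "Industrials", "Consumer Discretionary",
  "Consumer Staples", "Health Care", "Financials",
  "Information Technology", "Communication Services", "Utilities",
  "Real Estate"
)

# Hypothetical lookup: non-standard label -> official name.
# These labels are invented for the example, not taken from the data.
sector_fixes <- c(
  "Health care" = "Health Care",
  "Info Tech"   = "Information Technology",
  "Telecom"     = "Communication Services"
)

# Hypothetical helper: replace known non-standard labels, leave others as-is.
normalize_sectors <- function(x, fixes = sector_fixes) {
  x <- trimws(as.character(x))
  hit <- x %in% names(fixes)
  x[hit] <- unname(fixes[x[hit]])
  x
}

# Toy stand-in for a data set with a Sector column (assumed column name).
toy <- data.frame(
  Ticker = c("AAA", "BBB", "CCC", "DDD"),
  Sector = c("Health care", "Energy", "Info Tech", "Telecom"),
  stringsAsFactors = FALSE
)

# Work on a copy so the original data stays untouched until reviewed.
toy_clean <- toy
toy_clean$Sector <- normalize_sectors(toy_clean$Sector)

# Any label still outside the official list needs a manual decision.
unresolved <- setdiff(unique(toy_clean$Sector), gics_sectors)
stopifnot(length(unresolved) == 0)
```

Working on a copy and checking for unresolved labels before replacing anything mirrors the review-first approach of this PR: the original objects are only overwritten once the mapping has been agreed on.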

If this is accepted, then some other data issues (issues #85 and #73) can also be completed easily in the same fashion.

If this is not accepted, then I'll stop looking into cleaning up factorsSPGMI and stocksCRSP and take a look at the roadmap of issues to get on CRAN instead.

JustinMShea commented 2 years ago

@spinnj This is very good and most welcome, thank you. Yes, we would very much like to retain the original source names in the data wherever possible, in keeping with good practice.