Closed jennifer-hoang closed 2 years ago
Outliers in shares_per_day, defined as values below the 1st percentile or above the 99th percentile, were removed using the Winsorization method.
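For reference, a minimal sketch of the percentile cut-offs described above. It uses only the standard library and made-up data. The function names (`percentile_bounds`, `remove_outliers`, `winsorize`) are hypothetical and are not taken from the preprocessing script. Since the description says outliers were "removed", `remove_outliers` drops them. Winsorization in the strict sense clips values to the bounds instead, so that variant is shown as well.

```python
from statistics import quantiles


def percentile_bounds(values, lower_pct=1, upper_pct=99):
    """Return the (lower, upper) percentile cut points of `values`."""
    # quantiles(n=100) returns 99 cut points; index k-1 holds the k-th percentile.
    cuts = quantiles(values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def remove_outliers(values, lower_pct=1, upper_pct=99):
    """Drop every value that falls outside the percentile bounds."""
    low, high = percentile_bounds(values, lower_pct, upper_pct)
    return [v for v in values if low <= v <= high]


def winsorize(values, lower_pct=1, upper_pct=99):
    """Clip values to the percentile bounds instead of dropping them."""
    low, high = percentile_bounds(values, lower_pct, upper_pct)
    return [min(max(v, low), high) for v in values]


# Made-up stand-in for the shares_per_day column: 1..100 plus one extreme value.
shares_per_day = [float(x) for x in range(1, 101)] + [10_000.0]

bounds = percentile_bounds(shares_per_day)
trimmed = remove_outliers(shares_per_day)
clipped = winsorize(shares_per_day)
```

Dropping shortens the data, while clipping keeps every row and only caps the extremes, so the choice affects the row count of the processed output.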
Created templates for README.md and report.Rmd files according to Milestone 2 requirements
Looks good to me
An early version of the data preprocessing script that:
Can be run from the command line using:
python src/onp_data_preprocess.py --raw_data='data/raw/OnlineNewsPopularity/OnlineNewsPopularity.csv' --out_dir='data/processed/'
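A rough sketch of how the two options in the command above could be parsed, assuming `argparse`. The script may well use a different parser, and `parse_args` is a hypothetical helper. Only the option names `--raw_data` and `--out_dir` come from the command itself.

```python
import argparse


def parse_args(argv=None):
    """Parse the two options used in the command (the parser choice is an assumption)."""
    parser = argparse.ArgumentParser(
        description="Preprocess the Online News Popularity data."
    )
    parser.add_argument("--raw_data", required=True, help="Path to the raw CSV file.")
    parser.add_argument("--out_dir", required=True, help="Directory for processed output.")
    return parser.parse_args(argv)


# Same arguments as in the command, passed explicitly so the sketch runs standalone.
args = parse_args([
    "--raw_data=data/raw/OnlineNewsPopularity/OnlineNewsPopularity.csv",
    "--out_dir=data/processed/",
])
```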