UBC-MDS / online_news_popularity

Assessing factors associated with online news popularity for DSCI 522

Add preliminary data preprocessing script #12

Closed jennifer-hoang closed 2 years ago

jennifer-hoang commented 2 years ago

An early version of the data preprocessing script. It can be run from the command line using:

python src/onp_data_preprocess.py --raw_data='data/raw/OnlineNewsPopularity/OnlineNewsPopularity.csv' --out_dir='data/processed/'
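
For reviewers who want to see how the two flags in that command could be handled, here is a minimal sketch using the standard library's argparse. This is an assumption for illustration only: the PR does not show how src/onp_data_preprocess.py parses its arguments, and the function name parse_args and the help strings are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the two options used in the command above (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Preprocess the Online News Popularity data (illustrative sketch)."
    )
    # Flag names are taken from the command in the PR description.
    parser.add_argument("--raw_data", required=True, help="Path to the raw CSV file")
    parser.add_argument("--out_dir", required=True, help="Directory for processed output")
    return parser.parse_args(argv)


# Pass the arguments explicitly so the sketch runs without a real command line.
args = parse_args([
    "--raw_data=data/raw/OnlineNewsPopularity/OnlineNewsPopularity.csv",
    "--out_dir=data/processed/",
])
print(args.raw_data)
print(args.out_dir)
```

Calling parse_args() with no argument reads from sys.argv instead, which is how it would behave when invoked from the shell.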

jennifer-hoang commented 2 years ago

Outliers in shares_per_day, defined as values below the 1st percentile or above the 99th percentile, were handled using the Winsorization method.
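
A note on terminology, with a small sketch: Winsorization in the usual sense caps extreme values at the percentile bounds and keeps every row, whereas dropping the rows outside the bounds is usually called trimming. The comment above could describe either, so the example below shows both with NumPy. The function names winsorize and trim and the toy data are hypothetical, and this is not taken from the script in this PR.

```python
import numpy as np


def winsorize(values, lower_pct=1.0, upper_pct=99.0):
    """Cap values at the given percentiles; the output has the same length."""
    arr = np.asarray(values, dtype=float)
    low, high = np.percentile(arr, [lower_pct, upper_pct])
    return np.clip(arr, low, high)


def trim(values, lower_pct=1.0, upper_pct=99.0):
    """Drop values outside the given percentiles; the output may be shorter."""
    arr = np.asarray(values, dtype=float)
    low, high = np.percentile(arr, [lower_pct, upper_pct])
    return arr[(arr >= low) & (arr <= high)]


# Toy stand-in for the shares_per_day column: the integers 0..100.
shares_per_day = np.arange(101)

capped = winsorize(shares_per_day)
trimmed = trim(shares_per_day)

print(len(capped), capped.min(), capped.max())
print(len(trimmed), trimmed.min(), trimmed.max())
```

Which variant is appropriate depends on whether later steps need the row count preserved; capping keeps all observations, trimming changes the sample size.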

jennifer-hoang commented 2 years ago

Created templates for README.md and report.Rmd files according to Milestone 2 requirements

nrao944 commented 2 years ago

Looks good to me