UBC-MDS / olympic_medal_htest

Add scripts 01, 02, 03 and .Rmd file. #15

Closed stevenleung2018 closed 2 years ago

stevenleung2018 commented 2 years ago

I have tested these 3 Python scripts and the .Rmd file, and they are working.

To reproduce the results, run the following commands from the root of the repo:

python src/01_download_data.py --url="https://github.com/rfordatascience/tidytuesday/raw/master/data/2021/2021-07-27/olympics.csv" --saving_path="data/raw/olympics.csv"
python src/02_cleaning.py --reading_path="data/raw/olympics.csv" --saving_path="data/olympics_cleaned.csv"
python src/03_EDA_olympics.py --reading_path="data/olympics_cleaned.csv" --saving_path="results"
Rscript -e "rmarkdown::render('results/03_EDA.Rmd')"
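
For anyone reading the commands without opening the scripts, here is a minimal sketch of how a download script could accept the --url and --saving_path flags used above. It is not the code in src/01_download_data.py: the use of argparse and urllib, and the names parse_args, download and main, are my assumptions for illustration only.

```python
import argparse
import os
import shutil
import urllib.request


def parse_args(argv=None):
    """Parse the two flags used on the command line (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Download a file from a URL.")
    parser.add_argument("--url", required=True, help="URL of the file to download")
    parser.add_argument("--saving_path", required=True, help="path to write the file to")
    return parser.parse_args(argv)


def download(url, saving_path):
    """Fetch `url` and write the bytes to `saving_path` (hypothetical helper)."""
    folder = os.path.dirname(saving_path)
    if folder:
        # Create e.g. data/raw/ if it does not exist yet.
        os.makedirs(folder, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(saving_path, "wb") as out:
        shutil.copyfileobj(response, out)


def main(argv=None):
    """Entry point: parse the flags, then download."""
    args = parse_args(argv)
    download(args.url, args.saving_path)
```

Calling main() from the usual `if __name__ == "__main__":` guard would turn this into a command-line script. Creating the target folder up front means the command also works on a fresh clone where data/raw does not exist yet.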

The changes to script 01 are minimal, but I have done some additional testing. Script 02 is new; it only drops rows where age is missing. Script 03 generates 5 figures.
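
To make the cleaning step in script 02 concrete, here is a standard-library sketch of dropping rows with a missing age. It is not the code in src/02_cleaning.py: the column name "age", the treatment of empty strings and "NA" as missing, and the function name drop_missing_age are all assumptions on my part.

```python
import csv

# Assumption: a missing value appears as an empty field or the string "NA".
MISSING = {"", "NA"}


def drop_missing_age(reading_path, saving_path, column="age"):
    """Copy the CSV at reading_path to saving_path, skipping rows whose
    `column` value is missing. Returns the number of rows kept."""
    kept = 0
    with open(reading_path, newline="", encoding="utf-8") as src, \
            open(saving_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            value = (row[column] or "").strip()
            if value not in MISSING:
                writer.writerow(row)
                kept += 1
    return kept
```

If the script uses pandas instead, the equivalent step would be `df.dropna(subset=["age"])` on the loaded DataFrame, since pandas reads "NA" as a missing value by default.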

After running the commands above, you will have 5 new PNG files under the results folder. The .Rmd file uses these files.

I have created a complete draft of the EDA report as an .Rmd file called 03_EDA.Rmd. Please note that I had to put 03_EDA.Rmd under the results folder, because the files it reads need to be in the same folder or in a subfolder; it cannot reach into a higher-level folder. I think this is fine because the .Rmd file is part of the report anyway.

Currently the .Rmd file is set to knit to HTML, but we should be able to change the output to PDF later.