Authors: (DSCI 522 Group 304) Anny Chih, Robert Pimentel, & Wenjiao Zou
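Using Foundation Skills Assessment (FSA) data (BC Ministry of Education 2019a; BC Ministry of Education 2017a) from school years 2007/2008 through 2018/2019, we examined whether exam performance differs between student subgroups and between school types, to answer two main inferential questions:
- Is there a difference between public and independent school scores?
- Is there a difference between Aboriginal and Non Aboriginal student scores?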
By conducting hypothesis testing using t-test statistics and a 95% confidence interval, we determined that:
The dataset used in this project includes two data files that contain FSA scores from BC students in Grades 4 and 7. Only data for the Numeracy and Reading portions of the FSA exam were analyzed (i.e. excludes Writing score analysis; see report for additional details).
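For readers unfamiliar with the kind of test mentioned above, the following is a minimal sketch of a two-sample t-test with an approximate 95% confidence interval for the difference in means. It is not the code used in the report. The choice of Welch's unequal-variance variant, the normal approximation for the interval (reasonable only for large samples), the function name `welch_t_test`, and the sample scores are all assumptions made for illustration.

```python
import math
from statistics import NormalDist, mean, variance


def welch_t_test(sample_a, sample_b, confidence=0.95):
    """Return (mean difference, t statistic, Welch degrees of freedom, CI).

    The confidence interval uses a normal critical value, which is an
    approximation that is only appropriate for large samples.
    """
    # Per-group variance of the sample mean.
    var_a = variance(sample_a) / len(sample_a)
    var_b = variance(sample_b) / len(sample_b)

    diff = mean(sample_a) - mean(sample_b)
    std_err = math.sqrt(var_a + var_b)
    t_stat = diff / std_err

    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(sample_a) - 1) + var_b ** 2 / (len(sample_b) - 1)
    )

    # Two-sided critical value from the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (diff - z * std_err, diff + z * std_err)
    return diff, t_stat, df, ci


# Made-up scores, for illustration only (not FSA data).
group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
diff, t_stat, df, ci = welch_t_test(group_a, group_b)
print(f"difference={diff:.2f}, t={t_stat:.2f}, df={df:.1f}, CI={ci}")
```

If the interval contains zero, the data do not show a difference in means at the chosen confidence level. For small samples, a critical value from the t distribution with `df` degrees of freedom would be more appropriate than the normal value used here.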
Links to Preview Data Source:
- (BC Ministry of Education 2019b)
- (BC Ministry of Education 2017b)
The final report is rendered from doc/report.Rmd.
To replicate this analysis, you can choose one of the four options
listed below.
For a visual map of how each file is connected, please see the Makefile
Dependency Diagram below.
Option 1: Using Terminal
# Loads Data
Rscript src/load_data.R 'https://catalogue.data.gov.bc.ca/dataset/5554165d-e365-422f-bf85-4f6e4c9167dc/resource/bcb547f0-8ba7-451f-9e11-10524f4d57a0/download/foundation-skills-assessment-2017-18_to_2018-19.csv' --arg2='data/fsa_2017-2018.csv'
Rscript src/load_data.R 'https://catalogue.data.gov.bc.ca/dataset/5554165d-e365-422f-bf85-4f6e4c9167dc/resource/97c6cbf7-f529-464a-b771-9719855b86f6/download/fsa.csv' --arg2='data/fsa_2007-2016.csv'
# Cleans Data
python src/clean_data.py --raw_data1='data/fsa_2007-2016.csv' --raw_data2='data/fsa_2017-2018.csv' --local_path='data/clean_data.csv'
# Creates Data Subset of only schools with both Aboriginal and Non Aboriginal students (based on data in 2018/2019 school year)
python src/filter_schools_both_subgroups.py --clean_data='data/clean_data.csv' --new_data='data/filtered_schools_both_subgroups.csv'
# Appends a column to the clean_data file with info about whether the school has both Aboriginal and Non Aboriginal students
python src/add_subgroup_info.py --clean_data="data/clean_data.csv" --new_data="data/new_clean_data.csv"
# Produces EDA Bar and Line Charts
Rscript src/data_viz_tab.R --data='data/clean_data.csv' --out_dir='img'
# Produces Histograms for Inferential Question 1: Difference Between Public and Independent School Scores
Rscript src/plot_publicindep_histogram.R 'data/clean_data.csv' --arg2='img/' --arg3='fig_pi_histogram_numeracy.png' --arg4='fig_pi_histogram_reading.png' --arg5='fig_pi_histogram_writing.png'
# Produces Histograms for Inferential Question 2: Difference Between Aboriginal and Non Aboriginal Scores
Rscript src/plot_subgroup_histogram.R 'data/clean_data.csv' --arg2='img/' --arg3='fig_ana_histogram_numeracy.png' --arg4='fig_ana_histogram_reading.png' --arg5='fig_ana_histogram_writing.png'
# Renders Report
Rscript -e "rmarkdown::render('doc/report.Rmd')"
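The two subgroup steps above (src/filter_schools_both_subgroups.py and src/add_subgroup_info.py) are described only by their comments, so the following is a minimal sketch of the idea rather than the scripts' actual logic. The column names (`school_id`, `school_year`, `sub_population`), the subgroup labels, the flag column name, and the sample rows are all hypothetical.

```python
from collections import defaultdict

# Hypothetical column names and labels; the real data files may differ.
SCHOOL_COL = "school_id"
YEAR_COL = "school_year"
GROUP_COL = "sub_population"
REQUIRED_GROUPS = {"ABORIGINAL", "NON ABORIGINAL"}


def schools_with_both_subgroups(rows, year="2018/2019"):
    """Return ids of schools that have rows for every required subgroup in `year`."""
    groups_by_school = defaultdict(set)
    for row in rows:
        if row[YEAR_COL] == year:
            groups_by_school[row[SCHOOL_COL]].add(row[GROUP_COL])
    return {
        school
        for school, groups in groups_by_school.items()
        if REQUIRED_GROUPS <= groups
    }


def filter_rows(rows, year="2018/2019"):
    """Keep only rows from schools that have both subgroups in `year`."""
    keep = schools_with_both_subgroups(rows, year)
    return [row for row in rows if row[SCHOOL_COL] in keep]


def add_subgroup_flag(rows, year="2018/2019", flag_col="has_both_subgroups"):
    """Return copies of the rows with a boolean flag column appended."""
    keep = schools_with_both_subgroups(rows, year)
    return [{**row, flag_col: row[SCHOOL_COL] in keep} for row in rows]


# Made-up rows, for illustration only.
example_rows = [
    {"school_id": "A", "school_year": "2018/2019", "sub_population": "ABORIGINAL"},
    {"school_id": "A", "school_year": "2018/2019", "sub_population": "NON ABORIGINAL"},
    {"school_id": "B", "school_year": "2018/2019", "sub_population": "NON ABORIGINAL"},
    {"school_id": "A", "school_year": "2017/2018", "sub_population": "NON ABORIGINAL"},
]
print(filter_rows(example_rows))
print(add_subgroup_flag(example_rows))
```

In this sketch, membership is decided from a single school year and then applied to every row for that school, which mirrors the "based on data in 2018/2019 school year" note in the command comments.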
Option 2: Using Makefile
Run the following at the command line/terminal:
make clean
make all
Option 3: Using Shell Script
Run the following at the command line/terminal:
bash runall.sh
Option 4: Using Docker
Run the following at the command line/terminal:
docker run --rm -v /$(pwd):/home/rstudio/DSCI_522_Group304 annychih/dsci522_group304_docker make -C /home/rstudio/DSCI_522_Group304 clean
docker run --rm -v /$(pwd):/home/rstudio/DSCI_522_Group304 annychih/dsci522_group304_docker make -C /home/rstudio/DSCI_522_Group304 all
Python 3.6.9 and Python packages:
R version 3.6.1 and R packages:
GNU make 3.81
Docker version 19.03.5
pandoc version >= 2.9.1.1