Open Radascript opened 2 years ago
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Interesting topic!
Some slight nit-picking and suggestions on the following:
Excellent work, team! Your topic is crucial from a policy perspective, and I particularly enjoyed the implementation of hypothesis testing to answer your research question. You did an excellent job with your writing; the flow was seamless and made it an easy read!
My feedback is intended to catapult your work from an A+ to an A++. Not all comments may be applicable given the limitations of your dataset (or time), but I figured they are worth mentioning.
Wow!! You have picked quite an interesting topic, and I am sure this is not just a theoretical project but one with many practical applications in policy making and elsewhere.
Here is my feedback on the work done. Please keep in mind that it is only meant to make this work more exhaustive, so I may be nit-picking here and there, and you may choose to implement or ignore any of it:
Nevertheless, this is good work. Kudos to the team for all the efforts and hard work.
Thank you for all of your comments! We appreciated, agreed with, and implemented some of your comments.
From: @ytz and @nrao944
- It would be useful to briefly mention the number of data points in your data set, under the 'Data' subheading.
- Is it possible to provide dimensions of the data (total number of observations) for each of the groups in the README and the Data Section of the Report? It is noted that you have this available in your discussion.
Our implementation:
From: @adrianne-l
- In the report Results & Discussion section, would it be a good idea to include some sub-sections to summarise the interim findings to better navigate and follow your flow of result interpretation?
Our implementation:
From: @nrao944
- Your alternative hypothesis in the report should say "not equal"; as written, it exactly matches the null hypothesis.
- In your report, the number of repeats appears as the literal N_REPEATS rather than a number.
Our implementation:
From: @ytz
- Not 100% sure whether the use of 'confidence interval' is correct in "...we noted the large overlap in the confidence intervals between the two groups"
- For Figure 2, consider using a log scale on the x-axis to make the box-plots more prominent
- Since the focus is on the indigenous group, you could use a monotone colour for the non-indigenous group, and a primary colour like red or blue for the indigenous group. That will make it easier for the reader to interpret the chart
Our implementation:
- This line can't run: df_init = pd.read_csv('../data/offender_profile.csv', sep=r'\s,\s', header=0, encoding='ascii', engine='python') because your file is at this path '../data/RAW/offender_profile.csv'
- I can't run your code; it raises KeyError: 'Sentence Type' at cell 10
Our implementation:
- If figure captions are not provided the plot should be clearly explained in the text. I would recommend using figure captions.
Our implementation:
- Need to add some explanation to the plot and your code
Our implementation:
- You should create an environment.yaml file to contain all your dependencies
Our implementation:
- In the usage section, you should describe how to run each of your scripts, not just "make all" and "make clean"
Our implementation:
Submitting authors: @Radascript, @AraiYuno, @miyer26, @showcy
Repository: https://github.com/UBC-MDS/DSCI_522_inference_on_indigenous_vs_non_indigenous_sentence_length_differences
Report link: https://htmlpreview.github.io/?https://github.com/UBC-MDS/DSCI_522_inference_on_indigenous_vs_non_indigenous_sentence_length_differences/blob/main/doc/sentence_length_diffs_inference_report.html
Abstract/executive summary: For this project we have carried out a hypothesis test to determine if there was a significant difference in the median sentence lengths between the indigenous and non-indigenous offenders under the Correctional Service of Canada. The median was selected as the measure of central tendency and a permutation test under the null model was carried out computationally with a significance level of 0.05. The null hypothesis stated that there was no difference in the population medians in sentence length between indigenous and non-indigenous offenders. The alternative hypothesis stated that there is a difference in the population medians in sentence length between indigenous and non-indigenous offenders. The resulting sample difference in the two medians was -56 days, with a corresponding p-value of 0.0328. The indigenous group was found to have shorter sentence lengths than the non-indigenous group. As this p-value was smaller than the significance level, there was statistically significant evidence to reject the null hypothesis of no difference in the median sentence lengths between the two groups. As we had a large sample size for both groups, our test was very sensitive to small differences in the medians of the two groups. Though this may raise some concern regarding the practical implications of the study, we believed it was important not to miss any existing effect due to the sensitivity of the issue at hand. The cost of a Type II error is more significant than that of a Type I error.
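For readers unfamiliar with the procedure named in the abstract, here is a minimal sketch of a two-sided permutation test for a difference in medians. It is not the project's actual code: the function name, the `n_repeats` and `seed` parameters, the add-one p-value estimate, and the sample values are all assumptions made for illustration.

```python
import random
import statistics


def permutation_test_median_diff(group_a, group_b, n_repeats=10_000, seed=0):
    """Two-sided permutation test for a difference in medians.

    Returns (observed_diff, p_value), where observed_diff is
    median(group_a) - median(group_b).
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    observed = statistics.median(group_a) - statistics.median(group_b)

    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_repeats):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffle the pooled values and split them into new groups.
        rng.shuffle(pooled)
        diff = statistics.median(pooled[:n_a]) - statistics.median(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one estimate (an assumption of this sketch): counts the observed
    # labelling as one permutation, so the p-value is never exactly zero.
    p_value = (extreme + 1) / (n_repeats + 1)
    return observed, p_value


# Made-up sentence lengths in days, for illustration only.
group_a = [730, 760, 800, 900, 1095, 1200, 1460]
group_b = [745, 790, 850, 950, 1100, 1300, 1500]
observed, p_value = permutation_test_median_diff(group_a, group_b, n_repeats=2_000)
print(f"observed difference in medians: {observed} days, p-value: {p_value:.3f}")
```

The null hypothesis would be rejected at the 0.05 level when the returned p-value falls below 0.05. With samples as small as the made-up ones here, the result says nothing about the real data.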
The data set used for this study is the Offender Profile from 2017-2018 released by the Correctional Service of Canada. The link to this site can be found here. Each entry in the data set corresponds to a single offender serving a sentence of two or more years. Demographic details such as age, gender and marital status at year end are provided for each entry. This was retrieved from the Offender Management System (OMS).
Editor: @Radascript, @AraiYuno, @miyer26, @showcy
Reviewer: Nagraj Rao, TZ Yan, Abhiket Gaurav, Adrianne Leung