Open shaunhutch opened 1 year ago
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Estimated hours spent reviewing: 2 hours
Review Comments:
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
Thank you for the opportunity to review your project. You did a great job on the project and analyzed an interesting research question. Here is my feedback on further improving your project.
It is a pleasure reviewing your project. Keep up the good work!
(Review based on the latest commit at main, https://github.com/UBC-MDS/drug_consumption_prediction/commit/6f53e64aa817d54dd016f8da0c0c2f1fd635a1a0)
It is a joy to read through your project. You have chosen a great research topic and dataset, and I can see you have put in a lot of work and care on the project.
What I like most is that the code is clean, neat, and well-commented, and that the report is very structured.
Below are a few additional comments on things I would love to see, in the hope that the project can be even better and easier for others to follow and reproduce. I am not going to comment on the writing (e.g., occasional typos) or the analysis itself.
Consider using backticks (`) to highlight the steps required to run the project, including the actual commands for running the different scripts. Currently, the steps are unmarked and buried within the instructions. Inline code, or even code blocks, would make the actual commands stand out. For example:
The SVM RBF Model analysis can be replicated using the following script located (here). In order to run this analysis, run:
python src/drug_consumption_prediction_model.py --data_path="../data/preprocessed/" --result_path="../results/"
This lets users easily see which steps to run.
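To make the example command above concrete, here is a minimal sketch of how a script could accept the two options it uses. It is built on the standard-library argparse module as an assumption; the project's script may parse its arguments differently, and the help strings and the parse_args helper are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the two path options that appear in the example command."""
    parser = argparse.ArgumentParser(
        description="Fit the model on preprocessed data and save the results."
    )
    # Option names mirror the example command; the help text is assumed.
    parser.add_argument("--data_path", required=True,
                        help="Directory containing the preprocessed data")
    parser.add_argument("--result_path", required=True,
                        help="Directory where the results are written")
    return parser.parse_args(argv)


# Passing an explicit list makes the parser easy to try without a shell.
example = parse_args(["--data_path=../data/preprocessed/",
                      "--result_path=../results/"])
print(example.data_path, example.result_path)
```

Because both options are marked as required, a user who forgets one gets a usage message right away instead of a failure later in the run.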
This is relatively minor, but consider citing the dataset's authors and the UCI repository using their preferred citations, which are:
(for the dataset, source)
E. Fehrman, A. K. Muhammad, E. M. Mirkes, V. Egan and A. N. Gorban, "The Five Factor Model of personality and evaluation of drug consumption risk.," arXiv [Web Link], 2015
(for UCI, source)
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
In addition, a small nitpick: the text "Except where otherwise noted, the example programs and other software provided in the introduction-to-data-science repository are made available under the MIT license." in the README should be updated to point to the project repo instead of referring to the "introduction-to-data-science" repository.
Also, regarding the README, it would be better if the Dependencies section were placed before the Downloading the Data section. The argument for this is that it would then follow the logical order of a user trying to run the project. Otherwise, I suspect some users will copy and paste commands up to the point where they realize that their environment is not set up properly.
On the same note, you may also consider including requirements.txt (for pip) or environment.yaml (for Conda) so that others can easily replicate the environment.
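As a complement to an environment file, a short preflight check can tell users what is missing before they start copying commands. The sketch below uses only the standard library; the package names in REQUIRED are placeholders chosen for illustration, not the project's actual dependency list.

```python
import importlib.util


def missing_packages(required):
    """Return the top-level package names in `required` that cannot be found."""
    return [name for name in required
            if importlib.util.find_spec(name) is None]


# Hypothetical dependency list; replace with the packages the project needs.
REQUIRED = ["pandas", "sklearn"]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

The check only looks up top-level import names, so it does not verify versions; an environment file remains the more reliable way to reproduce the setup.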
Consider including a flow chart so that users can visualize each of the steps of the analysis better. In our project, we used diagrams.net (formerly draw.io) and had a great experience.
I know that this may not be required at this stage, but since you already have your final project text in HTML, you may as well consider publishing it with GitHub Pages so that readers can read your report on their computers or phones.
Overall, I really like your take on the problem. Great work!
Thank you for your reviews. Here is how we addressed the feedback that we received:
The quality of the scripts was improved by adding example usage, docstrings, and more functions (e.g., download data): model script commit, preprocess commit, download data script commit
We agree with the feedback about adding an analysis directory, instead of having the analysis results just in the results folder. It improves project organization and understandability: commit
We added the report to GitHub pages because we agreed with the feedback about making our report more accessible since we already have the HTML. commit
We corrected the usage section so that the commands are easier to use and can be copied and pasted. commit
We added an environment.yml file to make the repository easier to reproduce. commit
We agreed with the feedback that we should be citing the authors and the dataset separately and have done so here: commit
We have included code chunk options in the Rmd file for the report so that warnings are not shown for the knitr::kable tables. commit
Submitting authors: @shaunhutch @ritisha2000 @brabbit61
Repository: https://github.com/UBC-MDS/drug_consumption_prediction/tree/main
Report link: https://github.com/UBC-MDS/drug_consumption_prediction/blob/main/doc/drug_consumption_prediction_report.html
Abstract/executive summary:
With drug overdoses on the rise, especially in British Columbia, it is important that we understand what factors can lead someone to try drugs. Investigating this problem could give us insight into which personality characteristics are the main motivators towards certain drugs, and those conclusions could then be applied when making public health decisions.
We wanted to look at behavioural data to see whether it could allow us to predict someone's level of consumption of both illegal and legal drugs. Specifically, we predict the level of consumption of a selection of drugs given a person's personality measurements, NEO-FFI-R (neuroticism, extraversion, openness to experience, agreeableness, and conscientiousness), BIS-11 (impulsivity), and ImpSS (sensation seeking), and personal characteristics (level of education, age, gender, and country of residence).
The data used in the project was collected by Elaine Fehrman between March 2011 and March 2012 and was sourced from the UCI Machine Learning Repository (Drug Consumption Dataset).
We perform the classification using an SVM model with an RBF kernel. The model was scored on accuracy, with a best accuracy of 0.735.
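For readers unfamiliar with the metric, accuracy is the fraction of predictions that match the true labels. The sketch below computes it by hand; the labels are made up for illustration and are not taken from the project's data, and the accuracy helper is hypothetical rather than the project's scoring code.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot score an empty set of labels")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration only; these are not the project's data.
y_true = ["user", "non-user", "user", "non-user"]
y_pred = ["user", "non-user", "non-user", "non-user"]
print(accuracy(y_true, y_pred))  # 3 of 4 correct -> 0.75
```

Read this way, an accuracy of 0.735 means that roughly 73.5% of the scored examples were assigned the correct class.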
Editor: @flor14
Reviewers: Yaou Hu, Kelvin Wong, Kelly Wu