czwcandy opened this issue 7 months ago.
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Estimated hours spent reviewing: 1.5
The project focuses on the problem of student academic dropout in higher education. It is well organized, with a clear separation between scripts, data, and reports. The documentation provides essential guidance, and the use of Docker ensures good reproducibility. The writing is concise, informative, and of good quality, and the summary offers a clear overview. The methods are described clearly, and the results are communicated effectively. Below are suggestions for improvement:
Usage Via Docker: In the Usage Via Docker section, the commands for running the analysis in step 3 assume the user is inside the `scripts` folder, while the command to build the HTML report in step 4 assumes the user is at the project root. This change of directory is never stated, which may confuse readers. Adding explicit `cd` (change directory) commands to the instructions, or restructuring all commands so they run from the same directory, would help.
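One way to make the instructions independent of the working directory is to have each script resolve file paths from the repository root instead of from wherever it was launched. A minimal Python sketch, assuming the repository root contains a `Dockerfile` (used here only as a marker; any file known to sit at the root would work). `find_project_root` and the data path in the usage comment are hypothetical names, not part of the project:

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "Dockerfile") -> Path:
    """Return the nearest directory at or above `start` that contains `marker`.

    Scripts can build absolute paths from this directory, so they behave the
    same whether they are launched from the project root or from `scripts/`.
    """
    start = start.resolve()
    # Check `start` itself first, then each parent directory up to the filesystem root
    for candidate in (start, *start.parents):
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"no {marker} found at or above {start}")


# Hypothetical usage inside an analysis script:
# ROOT = find_project_root(Path(__file__).parent)
# data_path = ROOT / "data" / "raw" / "dataset.csv"
```

With paths anchored this way, the README could list every command as run from the project root without the scripts breaking.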
Analysis Report Enhancements: While the report thoroughly details the methodology and data analysis, expanding on the background of the problem and its significance would give readers a fuller understanding of the study. The font size in Fig. 1 and Fig. 2 is too small, making it difficult for readers to interpret the data; increasing it would improve legibility. Additionally, meaningful column names are essential for understanding data tables. The `Unnamed: 0` columns in Fig. 4 and Fig. 5 should be given descriptive titles that accurately reflect the data they represent.
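The `Unnamed: 0` header is what pandas produces when a CSV written together with its row index is read back without saying which column holds that index. A minimal sketch, assuming the tables in Fig. 4 and Fig. 5 come from CSV files written with `DataFrame.to_csv`; the table, its column names, and its numbers are placeholders, not the project's results:

```python
import io

import pandas as pd


def load_results_with_named_index():
    """Reproduce the 'Unnamed: 0' column and show two ways to fix it."""
    # Placeholder table (not the project's numbers); the row index holds model names
    scores = pd.DataFrame(
        {"train_accuracy": [0.9, 0.8], "test_accuracy": [0.7, 0.6]},
        index=["model_a", "model_b"],
    )

    buffer = io.StringIO()
    scores.to_csv(buffer)  # the unnamed index is written with an empty header cell
    buffer.seek(0)
    raw = pd.read_csv(buffer)  # pandas labels that empty header 'Unnamed: 0'

    # Fix 1: rename the leftover column to something descriptive
    renamed = raw.rename(columns={"Unnamed: 0": "model"})

    # Fix 2: read the first column back as the index and give it a name
    buffer.seek(0)
    restored = pd.read_csv(buffer, index_col=0)
    restored.index.name = "model"

    return raw, renamed, restored
```

Naming the index before saving, for example `scores.rename_axis("model").to_csv(path)`, avoids the empty header at the source.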
Proofreading: Some errors, such as the placeholder text in the first reference of the README and the outdated `environment.yml` file, could have been avoided with a thorough proofread.
Highlights - Things that impressed me:
Suggestions:
Overall, great job Group 15. I have learned much from your project and I can't wait to read more!
Overall, a very thoughtful report with multiple modeling methods! I really enjoyed going through your analysis of student success. Your visualization was very sophisticated and informative. Areas where you could improve your project:
Overall Structure: The project could benefit from clearer background information and a stronger motivation for this study. More explicit details on the final results would also be helpful.
Typos: I noticed several typos that could distract readers.
Usage via Docker: The commands use relative paths from different working directories, which could mislead other users.
It was a very good report; I enjoyed reading it!
Submitting authors: @czwcandy @tangyl92 @hchqin @billwan96
Repository: https://github.com/UBC-MDS/Student_Success_Predict_Group15
Report link: https://ubc-mds.github.io/Student_Success_Predict_Group15/student_success.html
Abstract/executive summary: In our study, we developed machine learning models, including SVM, Random Forest, and Logistic Regression (with L1 and L2 regularization), to predict the likelihood of student academic dropout in higher education. Because of the high number of features and their inter-correlations, our models initially exhibited overfitting. To address this, we applied feature selection techniques (PCA and feature importance analysis) along with hyperparameter optimization. The refined models showed improved performance, evidenced by a narrow gap between training and testing accuracy. Among the three, SVM marginally outperformed the others, achieving an accuracy of 80% and an AUC score of 0.89. Nonetheless, model performance could be improved further through additional feature engineering and more extensive hyperparameter tuning.
Editor: @czwcandy
Reviewers: @Sampsonyu @hema2022ubc @sho-i98 @lichunubc
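The modeling approach summarized in the abstract (dimensionality reduction followed by a tuned classifier, judged by the train/test accuracy gap and AUC) can be sketched with scikit-learn. This is a minimal illustration on synthetic data from `make_classification`, not the authors' code or dataset; the PCA component counts and `C` values in the grid are arbitrary choices for the example:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def fit_svm_with_pca(random_state: int = 0):
    """Tune a PCA + SVM pipeline on synthetic data; report train/test accuracy and test AUC."""
    # Synthetic stand-in for a dataset with many correlated (redundant) features
    X, y = make_classification(
        n_samples=400, n_features=30, n_informative=8, n_redundant=12,
        random_state=random_state,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )

    # PCA sits inside the pipeline so it is refit on each cross-validation fold
    pipe = Pipeline([
        ("scale", StandardScaler()),
        ("pca", PCA()),
        ("svm", SVC()),
    ])
    grid = {"pca__n_components": [5, 10, 20], "svm__C": [0.1, 1.0, 10.0]}
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)

    best = search.best_estimator_
    train_acc = best.score(X_train, y_train)
    test_acc = best.score(X_test, y_test)
    # SVC's decision_function scores are enough for ROC AUC; probability=True is not required
    test_auc = roc_auc_score(y_test, best.decision_function(X_test))
    return search.best_params_, train_acc, test_acc, test_auc
```

A small difference between `train_acc` and `test_acc` is the kind of evidence the abstract uses to argue that overfitting was reduced.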