wakesyracuse7 opened this issue 1 year ago
Please provide more detailed feedback here on what was done particularly well, and what could be improved. It is especially important to elaborate on items that you were not able to check off in the list above.
Hi, Group 14,
The team of Au-Yeung, Wang, and Zhang performed a machine-learning (ML) study aiming to classify pregnant women into high-, medium-, and low-risk groups for maternal mortality based on health measurements (such as age, blood pressure, and glucose levels). They started by performing exploratory data analysis, experimented with classification using multiple ML models, and proposed a DecisionTreeClassifier with max_depth=29 as the best estimator. The proposed model has a mean score of 0.823 when predicting on the test data set, which is a good test score given there isn't a significant class imbalance issue.
Below is some constructive feedback, listed in the order of importance that I think the authors may want to consider.
The discussion of why DecisionTreeClassifier was chosen is not comprehensive. Perhaps the authors can first discuss how each model makes predictions and then explain why DecisionTreeClassifier is an appropriate estimator compared to, for example, SVM. Overall, I congratulate the authors on successfully building a model from start to finish, and on delivering the results in an easy-to-follow manner. With a few tweaks, I believe this project meets the standards of a comprehensive analysis of predicting maternal mortality risk from collected health metrics. Well done!
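To illustrate the model-comparison suggestion above, a minimal sketch (not the authors' code; the data here is synthetic, standing in for the 6 health features and 3 risk classes) of putting the candidate estimators side by side under the same cross-validation:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 6 maternal-health features and 3 risk classes
X, y = make_classification(n_samples=600, n_features=6, n_informative=5,
                           n_redundant=1, n_classes=3, random_state=0)

models = {
    "decision tree (max_depth=29)": DecisionTreeClassifier(max_depth=29, random_state=0),
    "RBF SVM": SVC(),
    "k-NN": KNeighborsClassifier(),
}
for name, model in models.items():
    print(f"{name}: mean CV accuracy = {cross_val_score(model, X, y, cv=5).mean():.3f}")
```

A table like this, paired with a sentence on how each model makes its predictions, would make the final model choice much easier to follow.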
This was derived from the JOSE review checklist and the ROpenSci review checklist.
Good job, Team 14! It's really interesting to raise the question of studying three groups of women with different maternal mortality risk based on the health measurement features you selected, and I believe answering this question can definitely have a positive impact on medical research. Here are my suggestions and comments for your project:
Overall, the project conveys a clear flow of thought and accurate data processing. It's not easy to build a project from the ground up, and with some effort in future updates, I believe this project will provide insightful information and results for the medical community. Nice job, guys!
Code quality is very strong
This project is well put together and it is obvious that the group understands the typical data science workflows. It's worth noting that this group only has 3 members instead of 4, but this has not diminished the quality of the submission. I was specifically impressed by the quality of the code across the board. Clear, modular, and easy to follow, plus error handling and function testing! Excellent work! I also appreciate that the group chose a dataset with the potential to bring good to the world - I think it's important to make time to explore data like this when possible.
I struggled to get the automated scripts to run. Unfortunately, download_data.py, eda_script.py, and rendering the final report all failed to run successfully on my Linux Ubuntu machine, even after installing all the listed packages in a new conda environment. I copy-pasted the commands directly from the usage section, but did not troubleshoot further when they did not work. I've included the commands I ran and any console replies at the end of this comment if you'd like to investigate further.
Regarding your conclusions, I would challenge the assertion that the Decision Tree Classifier is the best model to use here, despite it getting the best cross-validation scores. Taking 6 features to a depth of 29 means revisiting the same features many times, and I suspect the model suffers from overfitting as a result; there was a large gap between training and validation scores for your decision tree classifier. I was more inclined to choose the SVM model, which did not suffer from the same overfitting, even though it cost some accuracy. Also, as you discussed, you'd probably want to choose a different scoring function anyway, which might change the equation again.
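The train/validation gap argument can be checked directly by asking cross-validation to return training scores. A sketch (synthetic data, not the authors' code) of how the deep tree's gap shows up next to the SVM's:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class, 6-feature data mimicking the project's setup
X, y = make_classification(n_samples=800, n_features=6, n_informative=5,
                           n_redundant=1, n_classes=3, random_state=123)

for name, model in [
    ("tree, max_depth=29", DecisionTreeClassifier(max_depth=29, random_state=123)),
    ("RBF SVM", SVC()),
]:
    scores = cross_validate(model, X, y, cv=5, return_train_score=True)
    train, valid = scores["train_score"].mean(), scores["test_score"].mean()
    print(f"{name}: train={train:.3f} valid={valid:.3f} gap={train - valid:.3f}")
```

The depth-29 tree will typically score near-perfectly on the training folds, so a large gap to the validation score is the overfitting signal described above.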
Other thoughts I had while reviewing - feel free to take 'em or leave 'em :)
I like the density distribution plots in your EDA a lot. They clearly show that the distribution differs between the classes for almost every feature; you should be able to train a strong classifier as a result.
You should be careful about making claims about the application of your model and findings to women's health in general because the sample is limited to a few years in a specific geographic region (rural Bangladesh).
In the report, you mention "The R programming language was used to perform the analysis", but all of your analysis is done in Python as far as I can see.
Somewhere in your report I would have liked to see some indication of the severity of each risk level. How bad is "high", for example? How much worse is "high" than "mid"?
Because these are health-related outcomes, you might want to choose a scoring metric that penalizes missing high-risk patients more heavily; better to end up with an "overly pessimistic" model than an "optimistic" one.
You have the right idea in the "Future Directions" section, but keep in mind that we can maximize the "recall" of high risk patients by training a dummy model that always predicts "high risk". This is why f1 score is usually the alternative if we're worried about class imbalance and the type 2 errors for a particular class.
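The dummy-model pitfall above is easy to demonstrate. A sketch (synthetic data; treating class 2 as "high risk" is my assumption) showing that a constant predictor gets perfect recall on the high-risk class while macro f1 exposes it:

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import f1_score, recall_score

X, y = make_classification(n_samples=600, n_features=6, n_informative=5,
                           n_redundant=1, n_classes=3, random_state=0)
HIGH = 2  # pretend class 2 is "high risk"

dummy = DummyClassifier(strategy="constant", constant=HIGH).fit(X, y)
pred = dummy.predict(X)

# Perfect recall on the high-risk class, despite the model being useless
print(recall_score(y, pred, labels=[HIGH], average=None))  # [1.]
# Macro f1 punishes the constant predictor (zero f1 on the other classes)
print(f1_score(y, pred, average="macro"))
```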
Your "standard" classification model doesn't take the ordinality of the classes into account. It is a bigger mistake to misclassify a High Risk patient as Low Risk than as Medium Risk, and if this happened often (it doesn't on the test set), it would be a big red flag for the model. This application might call for a custom scoring function.
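One possible shape for such a custom scorer (a sketch, not a recommendation of specific costs; the 0=low, 1=mid, 2=high encoding and the cost values are my assumptions):

```python
import numpy as np
from sklearn.metrics import make_scorer

# cost[true, pred]: larger is worse. Misclassifying high (row 2) as low
# (column 0) is made the most expensive mistake; adjacent classes cost 1.
COST = np.array([[0, 1, 2],
                 [1, 0, 1],
                 [3, 1, 0]])

def ordinal_cost(y_true, y_pred):
    """Mean misclassification cost under the COST matrix (lower is better)."""
    return COST[np.asarray(y_true), np.asarray(y_pred)].mean()

# greater_is_better=False makes scikit-learn negate the cost, so model
# selection tools like GridSearchCV still maximize the score.
ordinal_scorer = make_scorer(ordinal_cost, greater_is_better=False)
```

Passing `ordinal_scorer` as the `scoring` argument during hyperparameter tuning would steer the search away from models that make high-to-low confusions.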
Alternatively, you might even consider encoding, say, 'Low Risk' as 0, 'Med Risk' as 0.5, and 'High Risk' as 1, then training a regression model instead. Obviously, since the data isn't set up this way, tread with caution, but I like the idea of a continuous prediction scale instead of a discrete one.
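A minimal sketch of that regression idea (the label strings and the 0.25/0.75 bin cut-offs are my assumptions, not the data set's actual values):

```python
import numpy as np

# Assumed label strings; adjust to match the data set's actual values.
TO_NUM = {"low risk": 0.0, "mid risk": 0.5, "high risk": 1.0}

def to_numeric(labels):
    """Encode ordinal risk labels onto a continuous 0-1 scale."""
    return np.array([TO_NUM[label] for label in labels])

def to_class(predictions):
    """Bin continuous predictions back to labels (0.25/0.75 are arbitrary cuts)."""
    return np.where(predictions < 0.25, "low risk",
                    np.where(predictions < 0.75, "mid risk", "high risk"))

# Usage sketch: fit any regressor on to_numeric(y_train), then map its
# continuous predictions back to discrete labels with to_class(...).
```

The continuous output also gives you a natural "distance from high risk" that a discrete classifier can't provide.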
Failed download_data.py script
$ python src/download_data.py --out_type='csv' --url='https://archive.ics.uci.edu/ml/machine-learning-databases/00639/Maternal%20Health%20Risk%20Data%20Set.csv' --out_file='data/raw/maternal_risk.csv'
Usage: src/down_data.py --out_type=<out_type> --url=<url> --out_file=<out_file>
Options:
--out_type=<out_type> Type of file to write locally (script supports either feather or csv)
--url=<url> URL from where to download the data (must be in standard csv format)
--out_file=<out_file> Path (including filename) of where to locally write the file
Failed eda_script.py script
$ python src/eda_script.py --data_location='data/raw/maternal_risk.csv' --output_location='src/maternal_risk_eda_figures/'
Usage: src/eda_script.py --data_location=<data_location> --output_location=<output_location>
Options:
--data_location=<data_location> Location of the data to be used for eda
output_location=<output_location> Location to output the visulisations
Failed render script
$ Rscript -e "rmarkdown::render('doc/final_report.Rmd')"
processing file: final_report.Rmd
|..... | 7%
ordinary text without R code
|......... | 13%
label: setup (with options)
List of 1
$ include: logi FALSE
|.............. | 20%
ordinary text without R code
|................... | 27%
label: unnamed-chunk-1 (with options)
List of 2
$ fig.align: chr "center"
$ fig.cap : chr "Figure 1. Counts of observation for each class in train data set"
|....................... | 33%
ordinary text without R code
|............................ | 40%
label: unnamed-chunk-2 (with options)
List of 2
$ fig.align: chr "center"
$ fig.cap : chr "Figure 2. Distribution of training set predictors for high risk, mid risk and low risk"
|................................. | 47%
ordinary text without R code
|..................................... | 53%
label: unnamed-chunk-3 (with options)
List of 2
$ fig.align: chr "center"
$ fig.cap : chr "Figure 3. Pairwise relationship between predictors"
|.......................................... | 60%
ordinary text without R code
|............................................... | 67%
label: load data
|................................................... | 73%
ordinary text without R code
|........................................................ | 80%
label: unnamed-chunk-4 (with options)
List of 3
$ fig.align: chr "center"
$ fig.cap : chr "Figure 4. Pairwise relationship between predictors"
$ out.width: chr "50%"
|............................................................. | 87%
ordinary text without R code
|................................................................. | 93%
label: confusion_matrix
Quitting from lines 97-99 (final_report.Rmd)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
duplicate 'row.names' are not allowed
Calls: <Anonymous> ... eval_with_user_handlers -> eval -> eval -> read.csv -> read.table
Execution halted
Based on MDS data analysis checklist, which was derived from the JOSE review checklist and the ROpenSci review checklist.
Peer review feedback:
Submitting authors: @wakesyracuse7, @lennonay, @shlrley
Repository: https://github.com/UBC-MDS/maternal_health_risk_predictor
Report link: https://github.com/UBC-MDS/maternal_health_risk_predictor/blob/main/doc/final_report.md
Abstract/executive summary:
In this project, we propose a Decision Tree classification model to predict whether an individual may be at low, mid, or high maternal health risk given some information about their age and health. Our final chosen model had a max depth of 29, and performed relatively well on unseen data with 203 observations. The test score was 0.823, with 53 out of 60 high risk targets predicted correctly. However, further steps can be taken to improve the model, such as tuning other hyperparameters or grouping the target classes into 'high risk' and 'other'.
The full data set was sourced from the UCI Machine Learning Repository (Dua and Graff 2017), and can be found here. A .csv format of the data can be directly downloaded using this link. The data can be attributed to Marzia Ahmed (Daffodil International University, Dhaka, Bangladesh) and Mohammod Kashem (Dhaka University of Science and Technology, Gazipur, Bangladesh) (Ahmed and Kashem, 2020).
Editor: @flor14
Reviewers: HanChen Wang, Yukon Zhang, Daniel Cairns