Hey Group16!
Looks like you guys are pretty close to having your question and project wrapped up! I think you chose an interesting dataset and question, and I wonder whether other datasets might be available to augment your analysis. I don't have many big suggestions below, and I hope I've given you enough time to polish your work.
Documentation: All of the scripts seem properly documented and it's clear what they're doing. One nitpick: the main project README has a very long About section. It might be nice to split it into two sections, one that summarizes the project and another that goes into more detail.
Code: No major code issues, and scripts are broken into logical units. Things to note:
There appear to be two instances of train_test_split: one in a script and another in EDA.ipynb. I'm not sure it's needed in the EDA notebook.
The figures seem to be written to src/breast_cancer_eda_figs. Shouldn't these go under results instead?
The scripts could be broken down into modular functions a bit more. For example, fit_cancer_prediction.py is one long script.
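To illustrate the last point, here's a rough sketch of how a long script like fit_cancer_prediction.py could be split into small functions. I haven't assumed anything about your actual code: the function names, the CSV layout, and the majority-class "model" are all placeholders I made up, standing in for your real loading and fitting steps.

```python
import csv
import pickle
from collections import Counter


def load_rows(path):
    """Read a CSV file into a list of dicts, one per row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def split_features_target(rows, target):
    """Separate the target column from the feature columns."""
    X = [{k: v for k, v in row.items() if k != target} for row in rows]
    y = [row[target] for row in rows]
    return X, y


def fit_model(X, y):
    """Placeholder for the real estimator: just remembers the most common class."""
    return {"majority_class": Counter(y).most_common(1)[0][0]}


def save_model(model, path):
    """Pickle the fitted model to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def main(in_file, target, out_model):
    """Run the whole pipeline; each step above can be tested on its own."""
    rows = load_rows(in_file)
    X, y = split_features_target(rows, target)
    model = fit_model(X, y)
    save_model(model, out_model)
    return model
```

With this layout, main() could be called from the command-line entry point with the parsed arguments, and each helper can be checked separately.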
Reproducibility: I had an issue running make all in the conda environment. I don't want to make too big a deal of it, as it might have just been something I did wrong on my end, and we're moving to Docker this Milestone regardless.
Here's my output in case it helps:
python src/test_cancer_prediction.py --in_test_file='data/raw/test.csv' --model='results/trained_model.sav' --out_file='results/prediction_table.csv'
Usage: test_cancer_prediction.py --in_test_file=<in_file> --in_model=<in_model> --out_matrix=<out_matrix> --out_table=<out_table>
make: *** [results/prediction_table.csv] Error 1
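Comparing the flags in that command with the ones in the usage message suggests where the error might come from. Here's a small sketch in plain Python that does the comparison; the flags helper is something I made up for illustration and is not part of your project. I'm only guessing that the Makefile rule and the script's usage string have drifted apart.

```python
import re

# Matches long options such as --in_model (stops at '=' or whitespace).
FLAG = re.compile(r"--[A-Za-z_][\w-]*")


def flags(text):
    """Return the set of long option names found in a string."""
    return set(FLAG.findall(text))


# Both strings are copied from the output above.
command = (
    "python src/test_cancer_prediction.py --in_test_file='data/raw/test.csv' "
    "--model='results/trained_model.sav' --out_file='results/prediction_table.csv'"
)
usage = (
    "Usage: test_cancer_prediction.py --in_test_file=<in_file> --in_model=<in_model> "
    "--out_matrix=<out_matrix> --out_table=<out_table>"
)

unexpected = sorted(flags(command) - flags(usage))  # passed but not in the usage
missing = sorted(flags(usage) - flags(command))     # in the usage but not passed

print("not in usage:", unexpected)
print("not passed:", missing)
```

If that reading is right, the command passes --model and --out_file while the script expects --in_model, --out_matrix, and --out_table, so updating whichever side is out of date might be enough to fix make all.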
Analysis and Reasoning: The analysis makes good use of the methodologies we've been taught. It might be nice to have a little more discussion of some of the figures produced and the thought process behind them. There is a duplicate EDA0.ipynb file; is this something to be cleaned up?
Communication: Just reiterating the last comment above that it might be nice to have more discussion on all your work!
Suggestions: See the above for suggestions!
It's the final week of fall classes, so good luck wrapping up your project! I hope my comments are useful!