Project Review - Githubissues

albemlee commented 5 years ago

Rubric Score

Criteria 1: Valid Python Code

Score Level: 4/4
Comment(s): Code included runs without any errors.

Note: Try calling .copy() on a filtered DataFrame before modifying it to avoid the SettingWithCopyWarning.
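A minimal sketch of what that might look like, assuming the warning came from assigning a new column to a filtered subset. The DataFrame, column names, and values here are made up for illustration and are not taken from the submitted notebook:

```python
import pandas as pd

# Hypothetical data; the real project uses its own dataset.
df = pd.DataFrame({
    "age": [22, 35, 41, 29],
    "income_thousands": [30, 80, 60, 45],
})

# Filtering gives a subset that pandas may still associate with df, so
# assigning a column to it can trigger SettingWithCopyWarning in pandas
# versions that emit it. An explicit .copy() makes the subset independent.
over_25 = df[df["age"] > 25].copy()
over_25["income_usd"] = over_25["income_thousands"] * 1000

print(over_25)
```

The original df is left unchanged, which also makes it clear which object is meant to be modified.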

Criteria 2: Exploration of Data

Score Level: 4/4
Comment(s): Data is explored significantly, and the experimental question(s) chosen are logical and based on the data exploration. Features chosen to answer the question make sense.

Criteria 3: Machine Learning Techniques used correctly

Score Level: 4/4
Comment(s): Algorithms are used correctly and the correct conclusions are drawn from the results.

Criteria 4: Report: Are conclusions clear and supported by data?

Score Level: 4/4
Comment(s): Question(s) are stated clearly. The results of 2 regression algorithms and 2 classification algorithms are shown. Conclusions are clearly stated and based on evidence.

Criteria 5: Code formatting

Score Level: 4/4
Comment(s): Code is clearly formatted and readable.

Overall Score: 20/20

Amazing work! I loved the report (out of curiosity: where did you get those illustrations?), and you included some very interesting visualizations (first time I've seen the Venn diagram used in this way).

I also appreciate that you mentioned that neither of your regression models performed very well. Very often, researchers don't publish "bad" results, and other people repeat a "bad" experiment because they don't know that it has already been tried.

You mentioned some great next steps. I would add in model interpretation. Here is a great read on interpretable ML, and here are some tutorials. I'm curious to see what you can do with interpretable ML on the classification models you have already built.

Would it be okay to share this repository with future Codecademy MLF cohorts? I think this is an excellent example of a great capstone project.

driskerr commented 5 years ago

Hi albemlee, thanks for the nice review! Feel free to share with other MLF users.

Also, you can see the code for all the figures I created (and then some) in the Jupyter notebook I submitted. The Venn diagrams came from a library called matplotlib_venn that I found by googling around.

Thanks for the tips on explaining/interpreting ML models - I am definitely bookmarking these recommendations.

albemlee commented 5 years ago

Another thing worth mentioning...

When evaluating the runtime for models, it's worthwhile to look at the training time and prediction time separately. Think about a model that takes a long time to train but generates predictions really quickly versus a model that trains quickly but takes longer to generate predictions. Under what circumstances would one be preferred over the other (e.g. live application, research project, etc.)?
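Here is a rough sketch of one way to time the two phases separately. It assumes the model exposes fit and predict methods, as scikit-learn estimators do. The helper name time_fit_and_predict and the toy NearestNeighbor1D class are made up for illustration and are not from the project:

```python
import time


class NearestNeighbor1D:
    """Toy 1-D nearest-neighbour model: fit only stores data, predict scans it."""

    def fit(self, xs, ys):
        # "Training" is just memorising the data, so it is very cheap.
        self.xs = list(xs)
        self.ys = list(ys)
        return self

    def predict(self, queries):
        # Each prediction scans every stored point, so this is the slow phase.
        preds = []
        for q in queries:
            idx = min(range(len(self.xs)), key=lambda i: abs(self.xs[i] - q))
            preds.append(self.ys[idx])
        return preds


def time_fit_and_predict(model, x_train, y_train, x_test):
    """Return (fit_seconds, predict_seconds, predictions) for one model."""
    start = time.perf_counter()
    model.fit(x_train, y_train)
    fit_seconds = time.perf_counter() - start

    start = time.perf_counter()
    predictions = model.predict(x_test)
    predict_seconds = time.perf_counter() - start

    return fit_seconds, predict_seconds, predictions


x_train = list(range(2000))
y_train = [2 * x for x in x_train]
x_test = [0.2, 10.6, 1999.0]

fit_s, predict_s, preds = time_fit_and_predict(
    NearestNeighbor1D(), x_train, y_train, x_test
)
print(f"fit: {fit_s:.6f}s  predict: {predict_s:.6f}s  predictions: {preds}")
```

A nearest-neighbour model is the "trains quickly, predicts slowly" case; passing a different estimator to the same helper would let you compare it against one with the opposite profile.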

driskerr / MLF

Project Review #1