driskerr / MLF


Project Review #1

Open albemlee opened 5 years ago

albemlee commented 5 years ago

Rubric Score

Criteria 1: Valid Python Code

Note: Try calling .copy() when you assign a filtered DataFrame to a new variable; that avoids the SettingWithCopyWarning.
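A minimal sketch of the .copy() pattern, in case it helps. The DataFrame and column names here are made up for illustration and are not taken from the submitted notebook:

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"age": [25, 32, 47, 51], "income_k": [40, 55, 80, 62]})

# Filtering yields an object that pandas may treat as derived from df.
# An explicit copy makes the subset independent of the original.
older = df[df["age"] > 30].copy()

# Adding a column to the copy is unambiguous and leaves df untouched.
older["income_usd"] = older["income_k"] * 1000

print(older)
```

Without the .copy() call, the last assignment is the kind of line that typically triggers the warning.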

Criteria 2: Exploration of Data

Criteria 3: Machine Learning Techniques used correctly

Criteria 4: Report: Are conclusions clear and supported by data?

Criteria 5: Code formatting

Overall Score: 20/20

Amazing work! I loved the report (out of curiosity: where did you get those illustrations?), and you included some very interesting visualizations (first time I've seen the Venn diagram used in this way).

I also appreciate that you mentioned that neither of your regression models performed very well. Very often, researchers don't publish "bad" results, and other people repeat a "bad" experiment because they don't know that it has already been tried.

You mentioned some great next steps. I would add in model interpretation. Here is a great read on interpretable ML, and here are some tutorials. I'm curious to see what you can do with interpretable ML on the classification models you have already built.
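As one possible starting point (an assumption on my part about where to begin, not necessarily what the linked resources cover), permutation importance from scikit-learn measures how much a fitted model's score drops when a single feature is shuffled. The synthetic dataset and the random forest below are placeholders, not the models from the project:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Placeholder data standing in for the project's features and labels.
X, y = make_classification(n_samples=500, n_features=8, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature several times on held-out data and record
# the average drop in accuracy.
result = permutation_importance(model, X_test, y_test,
                                n_repeats=5, random_state=0)

# Print features from most to least important.
for idx in result.importances_mean.argsort()[::-1]:
    print(f"feature {idx}: {result.importances_mean[idx]:.3f} "
          f"+/- {result.importances_std[idx]:.3f}")
```

Features whose shuffling barely changes the score are ones the model is not relying on much.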

Would it be okay to share this repository with future Codecademy MLF cohorts? I think this is an excellent example of a great capstone project.

driskerr commented 5 years ago

Hi albemlee, thanks for the nice review! Feel free to share with other MLF users.

Also, you can see the code for all the figures I created (and then some) in the Jupyter notebook I submitted. The Venn diagrams came from a library called matplotlib_venn that I found by googling around.

Thanks for the tips on explaining/interpreting ML models. I am definitely bookmarking these recommendations.

albemlee commented 5 years ago

Another thing worth mentioning...

When evaluating the runtime of a model, it's worthwhile to look at the training time and the prediction time separately. Think about a model that takes a long time to train but generates predictions really quickly versus a model that trains quickly but takes longer to generate predictions. Under what circumstances would one be preferred over the other (e.g., a live application or a research project)?
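Here is a rough sketch of timing the two phases separately. The dataset is synthetic, the two classifiers are just convenient examples rather than the ones in the project, and `time_model` is a hypothetical helper name:

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for the real project data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)


def time_model(model, X_train, y_train, X_test):
    """Return (train_seconds, predict_seconds) for one model."""
    start = time.perf_counter()
    model.fit(X_train, y_train)
    train_seconds = time.perf_counter() - start

    start = time.perf_counter()
    model.predict(X_test)
    predict_seconds = time.perf_counter() - start
    return train_seconds, predict_seconds


models = {
    "k-nearest neighbors": KNeighborsClassifier(),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

timings = {}
for name, model in models.items():
    timings[name] = time_model(model, X_train, y_train, X_test)
    train_s, predict_s = timings[name]
    print(f"{name}: train {train_s:.4f}s, predict {predict_s:.4f}s")
```

Exact numbers depend on the machine and the data size, so compare the two columns on your own dataset rather than relying on general expectations.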