MLF Capstone Feedback - Githubissues

Rubric Score

Criteria 1: Valid Python Code

Score Level: 4 (Exceeds expectations)
Comment(s): Great job, your code runs without any errors.

Criteria 2: Exploration of Data

Score Level: 3 (Meets expectations)
Comment(s): Very good start exploring your data. Good job looking at the relationships between the variables you were interested in. Your pairs plot was difficult to read, so it would have been much better if you had isolated the relevant row instead of displaying all of the extra irrelevant information. Very good discussion of your data exploration findings. Ideally you would have used the data exploration stage to look for a more promising relationship between variables, but it's okay that you went forward with your research question without having any evidence that your machine learning models would be able to predict the outcomes.
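One way to isolate the relevant row of a pairs plot is sketched below. It assumes the plot was made with seaborn; the DataFrame, the column names (`age`, `income`, `essay_length`), and the `target` variable are hypothetical stand-ins, not the columns from the actual notebook.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display

import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data; every column name here is hypothetical.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=200),
    "income": rng.normal(50_000, 15_000, size=200),
    "essay_length": rng.integers(0, 5_000, size=200),
})
df["target"] = 0.5 * df["age"] + rng.normal(0, 5, size=200)

features = ["age", "income", "essay_length"]

# Passing y_vars restricts the grid to one row: the target against each
# feature, instead of the full N x N grid of every pair of columns.
grid = sns.pairplot(df, x_vars=features, y_vars=["target"], height=3)
```

In a Jupyter notebook the `matplotlib.use("Agg")` line can be dropped so the single-row figure displays inline.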

Criteria 3: Machine Learning Techniques used correctly

Score Level: 3 (Meets expectations)
Comment(s): In general, you did a very good job using machine learning techniques. Very nice job interpreting your results, and focusing on performance measures such as F1 and R^2 (most people just look at accuracy). Given you have a clear understanding of the models and how to interpret the results, it would have been MUCH better if you had chosen another research question that was well-suited for regression. As you mentioned, regression models should not be used to predict categorical outcomes. Also, good job acknowledging that your data was imbalanced and the effect that that had on your models' performance. It would be worth looking into different ways to deal with imbalanced data, like resampling (https://www.kaggle.com/rafjaa/resampling-strategies-for-imbalanced-datasets).
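As a rough illustration of the two points above (why F1 is more informative than accuracy on imbalanced data, and what resampling does), here is a minimal standard-library sketch. The 95/5 class split, the helper names, and the always-negative "model" are assumptions made up for the example, not taken from the project.

```python
import random

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes are the same size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target_size = max(len(group) for group in by_class.values())
    out_rows, out_labels = [], []
    for label, group in by_class.items():
        extra = [rng.choice(group) for _ in range(target_size - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Hypothetical imbalanced data: 95 negatives and 5 positives.
labels = [0] * 95 + [1] * 5
rows = [[i] for i in range(100)]

# A "model" that always predicts the majority class looks good on accuracy
# but never finds a single positive, which F1 exposes.
always_negative = [0] * 100
acc = accuracy(labels, always_negative)
f1 = f1_score(labels, always_negative)

# Random oversampling brings both classes to 95 examples each.
balanced_rows, balanced_labels = oversample_minority(rows, labels)
```

If resampling is used, it should be applied to the training split only, so that duplicated rows do not leak into the test set.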

Criteria 4: Report: Are conclusions clear and supported by data?

Score Level: 4 (Exceeds expectations)
Comment(s): You did an excellent job presenting your research question, comparing your machine learning algorithms, and citing your results as evidence for your conclusions.

Criteria 5: Code formatting

Score Level: 4 (Exceeds expectations)
Comment(s): Great job formatting your code in Jupyter Notebooks.

Overall Score: 18/20

Thanks for taking the time to do a very thorough review!! I enjoyed the course and learned a lot from it.

I do have one suggestion. It would be very helpful to have a text that covers practical applications and methods in machine learning, with chapters that map onto the course. One that I recently bought and used is "Machine Learning with Python Cookbook" by Chris Albon. It's inexpensive, very easy to read and follow, and very practical. Many of its chapters flesh out in more detail the algorithms covered in this course.

Best,

Al Deckel

On Fri, Jan 18, 2019, 11:21 AM Mackenzie Young <notifications@github.com> wrote:

> [rubric feedback quoted in full; identical to the comment above]


Al1952 / date-a-scientist

MLF Capstone Feedback #1