coderschoolreview opened 6 years ago
Assignment 2
The goal of this assignment was to introduce you to three new classification techniques and to understand how to select the best parameters and features for them. You learned how to use python built-in functions (GridSearchCV, SelectKBest, RFE, SelectFromModel) to try out new models (Support Vector Machines, Random Forests, and Logistic Regression) and test different permutations of parameter values and features, and analyze your results to help build better machine learning models.
Great job! Given that you don't have a programming background, your progress and work is really quite exceptional.
Here's what you did really well:
Some suggestions:
`ConvergenceWarning: The max_iter was reached which means the coef_ did not converge`

You can try fixing this by increasing the `max_iter` parameter of the underlying estimator (the default for `LogisticRegression` is 100); you can pass it through your `GridSearchCV` param grid.

Overall, amazing work. Keep it up!
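A minimal sketch of that fix (synthetic data; the grid values are just examples, not the assignment's):

```python
# Raising max_iter on the estimator (not on GridSearchCV itself) to avoid
# the ConvergenceWarning. Dataset and parameter values are invented.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# max_iter belongs to LogisticRegression; GridSearchCV just passes it through.
grid = GridSearchCV(
    LogisticRegression(),
    param_grid={"C": [0.1, 1, 10], "max_iter": [1000]},  # default is 100
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```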
Assignment 3
The goal of this assignment was to introduce you to three new Natural Language Processing techniques, and to understand how to perform some basic sentiment analysis on song lyrics using these methods. You learned how to clean and prepare textual information for NLP, and then apply the following approaches: Bag Of Words, TF-IDF, and Doc2Vec. You used your prior knowledge of Python estimators, feature selection, and parameter optimization techniques to produce feature vectors from these NLP methods to make predictions on the moods of songs using their lyrics.
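For reference, the TF-IDF step could be sketched like this (the toy "lyrics" are invented for illustration):

```python
# Turning lyric-like text into TF-IDF feature vectors that a classifier
# can consume. The three toy songs below are made up.
from sklearn.feature_extraction.text import TfidfVectorizer

lyrics = [
    "sunshine and happy days",
    "tears fall in the rain",
    "dancing all night long",
]
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(lyrics)  # sparse matrix: one row per song
print(X.shape)
```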
As usual -- great job with the assignment! You always present your work in a clear, structured way showing your thought process and exploring lots of different options and combinations of classifiers, parameters, and feature optimization techniques.
Here are a couple of notes:
Sometimes you notice that your `GridSearchCV` best estimator returns a score lower than the one you got before running `GridSearchCV`. This is likely because your initial score came from a single `train_test_split`, whereas `GridSearchCV` uses cross-validation internally, so the two scores can't really be compared. Instead, try finding the mean cross-validated score before `GridSearchCV`, then get the best estimator and find its mean cross-validated score. This gives you a fairer, like-for-like comparison.
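A sketch of that fairer comparison on synthetic data (the classifier and grid are just examples):

```python
# Compare mean CV score of a baseline model against mean CV score of
# GridSearchCV's best estimator, rather than a single train/test score.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# Mean cross-validated score before tuning (not a single split).
baseline = cross_val_score(SVC(), X, y, cv=5).mean()

grid = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
grid.fit(X, y)

# Mean cross-validated score of the best estimator: now comparable.
tuned = cross_val_score(grid.best_estimator_, X, y, cv=5).mean()
print(f"baseline CV: {baseline:.3f}, tuned CV: {tuned:.3f}")
```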
`RFE` works via recursion (you can learn more about it here), and it can generally take a long time; running `RFE` over 5,737 features is an exorbitant amount! This is because `RFE` re-runs the `fit` and `predict` steps on every cycle, re-evaluating feature importances, trimming the feature set down again, and repeating the whole process. So be careful. In such cases it is a better idea to use `SelectKBest` or `SelectFromModel`, as these are simple single-pass methods and take much less time (as you found out!).
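For illustration, a small sketch contrasting the two approaches (feature counts are invented, far smaller than the assignment's 5,737):

```python
# SelectKBest scores every feature once and keeps the top k; RFE refits
# the model each round, dropping `step` features per cycle, so it costs
# many full fits. Data here is synthetic.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=50, random_state=0)

# One scoring pass over all features.
X_kbest = SelectKBest(f_classif, k=10).fit_transform(X, y)

# Repeated refits: (50 - 10) / 5 = 8 model fits before settling on 10.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10, step=5)
X_rfe = rfe.fit_transform(X, y)

print(X_kbest.shape, X_rfe.shape)
```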
Overall, great work!
Assignment 1

The goal of this assignment was to introduce you to two main concepts in Machine Learning: Data Pre-processing and Classification. You learned how to query and clean data using the pandas library in Python, and built a simple Machine Learning classifier based on the K Nearest Neighbors algorithm.
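A hedged sketch of that workflow (column names and data are invented, not the actual assignment dataset):

```python
# Clean a small DataFrame with pandas, then fit a K Nearest Neighbors
# classifier. All columns and values below are made up for illustration.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

df = pd.DataFrame({
    "tempo": [120, 95, None, 140, 88, 132],
    "energy": [0.8, 0.4, 0.6, 0.9, 0.3, 0.7],
    "genre": ["pop", "blues", "blues", "pop", "blues", "pop"],
})
df = df.dropna()  # drop rows with missing values

X = df[["tempo", "energy"]]
y = df["genre"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=0
)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))
```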
Things you did well:
`unique`, `sort`, `sample`, `value_counts`, `dropna`. Great job!

One minor tip: you don't necessarily need to iterate through an array to print it, i.e. `for i in view_genres: print(i)` can be substituted with just `view_genres` or `view_genres.tolist()`, etc.

Overall, excellent work! You are demonstrating that you understand the material and doing a great job of applying it. Keep it up!