alexyarosh commented 4 years ago

Hi @LisaStuart5678 !

Thank you for really exceptional work!

I know it might seem like giving too much detail, but this will be invaluable for students who work through the notebook on their own, or who refer back to it later when using their new skills in their own projects!

We have some general requests before the session:

As you prepare for the session...

As you prepare the training and think about what you are going to say, make sure to cover most (if not all) of the things that we promised the learners. In our case, these are:
The Data Scientist mindset and keys to success in transitioning from baseline models to stacking models.
How to select a baseline Machine Learning algorithm
Discuss alternative stacking methods
Create simple, two-layer regressor and classifier stacked models
How to tune hyperparameters using K-fold cross-validation
and

you'll learn how to create a layer of baseline models, and using packages designed for model stacking, 
another layer to produce a final model with much better-than-baseline performance

I noticed that the models in the notebook don't give better performance, and I realize that it might be too late to change that. Using a different example (or another example in addition to the existing one) would be ideal, but if finding a dataset/model that gives better performance is not possible, I suggest at least going into more detail about possible model improvements. The "Final observation" section is a good start, but I think the learners might find it a bit unsatisfactory. Of course, using

We suggest that our instructors have a "practice run" for the live training on their own: set aside about 3 hours of time, and just "talk" through the notebook as if this was a real life session. Make sure to allot about 10 mins for each Q&A. The ideal length of the session is 2.5-3 hours.

General comments on the notebook

I fixed a few formatting issues and typos, and saved the notebook as Applied_Machine_Learning_Ensemble_modelling-solution.ipynb. Please use that file from now on to make changes.

[ ] We usually suggest including text cells that just say "Q&A" in advance.

This will help you plan the session better and avoid going too long without Q&A. This also helps students because they will see that they will have their questions answered soon. Please feel free to look at examples in notebooks for our past trainings!

[ ] Include a brief section with some exposition/reminder about the idea behind stacking models in general.

Just a couple of sentences would be enough. A good place to do this is either at the very beginning of the session, or before "Getting started with Stacking Classifier". For example, the first video in Ch4 of Ensemble Models in Python provides the "intuition" behind stacking models. I think students would really love to hear your take on this! In addition to a brief explanation, this would be a good place to include a visualization. You have a great visualization in the "Double stacking" section. I'd love to see something similar for a one-layer model! E.g. here's a picture from the video I referenced:

[ ] This one is up to you, but: most of our courses don't import functions like mean and std separately from np. I recommend just using np.mean, np.std because students are much more used to this.
[ ] Make sure all the inline code is formatted as code using backticks.

This includes parameter names, possible parameter values, etc. I think I got most of these when I was going through the notebook, but double-check!
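To illustrate the `np.mean` / `np.std` suggestion above, here is a minimal sketch. The `scores` array is made up for illustration and is not taken from the notebook.

```python
import numpy as np

# Hypothetical cross-validation scores, used only for illustration.
scores = np.array([0.75, 0.78, 0.80, 0.77, 0.76])

# Namespaced calls make it obvious that these are NumPy functions,
# which is the style students have seen in most courses.
mean_score = np.mean(scores)
std_score = np.std(scores)  # population standard deviation (ddof=0) by default

print(f"accuracy: {mean_score:.3f} ({std_score:.3f})")
```

The behavior is the same as with `from numpy import mean, std`; only the call style changes.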

Specific comments

Stacking classifier

When creating X and y...

[ ] Specify somewhere what the response variable and the explanatory variables are -- either in the dataset description or right before splitting the dataset, or both.

It's never mentioned explicitly which variable of the dataset we're going to predict

In "Creating a Naive Classifier"...

[ ] After the first sentence, I would add something like "We'll use the most frequent class as the prediction"
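
As a rough sketch of what "most frequent class as the prediction" means, assuming the naive classifier is scikit-learn's `DummyClassifier` (the notebook may build it differently) and using toy data that is not from the notebook:

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Toy data (hypothetical): the dummy model ignores the features entirely.
X = np.zeros((6, 2))
y = np.array([0, 0, 0, 0, 1, 1])

# strategy="most_frequent" always predicts the majority class seen in fit().
naive = DummyClassifier(strategy="most_frequent")
naive.fit(X, y)

preds = naive.predict(X)      # every prediction is the majority class, 0
accuracy = naive.score(X, y)  # equals the share of the majority class, 4/6
print(preds, round(accuracy, 3))
```

Any real model should beat this accuracy to be considered useful, which is why it works well as a baseline.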

In "Custom function # 1: get_stacking()":...

[ ] Explain what meta classifier/layer is

I think this could really be subsumed by the general comment about stacking above: if you include a brief overview of the principles of stacking, this will be clear by the time we get to writing this custom function!
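
For reference, a minimal sketch of the base layer / meta classifier split using scikit-learn's `StackingClassifier`. The choice of base models and the synthetic dataset are assumptions for illustration, not the ones used in `get_stacking()`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only so the example is self-contained.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Layer 1: base models, each trained on the original features.
base_models = [
    ("knn", KNeighborsClassifier()),
    ("tree", DecisionTreeClassifier(random_state=0)),
]

# Layer 2: the meta classifier (final_estimator) is trained on the
# out-of-fold predictions of the base models, not on the raw features.
stack = StackingClassifier(
    estimators=base_models,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X, y)
print(stack.predict(X[:5]))
```

In other words, the meta classifier learns how much to trust each base model's prediction.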

In "Custom function # 3: evaluate_model(model):"...

[ ] RepeatedKFold is not taught in any of our courses, so most people will be unfamiliar with it. Could you provide a brief explanation/exposition, like you did with other functions used in the training?
[ ] _Same note about using 'neg_mean_absolute_error':_ we don't teach it anywhere, so students will be unfamiliar with this error type. I suggest including a brief explanation of why we're using this particular error type and what it represents.
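
Something along these lines could accompany the explanation. It is only a sketch: the model, the synthetic dataset, and the `n_splits` / `n_repeats` values are my assumptions, not what `evaluate_model` actually uses.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Synthetic data, used only so the example is self-contained.
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

# RepeatedKFold runs K-fold cross-validation several times with a different
# random split each time: 5 folds x 3 repeats = 15 scores.
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=1)

# scikit-learn scorers follow "higher is better", so mean absolute error is
# reported negated; flip the sign to read it as a plain error.
scores = cross_val_score(
    LinearRegression(), X, y, scoring="neg_mean_absolute_error", cv=cv
)
mae = -scores
print(f"MAE: {mae.mean():.2f} ({mae.std():.2f})")
```

Repeating the folds gives a less noisy estimate of the score than a single K-fold run, at the cost of more fitting time.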

In "Evaluate the models and store results"

[ ] Add a title to the plot

Stacking regressor

The same comments as above
- [ ] Explicitly state which variable we're predicting
- [ ] Explicitly state what exact value (the median) is used for the naive regressor (this can be done verbally)
- [ ] Add a title to the plot
[ ] Students are likely unfamiliar with SVR: it isn't taught in e.g. our scikit-learn course, and it's only briefly mentioned in your course. I suggest you include a few words about what SVR is, how it works, etc. It doesn't have to be detailed -- 1-3 sentences and maybe a link to more information.
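
A small sketch that could support both the median and the SVR points, assuming the naive regressor is scikit-learn's `DummyRegressor`; the toy data and the SVR settings are made up for illustration.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Toy data (hypothetical), with one outlier in the target.
X = np.arange(5, dtype=float).reshape(-1, 1)
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Naive regressor: always predicts the median of the training target (3.0 here),
# which is not pulled toward the outlier the way the mean would be.
naive = DummyRegressor(strategy="median").fit(X, y)
naive_preds = naive.predict(X)

# SVR (Support Vector Regression) fits a function that tolerates errors smaller
# than epsilon and penalizes larger ones; the kernel lets it model non-linear
# relationships. It is sensitive to feature scale, hence the StandardScaler.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
svr.fit(X, y)
svr_preds = svr.predict(X)

print(naive_preds, svr_preds)
```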

Questions as a learner

Finally, below are some questions that I personally as a student in this training would have. Whether you'd like to address them in the notebook, or verbally when conducting the training, or at all, is totally up to you!

Why are we using float32 and not, say, float64 (which is the type of two existing columns after reading in the dataset)?
After training, say, a baseline classifier, we get accuracy 77%. Of course we want to do better (that's why we're in this training), but in general, is this considered good or bad accuracy?
In "Custom function # 1: get_stacking()", how did we pick the exact classifiers to include? Why these and not others?
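
On the float32 question, one possible talking point (my guess, not necessarily the reason for the choice in the notebook) is the memory/precision trade-off, which can be shown in a few lines:

```python
import numpy as np

# One million values stored in each dtype (illustrative data only).
values64 = np.arange(1_000_000, dtype=np.float64)
values32 = values64.astype(np.float32)

# float32 uses 4 bytes per value, float64 uses 8.
print(values32.nbytes, values64.nbytes)

# The price is precision: roughly 6 significant decimal digits
# for float32 versus roughly 15 for float64.
print(np.finfo(np.float32).precision, np.finfo(np.float64).precision)
```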

Thank you again for excellent work! Please let me know if you have any questions!

LisaStuart5678 commented 4 years ago

Hi Alex,

I've been working on the components for this Live Training as much as time has allowed and want it to be really high quality. Unfortunately, I've not been feeling well the last couple of days and so haven't been able to apply your feedback to the degree that I feel it needs. I'm so sorry, but I really feel like it's best to postpone this training in order to give the student learners the best experience possible.

Also, the links in the slide deck template are super helpful as a guide, but I cannot seem to find the Live Trainings that go along with them, so I can't get a better idea of how other instructors handle going back and forth between the slides and the notebook during the session. I'm sure Adel's Cleaning Data in Python is fantastic, but no matter how I search for it, I cannot locate the Live Training that he did. I only see the actual course. I thought that Kelsey stated that "All live courses can be found here: https://datacamp.com/courses"? Can you please point me in the right direction?

Thank you so much and I sincerely apologize for any inconvenience. However, I really and truly want to give the preparation for this Live Training as much time as it needs and I just don't think that I can between now and this Thursday given how crappy I'm feeling at the moment.

Warmest Regards, Lisa Stuart MIT Certified Professional Data Scientist lisa5678@uw.edu 206-399-2681


alexyarosh commented 4 years ago

Hi @LisaStuart5678 , I'm very sorry to hear that you aren't feeling well! It's no inconvenience at all. I believe you just received an email from Kelsey about rescheduling the training, please let me know if you have any questions!

Here are a few links to our previous live trainings. They contain the student and solution notebooks, and the recording of the session.

https://www.datacamp.com/resources/webinars/live-training-cleaning-data-in-python
https://www.datacamp.com/resources/webinars/machine-learning-with-scikit-learn
https://www.datacamp.com/resources/webinars/brand-analysis-using-social-media-data-in-r
https://www.datacamp.com/resources/webinars/time-series-analysis-in-python
https://www.datacamp.com/resources/webinars/machine-learning-with-xgboost

Please let me know if I can help with anything, and I hope you get better soon!

datacamp / Applied-Machine-Learning-Ensemble-Modeling-live-training

Notebook review #3