Open alexyarosh opened 4 years ago
Hi Alex,
I've been working on the components for this Live Training as much as time has allowed and want it to be really high quality. Unfortunately, I've not been feeling well the last couple of days and so haven't been able to apply your feedback to the degree that I feel it needs. I'm so sorry, but I really feel like it's best to postpone this training in order to give the student learners the best experience possible.
Also, the links in the slide deck template are super helpful as a guide, but I cannot seem to find the Live Trainings that go along with them, so I can't get a better idea of how other instructors handle going back and forth between showing slides and the notebook during the session. I'm sure Adel's Cleaning Data in Python is fantastic, but no matter how I search for it, I cannot locate the Live Training that he did. I only see the actual course. I thought that Kelsey stated that, "All live courses can be found here: https://datacamp.com/courses"? Can you please point me in the right direction?
Thank you so much and I sincerely apologize for any inconvenience. However, I really and truly want to give the preparation for this Live Training as much time as it needs and I just don't think that I can between now and this Thursday given how crappy I'm feeling at the moment.
Warmest Regards, Lisa Stuart MIT Certified Professional Data Scientist lisa5678@uw.edu 206-399-2681
On Fri, Jul 10, 2020 at 3:47 PM Alex Yarosh notifications@github.com wrote:
Assigned #3 https://github.com/datacamp/Applied-Machine-Learning-Ensemble-Modeling-live-training/issues/3 to @LisaStuart5678 https://github.com/LisaStuart5678.
Hi @LisaStuart5678, I'm very sorry to hear that you aren't feeling well! It's no inconvenience at all. I believe you just received an email from Kelsey about rescheduling the training. Please let me know if you have any questions!
Here are a few links to our previous live trainings. They contain the student and solution notebooks, and the recording of the session.
https://www.datacamp.com/resources/webinars/live-training-cleaning-data-in-python
https://www.datacamp.com/resources/webinars/machine-learning-with-scikit-learn
https://www.datacamp.com/resources/webinars/brand-analysis-using-social-media-data-in-r
https://www.datacamp.com/resources/webinars/time-series-analysis-in-python
https://www.datacamp.com/resources/webinars/machine-learning-with-xgboost
Please let me know if I can help with anything, and I hope you get better soon!
Hi @LisaStuart5678 !
Thank you for really exceptional work!
I know it might seem like giving too much detail, but this will be invaluable for students who work through the notebook on their own, or who refer to it later when they are using their new skills in their own projects!
We have some general requests before the session:
As you prepare for the session...
I noticed that the models in the notebook don't give better performance, and I realize that it might be too late to change that. Using a different example (or another example, in addition to the existing one) would be ideal, but if finding a dataset/model that gives better performance is not possible, I suggest at least going into more detail about possible model improvements. The "Final observation" section is a good start, but I think the learners might find it a bit unsatisfactory. Of course, using
General comments on the notebook
I fixed a few formatting issues and typos, and saved the notebook as `Applied_Machine_Learning_Ensemble_modelling-solution.ipynb`. Please use that file from now on to make changes.
This will help you plan the session better and avoid going too long without Q&A. This also helps students because they will see that they will have their questions answered soon. Please feel free to look at examples in notebooks for our past trainings!
Just a couple of sentences would be enough. A good place to do this is either at the very beginning of the session, or before "Getting started with Stacking Classifier". For example, the first video in Ch4 of Ensemble Models in Python provides the "intuition" behind stacking models. I think students would really love to hear your take on this! In addition to a brief explanation, this would be a good place to include a visualization. You have a great visualization in the "Double stacking" section. I'd love to see something similar for a one-layer model! E.g. here's a picture from the video I referenced:
[ ] This one is up to you, but: most of our courses don't import functions like `mean` and `std` separately from `np`. I recommend just using `np.mean`, `np.std` because students are much more used to this.
[ ] Make sure all the inline code is formatted as `code` using backticks. This includes parameter names, possible parameter values, etc. I think I got most of these when I was going through the notebook, but double-check!
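To illustrate the `np.mean` / `np.std` suggestion above, here is a minimal sketch. The score values are made up for illustration and do not come from the notebook.

```python
import numpy as np

# Hypothetical cross-validation scores, invented for this example only
scores = np.array([0.75, 0.78, 0.77, 0.80, 0.75])

# Call the functions through the np namespace instead of importing
# mean and std on their own (from numpy import mean, std)
mean_score = np.mean(scores)
std_score = np.std(scores)

print(f"Accuracy: {mean_score:.3f} ({std_score:.3f})")
```

Keeping the `np.` prefix makes it obvious at a glance where each function comes from, which matches what students have seen in other courses.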
Specific comments
Stacking classifier
When creating X and y...
It's never mentioned explicitly which variable of the dataset we're going to predict.
In "Creating a Naive Classifier"...
In "Custom function # 1: get_stacking()":...
I think this could really be subsumed by the first general comment: if you include a brief overview of the principles of stacking, this will be clear by the time we get to writing this custom function!
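As a possible companion to a brief stacking overview, here is a rough sketch of a one-layer stacked classifier built with scikit-learn's `StackingClassifier`. The function name `get_stacking_sketch`, the choice of base models, and the synthetic dataset are all assumptions made for illustration; this is not the notebook's `get_stacking()`.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def get_stacking_sketch():
    """Hypothetical helper: build a one-layer stacking classifier."""
    # Level 0: base models whose predictions become features for level 1
    level0 = [
        ("knn", KNeighborsClassifier()),
        ("tree", DecisionTreeClassifier(random_state=0)),
    ]
    # Level 1: meta-model that learns how to combine the base predictions
    level1 = LogisticRegression()
    # cv=5: base predictions fed to the meta-model are made out-of-fold
    return StackingClassifier(estimators=level0, final_estimator=level1, cv=5)


# Synthetic data, used only so the sketch runs on its own
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = get_stacking_sketch()
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Stacked accuracy: {accuracy:.3f}")
```

The key idea to convey is that the meta-model is trained on the base models' out-of-fold predictions rather than on the raw features alone.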
In "Custom function # 3: evaluate_model(model):"...
[ ] `RepeatedKFold` is not taught in any of our courses, so most people will be unfamiliar with it. Could you provide a brief explanation/exposition, like you did with other functions used in the training?
[ ] _Same note about using `'neg_mean_absolute_error'`:_ we don't teach it anywhere, so students will be unfamiliar with this error type. I suggest including a brief explanation of why we're using this particular error type and what it represents.
In "Evaluate the models and store results"
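For the `RepeatedKFold` and `'neg_mean_absolute_error'` notes above, a short example along these lines might help students. It is only a sketch on synthetic data with a plain linear model, both assumed here for illustration, and is not the notebook's `evaluate_model`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

# Synthetic regression data, used only so the sketch runs on its own
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=1)

# RepeatedKFold runs K-fold cross-validation several times, reshuffling
# the data before each repeat: 5 splits x 3 repeats = 15 scores
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=1)

# scikit-learn scorers follow a "higher is better" convention, so the
# mean absolute error is reported as a negative number
scores = cross_val_score(
    LinearRegression(), X, y, scoring="neg_mean_absolute_error", cv=cv
)

# Flip the sign to get the usual (positive) mean absolute error
mae = -scores
print(f"MAE: {np.mean(mae):.3f} ({np.std(mae):.3f})")
```

Repeating the cross-validation gives a more stable estimate than a single K-fold run, at the cost of fitting the model more times.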
Stacking regressor
The same comments as above
[ ] Students are likely unfamiliar with SVR: it isn't taught in e.g. our scikit-learn course, and it's only briefly mentioned in your course. I suggest you include a few words about what SVR is, how it works, etc. It doesn't have to be detailed -- 1-3 sentences and maybe a link to more information.
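A short illustration could accompany those sentences. Roughly, Support Vector Regression fits a function while ignoring errors smaller than a margin `epsilon` and penalizing larger ones, with `C` controlling how strongly. The sketch below uses made-up data and default-like parameter values, all assumed for illustration rather than taken from the notebook.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Made-up noisy sine data, used only so the sketch runs on its own
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# SVR is sensitive to feature scale, so scale the inputs first.
# epsilon: width of the "no penalty" tube around the fitted function
# C: how strongly errors outside the tube are penalized
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=1.0, epsilon=0.1))
model.fit(X, y)

predictions = model.predict(X[:5])
print(predictions)
```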
Questions as a learner
Finally, below are some questions that I personally as a student in this training would have. Whether you'd like to address them in the notebook, or verbally when conducting the training, or at all, is totally up to you!
Why we're using `float32` and not, say, `float64` (which is the type of two existing columns after reading in the dataset)?
After training, say, a baseline classifier, we get accuracy 77%. Of course we want to do better (that's why we're in this training), but in general, is this considered good or bad accuracy?
In "Custom function # 1: get_stacking()", how did we pick the exact classifiers to include? Why these and not others?
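On the `float32` question, I can't say why the notebook made that choice, but one common reason is memory: `float32` stores each value in 4 bytes instead of 8, at the cost of some precision. A minimal sketch, using a made-up DataFrame rather than the training dataset:

```python
import numpy as np
import pandas as pd

# Made-up DataFrame with two float64 columns (NumPy's default float type)
df = pd.DataFrame({
    "a": np.linspace(0, 1, 1000),
    "b": np.linspace(10, 20, 1000),
})

# Downcast every column to float32
df32 = df.astype("float32")

# Compare the memory used by the column data (index excluded)
bytes64 = df.memory_usage(index=False).sum()
bytes32 = df32.memory_usage(index=False).sum()

print(f"float64: {bytes64} bytes, float32: {bytes32} bytes")
```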
Thank you again for excellent work! Please let me know if you have any questions!