@kvarada @hfboyce I have a general comment here that will affect the whole course: I'm proposing we do everything in terms of scores and not introduce the notion of error (1-score) at all. I think that will be easier to understand and will make the code cleaner. What do you think? I realize it will take a bit of work for @hfboyce to redo some of module 3, and not sure about module 2. So we should probably discuss this decision on Monday.
Another general question: did we decide not to teach overfitting on the validation set in this course? I was thinking about the slides on the Golden Rule. These two things kind of go together, don't they?
Module 3 comments:
drop syntax? I think I asked about this earlier. I would expect drop(columns=['country']).
random_state, "fit our models. Validation: used to assess our model during model tuning. Test: unseen data used for a final assessment."
cross_validate: I would show us taking the .mean() of the cross_val_score output, so that they can see us getting a single score. Maybe we can also point out that it's similar to the validation score we saw earlier.
random_state: maybe we should say it's for testing purposes, else they might think there is some ML reason to do this? ;; let's name the variable cv_scores because the CV "score" would normally refer to the average of these sub-scores.
sort_values by the cv score and then take the top entry with iloc[0].
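Pulling those code suggestions together, here's a minimal sketch of the pattern I have in mind (the toy data, the 'country' column, the model, and the k values are placeholders, not the actual course notebook):

```python
# Rough sketch only; data, column name, model, and k values are placeholders.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# placeholder data standing in for the course dataset
X_arr, y_arr = make_classification(n_samples=200, n_features=4, random_state=123)
df = pd.DataFrame(X_arr, columns=['f1', 'f2', 'f3', 'f4'])
df['country'] = y_arr

# drop(columns=...) rather than the positional drop syntax
X = df.drop(columns=['country'])
y = df['country']

# random_state fixed only so the split is reproducible (testing purposes)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123
)

# name the per-fold results cv_scores; the CV "score" usually means their mean
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X_train, y_train, cv=5)
print(cv_scores)          # one sub-score per fold
print(cv_scores.mean())   # a single score, comparable to the validation score shown earlier

# after trying several hyperparameter values, sort by the cv score and take the top row
results = []
for k in [1, 5, 11]:
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k), X_train, y_train, cv=5)
    results.append({'n_neighbors': k, 'mean_cv_score': scores.mean()})
results_df = pd.DataFrame(results)
best = results_df.sort_values(by='mean_cv_score', ascending=False).iloc[0]
print(best)
```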
It seems Module 3 is tough in every course 😳. Well, we knew this would be the hardest one I think. It's the hardest to understand, the hardest to teach, and I'm pretty opinionated about it (sorry). We'll get through it 🚀
I have a general comment here that will affect the whole course: I'm proposing we do everything in terms of scores and not introduce the notion of error (1-score) at all. I think that will be easier to understand and will make the code cleaner. What do you think? I realize it will take a bit of work for @hfboyce to redo some of module 3, and not sure about module 2. So we should probably discuss this decision on Monday.
Using scores instead of error everywhere sounds good to me. I did notice the inconsistency in my notes when I was recording, but then I didn't bother to change it. I guess we could talk about scores in general and mention once that you might hear people talking about error instead of scores, and that in the context of classification, error is just 1 - accuracy score. Using scores instead of error also makes these concepts easier to understand for regression problems, since at this point they have only seen the R2 score.
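For what it's worth, a tiny sketch of that point (the dataset and model here are arbitrary placeholders): .score() already returns accuracy for classifiers (and R2 for regressors), so the error, if we ever mention it, is just 1 minus that.

```python
# Minimal sketch: work with scores everywhere; error is just 1 - score.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

score = model.score(X_test, y_test)   # accuracy for classifiers, R^2 for regressors
error = 1 - score                     # only if someone insists on talking about error
print(f"score: {score:.3f}, error: {error:.3f}")
```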
1.15: Can we show a shallower version before showing the deep tree? This is a lot to take in at once.
Like just the first split? I am confused. Do you want me to show a smaller tree, or just part of this large tree? Depth 2 on a new slide.
1.17: Again, I would start by showing some simpler boundaries. I would put the boundary and the tree side-by-side on the same slide for a very simple tree (depth 1 or 2).
So just redo this whole thing but with a tree of depth 2? But now I am confused, because this may generalize better, perhaps?
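For reference, a rough sketch of the "simple tree next to its boundary" idea (the dataset and the two plotted features are placeholders): a depth-2 tree via plot_tree on the left, and its decision boundary on a grid of the two features on the right.

```python
# Sketch: depth-2 tree and its 2D decision boundary side by side (placeholder data).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

X, y = load_iris(return_X_y=True)
X2 = X[:, :2]                                  # two features so the boundary is drawable

tree = DecisionTreeClassifier(max_depth=2).fit(X2, y)

fig, (ax_tree, ax_bound) = plt.subplots(1, 2, figsize=(12, 4))
plot_tree(tree, filled=True, ax=ax_tree)       # left: the tree itself

# right: predictions over a mesh grid of the two features
xx, yy = np.meshgrid(
    np.linspace(X2[:, 0].min() - 0.5, X2[:, 0].max() + 0.5, 200),
    np.linspace(X2[:, 1].min() - 0.5, X2[:, 1].max() + 0.5, 200),
)
Z = tree.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
ax_bound.contourf(xx, yy, Z, alpha=0.3)
ax_bound.scatter(X2[:, 0], X2[:, 1], c=y, edgecolor='k')
plt.show()
```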
5.2: Let's start by repeating the earlier diagram, and then having the next slide be this expanded diagram. This expanded diagram is a lot to take in all at once.
Which diagram? Just the train/test split one?
Do they know what a "hyperparameter" is yet? We've used the term a couple times now. I didn't look at Module 2 so I'm not sure, just checking
Yes! I have a section in module 2 about them.
10.3: formatting with $k$. I don't think they have the knowledge to answer this because it's not explained. Maybe we should add this into the slides?
I was thinking this would be said in the part where we show cross_validate, since the running time is there.
For these transcripts, are they copied from my/Varada's notes? I hope you don't spend too much time on them because I'll probably change them when recording. I thought we were going to skip them and then transcribe them after the recording, that's why I'm asking.
Mostly from Varada's notes. I do put some of my own in, just so you have an idea of where I was going with it.
For 13 in general, I wonder if we should just focus on two errors, either train/valid or train/test. I think E_best might be a bit much here. I would say we should either take it out entirely or have it later. Maybe later in the course we might have a section on practical tips, and we can move it there, basically saying you never know if you could have a better model or not.
Discuss in meeting.
18.6: we can clean up the code a lot here. I don't think this is the right time to introduce the error bars (std). I do think it'd be awesome if we introduce these, but they should be in the cross-validation section rather than this section. ;; let's also flip these to use scores instead of errors. ;; the bottom of this plot is cut off for me
Can we expand this question so that they actually produce the plot with Altair (given some starter code we provide)? And then there could be some follow-up questions if we want, including one about the test error maybe, though not required. ;; Also, the plot in 21 is kind of wonky: the cv error goes up and down and then back up. Maybe using more folds might help smooth it out?
Discuss quickly
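For 18.6 / 21, a rough sketch of what the cleaned-up version could look like once it's flipped to scores, with the fold standard deviation shown as a band and more folds to smooth things out (the dataset, model, and k range are placeholders; the Altair part would become the starter code we give them):

```python
# Sketch: mean CV score per hyperparameter value, with a +/- std band, in Altair.
import altair as alt
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_validate
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)   # placeholder dataset

rows = []
for k in range(1, 30, 2):
    cv = cross_validate(KNeighborsClassifier(n_neighbors=k), X, y, cv=10)
    rows.append({
        'n_neighbors': k,
        'mean_cv_score': cv['test_score'].mean(),
        'std_cv_score': cv['test_score'].std(),
    })
results = pd.DataFrame(rows)
results['lower'] = results['mean_cv_score'] - results['std_cv_score']
results['upper'] = results['mean_cv_score'] + results['std_cv_score']

line = alt.Chart(results).mark_line(point=True).encode(
    x='n_neighbors', y=alt.Y('mean_cv_score', scale=alt.Scale(zero=False))
)
band = alt.Chart(results).mark_area(opacity=0.3).encode(
    x='n_neighbors', y='lower', y2='upper'
)
(band + line)   # displays in a notebook; use .save('cv_scores.html') otherwise
```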
Hi @mgelbart! Ok, buckle up for another round of module feedback (also known as finding all of Hayley's grammar mistakes 🤦‍♀️, sorry sorry)!
Link here -> https://intro-machine-learning.netlify.app/en/module3
There should be 22 exercises. The most recent change was the additional Q16 (this should have an image of a decision tree in it, and the question asks if it's more likely underfitting or overfitting).
I am doing the amendments to Assignment 2 today, so I should have Assignment 3 for you in the next 2 days (possibly Friday EOD).
Hope all is going well with juggling both your courses!