Closed hfboyce closed 3 years ago
This is @kvarada's feedback, coming from the other repo found here.
Hi @hfboyce, I see that you have put a lot of work into this module!
Here are my comments for improvement:
- scikit-learn: `nn.kneighbors([[-80, 25]])` --> Nice!
- `euclidean_distances` function: what would happen if we didn't use `fill_diagonal()`?
- `Snoodle` shouldn't have the target value in it, so it should be: `[[53, 77, 43, 69, 80, 57, 379]]`
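To illustrate the `fill_diagonal()` point, here is a minimal sketch (the feature matrix is made up, not the module's data): without overwriting the zero diagonal, each example's nearest "neighbour" would be itself.

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

# toy feature matrix; rows are examples (made-up values)
X = np.array([[53, 77, 43],
              [50, 80, 40],
              [10, 20, 30]])

dists = euclidean_distances(X)   # pairwise distances; the diagonal is all zeros
np.fill_diagonal(dists, np.inf)  # without this, argmin would pick each row itself
nearest = dists.argmin(axis=1)   # index of the closest *other* example
print(nearest)                   # -> [1 0 0]
```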
- `print("The training score is %.4f" % (train_score))`
- Up to which value of `n_neighbors` is there overfitting?
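For the `n_neighbors` question, a minimal sketch of how one might probe where the overfitting stops (synthetic data, not the module's dataset): with `n_neighbors=1` the training score is perfect, and the train/validation gap shrinks as `k` grows.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# synthetic stand-in data (the module's real dataset is not shown here)
X, y = make_regression(n_samples=200, n_features=2, noise=10, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

for k in [1, 3, 5, 11, 21]:
    model = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    train_score = model.score(X_train, y_train)
    valid_score = model.score(X_valid, y_valid)
    print("k=%2d  train %.4f  valid %.4f" % (k, train_score, valid_score))
```

The `k` at which the train and validation scores converge is where the overfitting question could point students.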
- The answer feedback could read: "Incorrect. The points (2, 2), (5, 2) and (4, 3) are the closest to (0, 0), so we must take the average of all of their values." and "You got it! We must take the average of the 3 nearest examples."
If you want to use k=3, the average would be 0.333, which is not in the options.

Huh! Long module! Thanks for putting in all this work.
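As a quick sanity check on the k=3 averaging above, a sketch using the three points from the exercise; the 0/1 target values here are hypothetical, chosen only so the mean works out to 0.333 (they are not from the module).

```python
import numpy as np

query = np.array([0, 0])
# the three closest points named in the quiz feedback
neighbours = np.array([[2, 2], [5, 2], [4, 3]])
# hypothetical targets (assumption, not the module's data)
targets = np.array([0, 0, 1])

dists = np.sqrt(((neighbours - query) ** 2).sum(axis=1))
print(dists)                 # roughly [2.83, 5.39, 5.0]
prediction = targets.mean()
print(round(prediction, 3))  # 0.333
```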
Will look at the assignment either tomorrow or Thursday if that's OK.
4.8 What are `x_1`, `x_0`, `y_1`, and `y_0`? It might be helpful to define them clearly for the students.
4.8 I like that you are making this connection, but I worry that it might take a while to actually explain it, and I'm not sure it's necessary.
Should I keep in or remove?
7 Calculating Euclidean distances "by hand" --> I would probably call it "step by step" instead of "by hand", because they are actually coding it up. In general this exercise is fine, but I'm not sure we need it after the first few exercises.
Do you think it's ok if I keep it?
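For reference, a small sketch of what the "step by step" version might look like next to the scikit-learn one (the feature vectors are made up; note neither includes the target value, per the earlier comment):

```python
import numpy as np
from sklearn.metrics.pairwise import euclidean_distances

u = np.array([53, 77, 43, 69, 80, 57])  # made-up feature vectors
v = np.array([30, 60, 20, 70, 90, 50])

# step by step: subtract, square, sum, square root
by_hand = np.sqrt(np.sum((u - v) ** 2))

# the same distance via scikit-learn
by_sklearn = euclidean_distances(u.reshape(1, -1), v.reshape(1, -1))[0, 0]

print(by_hand, by_sklearn)  # the two values agree
```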
8 Do we show them anywhere how to calculate Euclidean distance using scikit-learn?
Yes! In deck 4 slide 7.
11.2 "Finding the distances to a query point takes double the time as finding the nearest neighbour." I'm not sure I understand this question.
I've changed it to: "Calculating the distances between an example and a query point takes twice as long as calculating the distances between two examples."
13.2 Minor suggestion: how about using a different colour than gray? Something that stands out?
These are images I stole from Mike, so unfortunately I don't have the source code to change them.
- Is it possible to get slightly better plots here? I guess you want to show that SVM decision boundaries are smoother, right?
I'll add this to the wish list?
- I think the bad results are due to scaling; both k-NN and SVM RBF should suffer because of that.
Should I make any changes to it?
@hfboyce
Here she is! This was a bit more time-consuming for me, but hopefully it's OK.
https://intro-machine-learning.netlify.app/en/module4
Please be as thorough as you like. Missing notes in the transcript are intentional, so we can fill them in with what you end up talking about.
I'll be working on the assignment all day tomorrow, so it should be on its way to you before Monday, which is when I'll be starting Module 5 (hopefully on track).
(Also, no need to rush; Elijah is still working on Assignment 3.)

This module should have 28 exercises.