Closed hfboyce closed 3 years ago
@kvarada I'll leave this to you, but let me know if there's anything you want me to look at.
Sounds good. I'll get to this in the afternoon.
Great job with the module @hfboyce ! I see a bit of "Hayley touch" :).
Here are my comments for improvement for module 2:

- Although `sklearn` doesn't support categorical variables in decision trees, decision trees are capable of handling them. For example, you could have a categorical variable "weather" with three possible values (e.g., warm, hot, and cold); a node would then have 3 children, and it would represent an if/elif/else statement. What do you think @mgelbart?
- `max_depth` later in the module.
- `display_tree` function to get rid of all the unnecessary stuff from graphviz trees. You can find it here.
- `DummyClassifier` instead of baseline models.
- `fit` and `predict` here. So something like: We need to `fit` our decision tree model before calling `predict`.
- `predicted`.
- `altair` plots!!
- `tree_score` or not. Also, do we need to call `predict` here if we are doing `score`?
- It could be my English problem but I find "and then predict on the target column y" a bit confusing. For me, we are predicting on `X` to get ŷ.

Great work overall!
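A quick sketch of the `DummyClassifier` baseline and the fit-before-predict ordering suggested above (the toy data here is made up for illustration):

```python
from sklearn.dummy import DummyClassifier
from sklearn.tree import DecisionTreeClassifier

# Toy data (hypothetical): 4 examples, 2 numeric features.
X = [[0, 1], [1, 1], [1, 0], [0, 0]]
y = [0, 1, 1, 1]

# Baseline: DummyClassifier predicts the most frequent class.
dummy = DummyClassifier(strategy="most_frequent")
dummy.fit(X, y)
print(dummy.score(X, y))  # baseline accuracy: 0.75

# The tree must be fit before predict or score can be called.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)
predicted = tree.predict(X)  # predict on X to get y-hat
print(tree.score(X, y))      # score calls predict internally
```

Note that `score` computes predictions itself, so a separate `predict` call is only needed when you want the predictions themselves (e.g., for plotting).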
Btw, it took me ~2 hours. Probably because it was my first time (and I am also a bit slow in general :)).
Yeah I wasn't tracking my hours back then but I think it probably took me ~2 hours as well at the beginning, but then I got faster.
I suggest changing both questions. Although sklearn doesn't support categorical variables in decision trees, decision trees are capable of handling them. For example, you could have a categorical variable "weather" with three possible values (e.g., warm, hot, and cold); a node would then have 3 children, and it would represent an if/elif/else statement. What do you think @mgelbart?
My take is that you'd always have 2 children but you'd have split rules like "weather == warm?" rather than having to one-hot encode weather, which is a limitation of scikit-learn. I guess there's this new CatBoost which handles categorical variables directly with some sort of gradient boosted trees. So personally I'm OK with the second question here. For the first question, it's a bit unclear to me whether children means direct children or all descendants. I agree with @kvarada on removing the first one and I'm fine either keeping or deleting the second one.
Oh I see what you mean. I was thinking about pictures such as this one, which are not uncommon as representations of decision trees.
Yeah makes sense. I guess under the hood they would probably be two separate splits, one for == Sunny and then, on the False branch, another for == Overcast. But I agree it's getting into technicalities which are besides the point of what we're teaching here.
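To make the "two separate splits under the hood" point concrete, here's a sketch (the data and column names are made up) using one-hot encoding plus scikit-learn's `export_text`; each categorical "branch" becomes a binary split on an indicator column:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical data: an "outlook" feature with three categories.
df = pd.DataFrame({
    "outlook":  ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Overcast"],
    "activity": ["tennis", "tennis", "swim", "read", "read", "swim"],
})

# One-hot encode, since scikit-learn trees need numeric features.
X = pd.get_dummies(df[["outlook"]])
y = df["activity"]

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Each category check is a binary split on an indicator column,
# e.g. "outlook_Sunny <= 0.50" with True/False children.
print(export_text(tree, feature_names=list(X.columns)))
```

With three categories separating three labels, the printed tree has two successive binary splits (three leaves) rather than one three-way split, matching the description above.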
So take both out, or just the first one? I can replace them with something else.
I think the second one is keep-able if we remove the 2nd option (if/elif/else), to reduce confusion.
@hfboyce I used Module 2 exercises in my class activities today. Seems like the students enjoyed them :tada:. A couple of things came up:
@kvarada
Many of them were not able to run the code exercises; they were getting a BinderHub error.
This may have something to do with their browser, their network, or where they are located geographically. Do you know what browsers they were using or where they are based?
The autograding for 7.3 is probably wrong?
Yes, Elijah has yet to make the tests for Module 2, so this autograding is 100% wrong.
In 11 we are showing them two trees: one with class in each node and other without class. It was confusing for the students.
I've updated this! Thank you!
Please see the deployed link here. @kvarada it's important that we are both seeing the same version, as sometimes your browser will not update and it can be very frustrating. Make sure to clear the cookies for this site from now on. This module should have 21 "exercises" (this is what we call each of these numbered sections).
The latest change was in Exercise 21: the learning outcomes should NOT include the bullet "Split a dataset into train and test sets using the train_test_split() function."
Please feel free to point out everything you can so we can improve this and produce the best product. Thanks :)