bait509-ubc / BAIT509

BAIT509 - Business Applications of Machine Learning
https://bait509-ubc.github.io/BAIT509/

Assignment 2 is now open #15

Closed vincenzocoia closed 6 years ago

vincenzocoia commented 6 years ago

...and can be found here.

vincenzocoia commented 6 years ago

I received the following question via email:

I was working on Assignment 2, Question 7 from the ISLR book. Should I use the complete dataset for the MSE calculation using random forest (similar to what you taught in class), or should I partition the data into training and test sets and then use random forest for prediction?

You can obtain test error however you'd like, but I recommend using the out-of-bag predictions. They tend to do well at estimating generalization error, and they're also easier to compute, since no separate train/test split is needed.
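As a rough illustration of the out-of-bag approach, here is a minimal sketch. It assumes the data set is `Boston` from the `MASS` package with `medv` as the response (the thread does not name the data set), and the `mtry` and `ntree` values are arbitrary choices for illustration.

```r
library(MASS)          # assumed source of the Boston data
library(randomForest)

set.seed(1)

# Fit on the whole data set; no train/test split is needed for OOB error
rf <- randomForest(medv ~ ., data = Boston, mtry = 6, ntree = 500)

# For a regression forest, rf$predicted holds the out-of-bag predictions
oob_mse <- mean((Boston$medv - rf$predicted)^2)

# rf$mse[k] is the OOB MSE using the first k trees, so the last
# entry should agree closely with the value computed by hand
final_mse <- rf$mse[rf$ntree]
print(c(by_hand = oob_mse, from_fit = final_mse))
```

The `mse` component is also what the printed model reports as "Mean of squared residuals", so it may not be necessary to compute the MSE by hand.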

riowxm commented 6 years ago

Hi Vincenzo, can you clarify what you mean by "create a plot displaying the test error resulting from random forest on the dataset for ranges of mtry and ntrees"? Should we plot the MSE as mtry changes and the MSE as ntrees changes separately, or do you want us to plot the simultaneous changes of mtry and ntrees? Thank you!

menggeyu commented 6 years ago

Hi, Vincenzo,

For Question 7 in ISLR book:

  1. When I fit the model using the randomForest function in R, do I need to separate the data into training and test sets and specify 'subset = training set' in the function, or can I just use the whole data set to get OOB predictions?

  2. If I fit the model using randomForest, I can get the MSE value from the model summary. Does this value represent the OOB MSE, or do I need to calculate the MSE myself?

Thanks! Megan Yu

satsuma24 commented 6 years ago

Hello Vincenzo,

For question 1.2, are we required to plot the decision tree in R with a made-up dataset, or can we simply give the structure without using R?

Best, Hao

kumar-srikanth commented 6 years ago

@riowxm I think the plot would be ntree vs. MSE, with a separate line for each mtry value. So if you are using four different values of mtry, your plot will have four lines, each showing how the error changes with ntree.
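One way such a plot could be produced is sketched below. This is only an illustration: it assumes the `Boston` data from `MASS` with `medv` as the response, uses OOB MSE as the error measure, and the `mtry_values` vector is a hypothetical choice.

```r
library(MASS)          # assumed source of the Boston data
library(randomForest)

set.seed(1)
mtry_values <- c(2, 4, 6, 13)   # hypothetical choices; Boston has 13 predictors
ntree_max <- 500

# One column per mtry value; row k is the OOB MSE using the first k trees
oob <- sapply(mtry_values, function(m) {
  randomForest(medv ~ ., data = Boston, mtry = m, ntree = ntree_max)$mse
})

# One line per mtry value, with ntree on the horizontal axis
matplot(seq_len(ntree_max), oob, type = "l", lty = 1,
        col = seq_along(mtry_values),
        xlab = "ntree", ylab = "OOB MSE")
legend("topright", legend = paste("mtry =", mtry_values),
       col = seq_along(mtry_values), lty = 1)
```

Fitting a single forest per mtry value and reading the error after each tree from `mse` avoids refitting a forest for every ntree value.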

kumar-srikanth commented 6 years ago

@satsuma24 I asked Vincenzo about this the other day. He said that we can explain how the decision tree splits on the X1 and X2 values in the 2D image to produce the different regions, without drawing the tree. But doing both (explaining and drawing) also works.

fish117 commented 6 years ago

Hello Professor, I found some different methods for fitting regression trees. However, they produce different results. Can we use different packages to do the assignment, or are we only allowed to use the package that was taught in class? https://www.r-bloggers.com/a-brief-tour-of-the-trees-and-forests/ Best regards, Iris Xu

vincenzocoia commented 6 years ago

@menggeyu I believe my second comment in this thread addresses your Q1? As for Q2, I don't know off the top of my head -- is it in the documentation? Should be there somewhere.

@satsuma24 No data set is required to form a decision tree -- the boundaries you see in the diagram are sufficient. As for how to draw it, it's up to you -- feel free to draw it on paper and submit a picture of it, or use software to draw it.

vincenzocoia commented 6 years ago

@fish117 Nice find! You can use whatever software you'd like to implement regression trees. They will probably be parameterized differently, but that's OK.

vincenzocoia commented 6 years ago

Solutions to Assignment 2 are now available on Connect.