Closed by vincenzocoia 6 years ago
I received the following question via email:
I was working on Assignment 2, Question 7 from the ISLR book. Should I compute the MSE for the random forest using the complete dataset (similar to what you taught in class), or should I partition the data into training and test sets and then use the random forest for prediction on the test set?
You can obtain the test error however you'd like, but I recommend using the out-of-bag predictions. They tend to do well at estimating generalization error, and they are easier to compute because no separate test set is needed.
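As a rough sketch of the out-of-bag approach, assuming the `randomForest` package and the `Boston` data from `MASS` (the dataset and the `mtry`/`ntree` values are assumptions for illustration, not something specified above): calling `predict()` on a fitted forest without `newdata` returns the out-of-bag predictions, so the whole dataset can be used for fitting.

```r
library(randomForest)
library(MASS)  # provides the Boston data frame (assumed dataset)

set.seed(1)

# Fit on the full dataset; mtry and ntree are illustrative choices
fit <- randomForest(medv ~ ., data = Boston, mtry = 6, ntree = 500)

# With no newdata argument, predict() returns out-of-bag predictions:
# each observation is predicted only by trees that did not see it
oob_pred <- predict(fit)

# Out-of-bag estimate of the test MSE
oob_mse <- mean((Boston$medv - oob_pred)^2)
oob_mse
```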
Hi Vincenzo, can you clarify what you mean by "create a plot displaying the test error resulting from random forest on the dataset for ranges of mtry and ntrees"? Should we plot the MSE as mtry changes and the MSE as ntrees changes separately, or do you want a single plot showing mtry and ntrees changing together? Thank you!
Hi, Vincenzo,
For Question 7 in ISLR book:
When I fit the model using the randomForest function in R, do I need to split the data into training and test sets and specify 'subset = training set' in the function, or can I just use the whole dataset to get OOB predictions?
If I fit the model using randomForest, I can get an MSE value from the model summary. Does this value represent the OOB MSE, or do I need to calculate the MSE myself?
Thanks! Megan Yu
Hello Vincenzo,
For question 1.2, are we required to plot the decision tree in R using a made-up dataset, or can we simply give the structure without using R?
Best, Hao
@riowxm I think the plot would be ntree vs. MSE, with a separate line for each mtry value. So if you use 4 different values of mtry, your plot will have four lines, each showing how the error changes with ntree.
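A plot of that shape could be sketched as follows. This assumes the `randomForest` package and the `Boston` data from `MASS`; the particular `mtry` values and the maximum `ntree` are illustrative assumptions. It relies on the `mse` component of a fitted regression forest, whose k-th entry is the out-of-bag MSE using the first k trees, so one fit per `mtry` value gives the whole curve.

```r
library(randomForest)
library(MASS)  # provides the Boston data frame (assumed dataset)

set.seed(1)

mtry_values <- c(2, 4, 6, 13)  # illustrative; Boston has 13 predictors
ntree_max <- 500               # illustrative upper limit for ntree

# One column per mtry value; row k holds the OOB MSE after k trees
oob_curves <- sapply(mtry_values, function(m) {
  randomForest(medv ~ ., data = Boston, mtry = m, ntree = ntree_max)$mse
})

# One line per mtry value, error against number of trees
matplot(seq_len(ntree_max), oob_curves, type = "l", lty = 1,
        col = seq_along(mtry_values),
        xlab = "Number of trees (ntree)", ylab = "OOB MSE")
legend("topright", legend = paste("mtry =", mtry_values),
       col = seq_along(mtry_values), lty = 1)
```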
@satsuma24 I asked Vincenzo about this the other day. He said we can explain how the decision tree splits on the X1 and X2 values in the 2D image to produce the different regions, without drawing the tree. But drawing it works too, as does doing both.
Hello professor, I found some different methods for fitting regression trees, but they produce different results. Can we use a different package for the assignment, or are we only allowed to use the package taught in class? https://www.r-bloggers.com/a-brief-tour-of-the-trees-and-forests/ Best regards, Iris Xu
@menggeyu I believe my second comment in this thread addresses your Q1? As for Q2, I don't know off the top of my head -- is it in the documentation? Should be there somewhere.
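For Q2, one way to check empirically is sketched below, assuming the `randomForest` package and the `Boston` data from `MASS` (both assumptions for illustration). It compares the last entry of the fitted object's `mse` component, which is the value printed as "Mean of squared residuals", against an MSE computed by hand from the out-of-bag predictions stored in `predicted`. The package documentation remains the authoritative source.

```r
library(randomForest)
library(MASS)  # provides the Boston data frame (assumed dataset)

set.seed(1)
fit <- randomForest(medv ~ ., data = Boston, ntree = 500)

# Value reported when the fitted forest is printed
printed_mse <- fit$mse[fit$ntree]

# MSE computed by hand from the stored out-of-bag predictions
manual_oob_mse <- mean((Boston$medv - fit$predicted)^2)

# If these agree, the reported MSE is the out-of-bag MSE
all.equal(printed_mse, manual_oob_mse)
```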
@satsuma24 No data set is required to form a decision tree -- the boundaries you see in the diagram are sufficient. As for how to draw it, it's up to you -- feel free to draw it on paper and submit a picture of it, or use software to draw it.
@fish117 Nice find! You can use whatever software you'd like to implement regression trees. They will probably be parameterized differently, but that's OK.
Solutions to Assignment 2 are now available on Connect.
...and can be found here.