some questions - GitHub Issues

GoogleCodeExporter commented 9 years ago

Hi,

  First of all many thanks for this wonderful code. I have some questions.

1) Is it possible to set the depth of each tree in the random forest?
2) Each node in the tree should have one of the two nodestatus values (1, -1), 
non-terminal and terminal. But when I run tutorial_ClassRF.m I find a lot of 
nodes with a nodestatus of zero. What does this zero mean?

Kindly guide me.

Original issue reported on code.google.com by umer.r...@gmail.com on 23 Oct 2012 at 11:19

GoogleCodeExporter commented 9 years ago

for 1) yup, change the nodesize to change the depth of each tree. a larger 
nodesize will create smaller trees.

for 2) consider that a binary tree of depth m can have up to 2^(m+1) - 1 nodes, 
2^m of them at the deepest level. most datasets are not complicated enough to 
need all of them (a recursive xor dataset will get you a completely full tree, 
btw), so many of the nodes are terminal and may not even get to depth m. But 
the tree is stored as if all of the possible nodes might be needed (fear not, 
nodes are added incrementally, so only a much smaller fraction is really 
stored). The best way to get the tree structure is to start at the root node, 
go to its child nodes, from each child go to its own children, and so on. This 
information is saved as a contiguous vector, and a nodestatus of zero means 
that the node was never created.
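
A minimal sketch of that root-to-children walk, in Python rather than MATLAB. The array names `left_child` and `right_child` and the tiny example tree are assumptions made up for illustration, not the library's actual fields; only the status codes (1 = split node, -1 = terminal, 0 = never created) come from this thread.

```python
# Hypothetical flat storage for one small tree. The layout is an assumption
# for illustration only; the real model may organise its arrays differently.
# Status codes as discussed above: 1 = split node, -1 = terminal, 0 = never created.
nodestatus = [1, 1, -1, -1, -1, 0, 0, 0]
left_child = [1, 3, -1, -1, -1, -1, -1, -1]   # index of left child, -1 if none
right_child = [2, 4, -1, -1, -1, -1, -1, -1]  # index of right child, -1 if none


def reachable_nodes(nodestatus, left_child, right_child, root=0):
    """Return {node_index: depth} for every node reachable from the root."""
    depths = {}
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        depths[node] = depth
        if nodestatus[node] == 1:  # only split nodes have children
            stack.append((left_child[node], depth + 1))
            stack.append((right_child[node], depth + 1))
    return depths


depths = reachable_nodes(nodestatus, left_child, right_child)
# Slots with status 0 are allocated but never reached by the walk.
unused = [i for i, status in enumerate(nodestatus) if status == 0]
```

In this toy tree the walk only ever visits slots 0 to 4; the trailing zero-status slots are padding that the traversal never touches.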

if you are trying to understand how the tree structure is stored, maybe this 
will help you: http://code.google.com/p/randomforest-matlab/issues/detail?id=18&can=1

hope this helps

Original comment by abhirana on 24 Oct 2012 at 2:15

GoogleCodeExporter commented 9 years ago

Hi, I am new to random forests and I do not understand why a larger nodesize 
will create smaller trees. Wouldn't a larger nodesize result in a larger 
depth (~log2(nodesize)) in trees, and therefore create bigger trees? 

Thanks in advance.

Original comment by hyo...@cs.unc.edu on 1 Aug 2014 at 9:31

GoogleCodeExporter commented 9 years ago

hello @hyojin

i think you are confusing the number of nodes in a tree with nodesize. 

nodesize = splitting stops once a node has nodesize or fewer examples. let's 
say you are at the root node and the k-th feature divides the data at value v; 
examples whose k-th feature is < v fall into the left node and examples whose 
k-th feature is >= v fall into the right node. this splitting is done 
recursively until the number of examples falling into a node is less than or 
equal to nodesize. then the tree won't be grown any further from that node.

so a larger nodesize will create short trees and a smaller nodesize will create 
tall trees. note that the depth also depends on the type of tree; 
classification trees will stop growing after a single level for data that one 
split separates perfectly.
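
To make the stopping rule concrete, here is a toy sketch in Python. It is not the library's tree builder: splitting a single feature at its median is an assumption standing in for the real split search, and `grow_depth` is a hypothetical helper name.

```python
def grow_depth(values, nodesize):
    """Depth of a toy tree grown on one feature.

    A node is split at the median value until it holds `nodesize` or fewer
    examples. Median splitting is a stand-in for the real split search.
    """
    if len(values) <= nodesize:
        return 0  # node is small enough: stop splitting here
    values = sorted(values)
    v = values[len(values) // 2]
    left = [x for x in values if x < v]    # examples with feature < v
    right = [x for x in values if x >= v]  # examples with feature >= v
    if not left or not right:
        return 0  # no split separates the examples (all values equal)
    return 1 + max(grow_depth(left, nodesize), grow_depth(right, nodesize))


data = list(range(64))  # 64 examples with distinct feature values
depth_by_nodesize = {n: grow_depth(data, n) for n in (1, 4, 16)}
```

With these 64 examples the depth shrinks as nodesize grows, which matches the point above: a larger nodesize stops the recursion earlier.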

Original comment by abhirana on 3 Aug 2014 at 12:18

GoogleCodeExporter commented 9 years ago

Oh I see. Thanks. So nrnodes is the number of nodes, right? If I have trees 
with nrnodes = 8001, then my tree would have a maximum depth of log2(8001)?

Original comment by hyo...@cs.unc.edu on 4 Aug 2014 at 2:45

GoogleCodeExporter commented 9 years ago

yup, that's somewhat correct. note that unlike a perfectly balanced binary 
tree, random forest trees may be unbalanced: some branches may be longer than 
others, and some branches terminate near the root node. so log2(nrnodes) is 
about the depth you would get from a balanced tree, and an unbalanced tree with 
the same number of nodes can be deeper.
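
A rough way to see the range, sketched in Python. It treats nrnodes as the node count, as in the question above, and assumes every split node has exactly two children; `depth_bounds` is a hypothetical helper, not part of the library.

```python
def depth_bounds(n_nodes):
    """Smallest and largest possible depth of a binary tree with n_nodes nodes,
    assuming every split node has exactly two children (so n_nodes is odd)."""
    if n_nodes < 1 or n_nodes % 2 == 0:
        raise ValueError("expected a positive odd node count")
    # Balanced case: smallest d with 2^(d+1) - 1 >= n_nodes.
    min_depth = n_nodes.bit_length() - 1
    # Chain-like case: each level adds one split node and one leaf.
    max_depth = (n_nodes - 1) // 2
    return min_depth, max_depth


min_depth, max_depth = depth_bounds(8001)
```

For 8001 nodes the balanced bound is close to log2(8001), while a fully lopsided tree with the same node count would be far deeper, so the node count alone only brackets the depth.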

Original comment by abhirana on 9 Aug 2014 at 4:19

etrigger / randomforest-matlab

some questions #47