I think you could consider the voting of the trees in the ensemble as the
confidence measure: if most of the trees in the ensemble agree on a label, the
ensemble can be considered to have high confidence in that label.
Look at the code to obtain the prediction per tree:
http://code.google.com/p/randomforest-matlab/source/browse/trunk/RF_Class_C/tutorial_ClassRF.m#230
This variable is explained here:
http://code.google.com/p/randomforest-matlab/source/browse/trunk/RF_Class_C/classRF_predict.m#22
If there are 500 trees and 10 test examples you will have a 10x500 matrix of
per-tree predictions. Count the votes for each class (per example) and divide
by the number of trees to get a probability or confidence measure.
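To make the vote counting concrete, here is a minimal sketch (in Python with NumPy rather than the toolbox's MATLAB; the votes matrix is randomly generated for illustration, not produced by classRF_predict):

```python
import numpy as np

# Hypothetical per-tree prediction matrix: rows = test examples,
# columns = trees; each entry is the class label that tree voted for.
rng = np.random.default_rng(0)
votes = rng.integers(0, 2, size=(10, 500))  # 10 examples, 500 trees, labels 0/1

n_trees = votes.shape[1]
classes = np.unique(votes)

# Fraction of trees voting for each class, per example.
proba = np.stack([(votes == c).sum(axis=1) / n_trees for c in classes], axis=1)

# Each row sums to 1; the largest entry is the confidence in the predicted label.
confidence = proba.max(axis=1)
```

The same arithmetic applies to the toolbox's votes output: count per-class votes along the tree axis and divide by the number of trees.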
do tell me if you have more questions.
Original comment by abhirana
on 8 Oct 2010 at 3:03
That was very clear and exactly what I wanted. Thanks.
Original comment by gayumahalingam
on 8 Oct 2010 at 1:11
Hi, before anything, many thanks for the code. I came across it a couple of
days ago and it looks great.
May I ask how I can calculate the level of confidence (let's say the
standard deviation) for my test set when applying RF regression? (For example,
the simplest problem: I have vectors xtrain and ytrain (1 feature only), and
predict ytest given xtest.)
Thanks
Thanks
Original comment by marco.an...@gmail.com
on 26 Jan 2012 at 12:53
Hi,
One way to get a confidence interval is to look at the output of each tree
for each test example; that set of per-tree outputs can serve as a confidence
interval for that particular test example (maybe plot a histogram to examine
the exact distribution).
Note that when you pass a test example to a tree, it ends up at one of the
leaf nodes. For regression, the output of each tree for a given test example
is the average of the y values of the training examples at that leaf (for
classification it is the majority vote).
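As a sketch of this idea (again in Python with NumPy for illustration; the per-tree output matrix is simulated here, whereas in practice you would collect it from the individual trees of the trained forest):

```python
import numpy as np

# Hypothetical matrix of per-tree regression outputs: rows = test examples,
# columns = trees. Each entry stands for the mean of the training y values
# in the leaf that the test example reached in that tree.
rng = np.random.default_rng(1)
per_tree = 5.0 + rng.normal(scale=0.8, size=(4, 500))  # 4 examples, 500 trees

pred = per_tree.mean(axis=1)    # forest prediction = average over trees
spread = per_tree.std(axis=1)   # standard deviation as a confidence measure
# Empirical 90% interval from the distribution of per-tree outputs.
lo, hi = np.percentile(per_tree, [5, 95], axis=1)
```

A histogram of each row of `per_tree` is the "exact distribution" mentioned above; the standard deviation and percentile interval are just summaries of it.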
A more involved approach would be quantile regression forests:
http://jmlr.csail.mit.edu/papers/volume7/meinshausen06a/meinshausen06a.pdf
In a nutshell: you can decompose the output of the trees as a linear weighted
model of the training examples. Note that for a regression tree a test example
ends up at a leaf node, and the average of the y values (from training) at
that leaf node is the output of the tree. Doing the same with all the trees in
the forest, you can represent each test example as a weighted average of the
training examples, and in a way describe any test example in terms of the
training examples. I don't have this implemented; it requires knowing which
training examples land at each leaf node of each tree.
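A minimal sketch of that decomposition (Python/NumPy; `train_leaves` and `test_leaves` are hypothetical leaf-assignment matrices, not a toolbox API — exactly the per-leaf bookkeeping described as not implemented above):

```python
import numpy as np

def forest_weights(train_leaves, test_leaves):
    """Weight of each training example for each test example.

    train_leaves: (n_train, n_trees) leaf index of each training example per tree.
    test_leaves:  (n_test,  n_trees) leaf index of each test example per tree.
    """
    n_train, n_trees = train_leaves.shape
    w = np.zeros((test_leaves.shape[0], n_train))
    for t in range(n_trees):
        for i, leaf in enumerate(test_leaves[:, t]):
            # Training points sharing the test point's leaf in tree t
            # each get an equal share of that tree's 1/n_trees weight.
            same = train_leaves[:, t] == leaf
            w[i, same] += 1.0 / (same.sum() * n_trees)
    return w  # each row sums to 1

# Toy example: one tree; training points 0 and 1 share the test point's leaf.
train_leaves = np.array([[0], [0], [1]])
test_leaves = np.array([[0]])
y_train = np.array([1.0, 3.0, 10.0])
w = forest_weights(train_leaves, test_leaves)
pred = w @ y_train  # the leaf average of y_train[0] and y_train[1]
```

With these weights, `w @ y_train` reproduces the forest's regression prediction, and quantiles of the weighted empirical distribution of `y_train` give the quantile-regression-forest estimates.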
Original comment by abhirana
on 26 Jan 2012 at 2:31
Thank you for this. I'll have a look at quantile regression forests.
Original comment by marco.an...@gmail.com
on 26 Jan 2012 at 10:45
Original comment by abhirana
on 8 Apr 2012 at 11:57
Hi, may I know how to calculate the probability for my test data?
x_trn=[A B];
y_trn=a vector of zeros and ones.
The output I get from the classifier is ones and zeros, but what I need is
just probabilities. Any help here?
Original comment by sunupete...@gmail.com
on 4 Nov 2014 at 10:46
I am also missing some of the fields in 'model': I don't have errtr and
xbestsplit, so I am not able to perform the operation. Why is that?
Original comment by sunupete...@gmail.com
on 4 Nov 2014 at 12:21
Original issue reported on code.google.com by
gayumahalingam
on 7 Oct 2010 at 2:17