I think you could consider the voting of the trees in the ensemble as the
confidence measure: if most of the trees in the ensemble agree on a label, the
ensemble can be considered to have high confidence in that label.
Look at the code to obtain the prediction per tree:
http://code.google.com/p/randomforest-matlab/source/browse/trunk/RF_Class_C/tutorial_ClassRF.m#230
This variable is explained here:
http://code.google.com/p/randomforest-matlab/source/browse/trunk/RF_Class_C/classRF_predict.m#22
If there are 500 trees and 10 test examples you will have a 10x500 matrix of
per-tree predictions. Count the votes for each class (per example) and divide
by the number of trees to get a probability or confidence measure.
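To make the vote counting concrete, here is a minimal sketch (in Python with NumPy rather than the toolbox's MATLAB; the votes matrix is randomly generated for illustration, not produced by classRF_predict):

```python
import numpy as np

# Hypothetical per-tree prediction matrix: rows = test examples,
# columns = trees; each entry is the class label that tree voted for.
rng = np.random.default_rng(0)
votes = rng.integers(0, 2, size=(10, 500))  # 10 examples, 500 trees, labels 0/1

n_trees = votes.shape[1]
classes = np.unique(votes)

# Fraction of trees voting for each class, per example.
proba = np.stack([(votes == c).sum(axis=1) / n_trees for c in classes], axis=1)

# Each row sums to 1; the largest entry is the confidence in the predicted label.
confidence = proba.max(axis=1)
```

The same arithmetic applies to the toolbox's votes output: count per-class votes along the tree axis and divide by the number of trees.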
do tell me if you have more questions.
Original comment by abhirana
on 8 Oct 2010 at 3:03
That was very clear and exactly what I wanted. Thanks.
Original comment by gayumahalingam
on 8 Oct 2010 at 1:11
Hi, before anything, many thanks for the code. I came across it a couple of
days ago and it looks great.
May I ask how I can calculate the level of confidence (let's say the
standard deviation) for my test set when applying RF regression? (For example,
the simplest problem: I have vectors xtrain and ytrain (1 feature only), and
predict ytest given xtest.)
Thanks
Thanks
Original comment by marco.an...@gmail.com
on 26 Jan 2012 at 12:53
Hi,
One way to get a confidence interval is to look at the output of each tree
for each test example; that set of per-tree outputs can serve as a confidence
interval for that particular test example (maybe plot a histogram to examine
the exact distribution).
Note that when you pass a test example to a tree, it ends up at one of the
leaf nodes. For regression, the output of each tree for a given test example
is the average of the y values of the training examples at that leaf (for
classification it is the majority vote).
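As a sketch of this idea (again in Python with NumPy for illustration; the per-tree output matrix is simulated here, whereas in practice you would collect it from the individual trees of the trained forest):

```python
import numpy as np

# Hypothetical matrix of per-tree regression outputs: rows = test examples,
# columns = trees. Each entry stands for the mean of the training y values
# in the leaf that the test example reached in that tree.
rng = np.random.default_rng(1)
per_tree = 5.0 + rng.normal(scale=0.8, size=(4, 500))  # 4 examples, 500 trees

pred = per_tree.mean(axis=1)    # forest prediction = average over trees
spread = per_tree.std(axis=1)   # standard deviation as a confidence measure
# Empirical 90% interval from the distribution of per-tree outputs.
lo, hi = np.percentile(per_tree, [5, 95], axis=1)
```

A histogram of each row of `per_tree` is the "exact distribution" mentioned above; the standard deviation and percentile interval are just summaries of it.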
A more involved approach would be quantile regression forests:
http://jmlr.csail.mit.edu/papers/volume7/meinshausen06a/meinshausen06a.pdf
In a nutshell: you can decompose the output of the trees as a linear weighted
model of the training examples. Note that for a regression tree a test example
ends up at a leaf node, and the average of the y values (from training) at
that leaf node is the output of the tree. Doing the same with all the trees in
the forest, you can represent each test example as a weighted average of the
training examples, and in a way describe any test example in terms of the
training examples. I don't have this implemented; it requires knowing which
training examples land at each leaf node of each tree.
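A minimal sketch of that decomposition (Python/NumPy; `train_leaves` and `test_leaves` are hypothetical leaf-assignment matrices, not a toolbox API — exactly the per-leaf bookkeeping described as not implemented above):

```python
import numpy as np

def forest_weights(train_leaves, test_leaves):
    """Weight of each training example for each test example.

    train_leaves: (n_train, n_trees) leaf index of each training example per tree.
    test_leaves:  (n_test,  n_trees) leaf index of each test example per tree.
    """
    n_train, n_trees = train_leaves.shape
    w = np.zeros((test_leaves.shape[0], n_train))
    for t in range(n_trees):
        for i, leaf in enumerate(test_leaves[:, t]):
            # Training points sharing the test point's leaf in tree t
            # each get an equal share of that tree's 1/n_trees weight.
            same = train_leaves[:, t] == leaf
            w[i, same] += 1.0 / (same.sum() * n_trees)
    return w  # each row sums to 1

# Toy example: one tree; training points 0 and 1 share the test point's leaf.
train_leaves = np.array([[0], [0], [1]])
test_leaves = np.array([[0]])
y_train = np.array([1.0, 3.0, 10.0])
w = forest_weights(train_leaves, test_leaves)
pred = w @ y_train  # the leaf average of y_train[0] and y_train[1]
```

With these weights, `w @ y_train` reproduces the forest's regression prediction, and quantiles of the weighted empirical distribution of `y_train` give the quantile-regression-forest estimates.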
Original comment by abhirana
on 26 Jan 2012 at 2:31
Thank you for this. I'll have a look at quantile regression forests.
Original comment by marco.an...@gmail.com
on 26 Jan 2012 at 10:45
Original comment by abhirana
on 8 Apr 2012 at 11:57
Hi, may I know how to calculate the probability for my test data?
x_trn=[A B];
y_trn=a vector of zeros and ones.
The output I get from the classifier is ones and zeros, but what I need is
just probabilities. Any help here?
Original comment by sunupete...@gmail.com
on 4 Nov 2014 at 10:46
I am also missing some of the fields in 'model': I don't have errtr and
xbestsplit, so I am not able to perform the operation. Why is that?
Original comment by sunupete...@gmail.com
on 4 Nov 2014 at 12:21
Original issue reported on code.google.com by
gayumahalingam
on 7 Oct 2010 at 2:17