aloysius-lim / bigrf

Random forests for R for large data sets, optimized with parallel tree-growing and disk-based memory

extracting class probabilities #8

Closed bbalin12 closed 10 years ago

bbalin12 commented 10 years ago

How do you extract class probabilities for a bigRF forest? I don't see type='prob' as an option in the predict method.

aloysius-lim commented 10 years ago

When you call the predict() method, it returns an object of class bigcprediction. This class has a slot called testvotes, which contains the class votes for each test example (see the documentation for bigcprediction-class). You can compute the class probabilities for each test example from this matrix by dividing each row by its total.

bbalin12 commented 10 years ago

Thanks. I have the bigcprediction object, but how do I access the information inside it? I've tried:

testPredictions <- predict(bigForest, testData[, -c(1, 36, 149:150)], factor(testData$y), printerrfreq = 1, cachepath = 'etc/etc', trace = 1)

testPredictions$testvotes
Error in testPredictions$testvotes : $ operator is invalid for atomic vectors

testPredictions.testvotes
Error: object 'testPredictions.testvotes' not found

testPredictions[,2]
Error in testPredictions[, 2] : incorrect number of dimensions

aloysius-lim commented 10 years ago

In R, slots of an S4 object are accessed with the @ operator rather than $, like this: testPredictions@testvotes.
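
For anyone else hitting the same errors, here is a minimal sketch of S4 slot access that runs without bigrf installed. The class toyprediction is a hypothetical stand-in, not the real bigcprediction; it assumes (as the "atomic vectors" error above suggests) that the prediction object extends an atomic vector and carries the vote matrix in a slot.

```r
library(methods)

# Hypothetical stand-in for bigcprediction: an integer vector of predicted
# classes that also carries a vote matrix in a slot named testvotes.
setClass("toyprediction",
         contains = "integer",
         slots = c(testvotes = "matrix"))

# Made-up votes: 2 test examples (rows), 2 classes (columns).
votes <- matrix(c(8, 2,
                  3, 7), nrow = 2, byrow = TRUE)
p <- new("toyprediction", c(1L, 2L), testvotes = votes)

slotNames(p)           # lists the available slots, including "testvotes"
p@testvotes            # slot access with the @ operator
slot(p, "testvotes")   # equivalent functional form

# `$` is for lists and environments, so it raises an error here;
# capture the message instead of stopping the script.
res <- tryCatch(p$testvotes, error = function(e) conditionMessage(e))
```

slotNames() is a handy first step whenever you are unsure what an S4 object contains.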

bbalin12 commented 10 years ago

thanks!

hmboxwala commented 9 years ago

Hi aloysius-lim, is this the right way to extract the probabilities from the predictions:

pred <- predict(bigrfModel, crossval[,-94], crossval[,94])
pred <- pred@testvotes/rowSums(pred@testvotes)

thanks.

aloysius-lim commented 9 years ago

Yes, that looks right.
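
To see why the division works row by row, here is a small sketch on a made-up vote matrix (the numbers and class names are illustrative only, not output from bigrf). It also shows prop.table() from base R as an alternative that states the margin explicitly.

```r
# Hypothetical vote matrix: 3 test examples (rows), 2 classes (columns).
votes <- matrix(c(40, 10,
                  25, 25,
                   5, 45),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, Class = c("a", "b")))

# Dividing a matrix by a vector of length nrow(votes) recycles the vector
# down the columns, so each row is divided by its own total.
probs <- votes / rowSums(votes)

# Same result, with the row margin made explicit.
probs2 <- prop.table(votes, margin = 1)

# Each row of probabilities should sum to 1.
stopifnot(isTRUE(all.equal(rowSums(probs), rep(1, nrow(votes)))))
```

One caveat: a row whose votes sum to zero would produce NaN, so it may be worth checking rowSums() before dividing.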

m-monteiro commented 9 years ago

Is this the right way to reliably determine which columns in bigcprediction correspond to which classes?

pred <- predict(model, x=data)
attr(pred@testvotes,"dimnames")$Class

i.e., am I guaranteed that element 1 of the expression above is the correct label for the 1st column in pred@testvotes?

aloysius-lim commented 9 years ago

Yes, the order of columns is the same as that of the argument y that you supplied to bigrfc().
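
A short sketch of using those column labels, again on a made-up matrix rather than real bigrf output. It assumes the vote matrix carries a dimnames entry named Class, as in the question above; the class names are hypothetical.

```r
# Hypothetical vote matrix with labelled class columns.
votes <- matrix(c(40, 10,  0,
                   5, 15, 30),
                ncol = 3, byrow = TRUE,
                dimnames = list(NULL,
                                Class = c("setosa", "versicolor", "virginica")))

# Equivalent to attr(votes, "dimnames")$Class.
labels <- dimnames(votes)$Class

# Index of the column with the most votes in each row, mapped to its label.
# ties.method = "first" keeps the result deterministic when votes are tied.
top <- labels[max.col(votes, ties.method = "first")]
```

Looking labels up by name like this avoids hard-coding column positions in downstream code.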