Shark-ML / Shark

The Shark Machine Learning Library. See more:
http://shark-ml.github.io/Shark/
GNU Lesser General Public License v3.0

How to get mean value from 0/1 RFClassifier #282

Closed: Mad5ci closed this issue 3 years ago

Mad5ci commented 3 years ago

We've built a random forest classifier that trains on and delivers a 0 or 1 output for one of two classes. We want to be able to get a double value representing the mean value of all of the individual trees. So, for example, if 2/3 of the trees in the forest vote 1 then we would expect to get a value near 0.666. There doesn't appear to be a way to drill down to that level of detail -- but maybe by doing some trickery with the decision function.

How should we go about getting at the data from the trees after a prediction?

Ulfgard commented 3 years ago

If you use the current version on master, i.e. the last release, you should be able to do:

    auto class_probabilities = random_forest.decisionFunction()(inputs);

This will be a vector/matrix with N elements per row (where N is the number of classes). The output should be normalized between 0 and 1. I think for binary classification, the second value should be the one you are after (the proportion of votes for label 1).
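
For intuition about what that second value represents, here is a minimal sketch that does not use Shark at all. It assumes each tree casts a hard 0/1 vote; `voteShares` is a hypothetical helper written for illustration, and Shark's forest may combine tree outputs differently (for example by averaging per-leaf class distributions), so treat this as a picture of the idea rather than of the library's internals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Shark: turns hard 0/1 votes from the
// individual trees into a two-entry vector {share voting 0, share voting 1}.
std::vector<double> voteShares(const std::vector<unsigned int>& treeVotes) {
    assert(!treeVotes.empty());  // at least one tree is required

    std::size_t ones = 0;
    for (unsigned int vote : treeVotes) {
        assert(vote <= 1u);      // binary labels only
        ones += vote;
    }

    const double total = static_cast<double>(treeVotes.size());
    const double shareOfOnes = static_cast<double>(ones) / total;

    // Entry 1 plays the role of the "second value" for one input row.
    return {1.0 - shareOfOnes, shareOfOnes};
}
```

With three trees voting 1, 1 and 0, `voteShares({1u, 1u, 0u})[1]` is 2/3, which matches the "value near 0.666" asked for in the question.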

I hope this helps.



Mad5ci commented 3 years ago

Thanks! That worked... Here is a snippet of the code that's giving the desired result.

    // Code to make the prediction
    double thePrediction;
    unsigned int modelOutput;

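    // Apply the decision function instead of the classifier itself, so the
    // result holds per-class values rather than a hard 0/1 label.
    auto predictionData = model.decisionFunction()(theInputs);

    // For the first input (index 0), take the value for class 1.
    thePrediction = predictionData.element(0)[1];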