ClimbsRocks / machineJS

[UNMAINTAINED] Automated machine learning- just give it a data file! Check out the production-ready version of this project at ClimbsRocks/auto_ml
https://github.com/ClimbsRocks/auto_ml

Classifier: output proba instead of category #155

Open kprimice opened 8 years ago

kprimice commented 8 years ago

Is it possible to get the predicted probability of each category, rather than only the most probable category, when using classifiers?

ClimbsRocks commented 8 years ago

For the case of a single category (defaulted or not, bought or not), yes!

For the case of multiple categories (shopper type A, B, C, or D), you should be able to, but it will require a little bit of work.

To do this, go to pySetup/makePredictions.py and find where we actually get the predictions (where we invoke classifier.predict_proba, currently lines 106-116). You want to get the predicted probabilities during the validationRound only. If you get them during the non-validation round, it will break ensembler. Even so, this hack will break ensembler's final assembly step, but ensembler will still train a bunch of ensembled models, whose results you can then read through.

So in the else case (problemType != 'category'), simply change classifier.predict to classifier.predict_proba.

Now our ensemble algorithms will make probability predictions.
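
To illustrate the difference between the two calls, here is a minimal sketch. It is not the code in pySetup/makePredictions.py: `PriorClassifier` and `make_predictions` are hypothetical names made up for this example, and the toy classifier only mimics the `predict` / `predict_proba` interface that scikit-learn classifiers expose.

```python
from collections import Counter


class PriorClassifier:
    """Toy stand-in (hypothetical) with a scikit-learn style interface.

    It ignores the features and always predicts the class frequencies
    seen during training, which is enough to show the output shapes.
    """

    def fit(self, X, y):
        counts = Counter(y)
        self.classes_ = sorted(counts)  # column order of predict_proba
        self._priors = [counts[c] / len(y) for c in self.classes_]
        return self

    def predict_proba(self, X):
        # One row per sample, one probability per class in classes_ order.
        return [list(self._priors) for _ in X]

    def predict(self, X):
        # One label per sample: the most probable class.
        best = self.classes_[self._priors.index(max(self._priors))]
        return [best for _ in X]


def make_predictions(classifier, X, want_probabilities):
    """Hypothetical helper mirroring the one-line change described above."""
    if want_probabilities:
        return classifier.predict_proba(X)
    return classifier.predict(X)


X_train = [[0], [1], [2], [3], [4]]
y_train = ["A", "B", "B", "C", "D"]
classifier = PriorClassifier().fit(X_train, y_train)

X_new = [[5], [6]]
labels = make_predictions(classifier, X_new, want_probabilities=False)
probabilities = make_predictions(classifier, X_new, want_probabilities=True)

print(labels)               # one category per row
print(classifier.classes_)  # which class each probability column refers to
print(probabilities)        # one probability per category per row
```

With a real scikit-learn classifier the pattern is the same: `predict` returns one label per row, while `predict_proba` returns one column per class, ordered as in `classifier.classes_`.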

Again, this means that when ensembler goes to combine the predictions from all our ensemble algorithms, it will fail. But that's OK. Simply find which of those algorithms was most accurate, and use the results found in that algorithm's file.
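
One way to compare the algorithms by hand is sketched below. It assumes you have already loaded each algorithm's probability predictions and the true validation labels into memory (the file format is not covered here), and it uses multi-class log loss as the accuracy measure, which is my choice for the example rather than something ensembler prescribes. All names and numbers are made up for illustration.

```python
import math


def log_loss(y_true, probabilities, classes):
    """Mean negative log-probability assigned to the true class (lower is better)."""
    eps = 1e-15  # clip so that log(0) never occurs
    total = 0.0
    for label, row in zip(y_true, probabilities):
        p = row[classes.index(label)]
        p = min(max(p, eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Hypothetical data: column order of each row follows `classes`.
classes = ["A", "B", "C", "D"]
y_validation = ["A", "C", "D"]
predictions_by_algorithm = {
    "algorithm_1": [
        [0.7, 0.1, 0.1, 0.1],
        [0.2, 0.2, 0.5, 0.1],
        [0.1, 0.1, 0.2, 0.6],
    ],
    "algorithm_2": [
        [0.4, 0.3, 0.2, 0.1],
        [0.3, 0.3, 0.3, 0.1],
        [0.25, 0.25, 0.25, 0.25],
    ],
}

scores = {
    name: log_loss(y_validation, probs, classes)
    for name, probs in predictions_by_algorithm.items()
}
best_algorithm = min(scores, key=scores.get)

print(scores)
print(best_algorithm)
```

Log loss rewards confident, correct probabilities, so it suits this case better than plain accuracy, which would only look at the most probable category.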

Again, this whole process is a bit of a hack for now. At some point I'll try to build this in as standard functionality in ensembler, but it would increase complexity substantially, and there are easier wins to focus on right now. If you're interested in taking a crack at building it in, I'd love that PR!

Thanks for using this, and filing issues. Let me know if you have any other questions!