HunterMcGushion / hyperparameter_hunter

Easy hyperparameter optimization and automatic result saving across machine learning algorithms and libraries
MIT License

How to do predict_proba in catboost classifier? #180

Open caowencai opened 5 years ago

caowencai commented 5 years ago

I'm confused because the examples don't clearly show how to use predict_proba. Also, how do I get the prediction results?

HunterMcGushion commented 5 years ago

Thanks for raising this! predict_proba definitely needs to be better documented and described in more examples.

You can control whether predict_proba is invoked via Environment's do_predict_proba kwarg. do_predict_proba can be a boolean (default=False) or an int. The int form specifies the column index of the class probabilities you want to use, so if you're doing binary classification and want the probabilities for the "1" class, you would use do_predict_proba=1.
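
For illustration, here's a minimal sketch of how that might look with CatBoost. The `train_df` DataFrame, its "target" column, and the CatBoost parameters are placeholders I'm assuming for the example, not anything specific to your setup:

```python
from hyperparameter_hunter import Environment, CVExperiment
from catboost import CatBoostClassifier

# `train_df` is a placeholder: any pandas DataFrame containing a "target" column
env = Environment(
    train_dataset=train_df,
    results_path="HyperparameterHunterAssets",
    target_column="target",
    metrics=["roc_auc_score"],
    cv_type="StratifiedKFold",
    cv_params=dict(n_splits=5, shuffle=True, random_state=32),
    do_predict_proba=1,  # record probabilities for the "1" class instead of class labels
)

# Placeholder CatBoost params; the Experiment picks up the active Environment automatically
experiment = CVExperiment(
    model_initializer=CatBoostClassifier,
    model_init_params=dict(iterations=100, verbose=False),
)
```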

At the moment, this is the only example that uses do_predict_proba (line 215).

Sorry this was so hard to find. Please let me know if this is what you're looking for, and if you have any suggestions for making this easier!

caowencai commented 5 years ago

It's still not clear to me. In the source, it's hard to find where the prediction results are returned, and the examples are all about training only. An example of making predictions on a test_dataset (without labels) would be much appreciated.

HunterMcGushion commented 5 years ago

Ah, I could definitely document that better. Thanks for bringing it up. Would it help to add an "Attributes" section to the docstring of experiments.BaseExperiment that lists the different dataset attributes?

In your case, you probably want your Experiment's data_test.predictions attribute. data_test also contains several other attributes for different views of the data at different time divisions. The internals of data_test and the other BaseExperiment.data... attributes are documented in the module docstring of data.data_core, so I'd recommend reading that to better understand what you can access.
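
As a quick sketch, continuing from the snippet above (and assuming the Environment was also given a `test_dataset`, e.g. `test_dataset=test_df`, so there is unlabeled data to predict on):

```python
# Sketch: inspect the test predictions collected during the Experiment above.
# See data.data_core's module docstring for the sub-attributes/views available here.
test_predictions = experiment.data_test.predictions
print(test_predictions)
```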

However, the easiest way to get the full test predictions at the end of an Experiment is through the "HyperparameterHunterAssets" directory built at your Environment's results_path. All of your Experiments' test predictions are saved as CSV files inside "HyperparameterHunterAssets/Experiments/PredictionsTest", unless you're blacklisting them.
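
For example, something along these lines would load the most recently written prediction file (a sketch assuming results_path="HyperparameterHunterAssets" as in the snippet above, and that at least one Experiment has already saved test predictions):

```python
import pandas as pd
from pathlib import Path

# Sketch: read the most recently saved test-prediction CSV from the results directory
preds_dir = Path("HyperparameterHunterAssets/Experiments/PredictionsTest")
latest = max(preds_dir.glob("*.csv"), key=lambda p: p.stat().st_mtime)
test_predictions = pd.read_csv(latest)
print(f"Loaded predictions from {latest.name}")
```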

Are your Experiments' results not being automatically saved? If you do think that something is misbehaving, would you mind sharing some minimal code so I can reproduce the problem?