automl / auto-sklearn

Automated Machine Learning with scikit-learn
https://automl.github.io/auto-sklearn
BSD 3-Clause "New" or "Revised" License

Access to the prediction probas on training cohort #1147

Closed zgm0407 closed 3 years ago

zgm0407 commented 3 years ago

Hi, I want to get the prediction probabilities for each subject in my training cohort. I noticed there are similar questions (#670 and #348), but the example there does not show how to obtain the probabilities. In addition, I have not found an API reference or examples for train_evaluator.py. Is there any API documentation or example for train_evaluator.py? Thanks

franchuterivera commented 3 years ago

Hello, thanks for using auto-sklearn.

Auto-sklearn fits an ensemble of individual configurations, which are found using SMAC. The code snippet below shows how to access the prediction probabilities of both the final ensemble and each of the individual models:

   import sklearn.datasets
   import sklearn.metrics
   import sklearn.model_selection

   import autosklearn.classification

   ############################################################################
   # Data Loading
   # ============

   X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
   X_train, X_test, y_train, y_test = \
       sklearn.model_selection.train_test_split(X, y, random_state=1)

   ############################################################################
   # Build and fit a classifier
   # ==========================

   automl = autosklearn.classification.AutoSklearnClassifier(
       time_left_for_this_task=120,
       per_run_time_limit=30,
       tmp_folder='/tmp/autosklearn_classification_example_tmp',
       output_folder='/tmp/autosklearn_classification_example_out',
   )
   automl.fit(X_train, y_train, dataset_name='breast_cancer')

   # Getting the probabilities from the ensemble
   probabilities = automl.predict_proba(X_test)

   # Getting the probabilities of each model individually
   for weight, model in automl.get_models_with_weights():
       probability = model.predict_proba(X_test)
       print(f"For model={model.steps[-1]} weight={weight} produced prediction with shape={probability.shape}")
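
Since the question is about the training cohort, the same calls should work with X_train in place of X_test. Below is a minimal sketch of collecting the per-subject probabilities from every ensemble member. The helper `per_subject_probas` and the stand-in class `FixedProbaModel` are hypothetical names used for illustration only; they are not part of auto-sklearn. The helper only assumes that each model exposes `predict_proba`.

```python
def per_subject_probas(models_with_weights, X, subject_ids):
    """Map each subject id to a list of (weight, class_probabilities) pairs.

    models_with_weights: iterable of (weight, model) pairs, e.g. the result of
    automl.get_models_with_weights(); each model must expose predict_proba.
    """
    table = {sid: [] for sid in subject_ids}
    for weight, model in models_with_weights:
        proba = model.predict_proba(X)  # one row of class probabilities per subject
        for sid, row in zip(subject_ids, proba):
            table[sid].append((weight, [float(p) for p in row]))
    return table


class FixedProbaModel:
    """Hypothetical stand-in for a fitted pipeline (illustration only)."""

    def __init__(self, row):
        self.row = row

    def predict_proba(self, X):
        # Return the same probability row for every subject in X.
        return [list(self.row) for _ in X]


# Runs without auto-sklearn installed, using the stand-in models.
demo_models = [(0.75, FixedProbaModel([0.2, 0.8])),
               (0.25, FixedProbaModel([0.6, 0.4]))]
demo_table = per_subject_probas(demo_models, [[1.0], [2.0]], ["subj_a", "subj_b"])
```

With a fitted classifier, the call would look like `per_subject_probas(automl.get_models_with_weights(), X_train, range(len(X_train)))`. Keep in mind that probabilities computed on data the models were fitted on may look more confident than those on unseen data.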

Note that each individual model is a scikit-learn-compatible pipeline, so it supports the usual estimator methods such as predict_proba and predict.
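
To relate the per-model probabilities to the ensemble output, here is a rough sketch that combines them by a weighted average. This assumes the ensemble is a weighted average of its members' class probabilities; treat that as an assumption and compare the result against `automl.predict_proba` on your own data. `weighted_average_proba` and `StubModel` are hypothetical names for illustration and are not part of auto-sklearn.

```python
def weighted_average_proba(models_with_weights, X):
    """Weighted average of predict_proba outputs, normalised by the weight sum."""
    total = None
    weight_sum = 0.0
    for weight, model in models_with_weights:
        # Scale each model's class probabilities by its ensemble weight.
        rows = [[weight * float(p) for p in row] for row in model.predict_proba(X)]
        if total is None:
            total = rows
        else:
            total = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, rows)]
        weight_sum += weight
    if total is None or weight_sum == 0:
        raise ValueError("need at least one model with a non-zero weight")
    return [[value / weight_sum for value in row] for row in total]


class StubModel:
    """Hypothetical stand-in for a fitted pipeline (illustration only)."""

    def __init__(self, row):
        self.row = row

    def predict_proba(self, X):
        # Return the same probability row for every sample in X.
        return [list(self.row) for _ in X]


# Runs without auto-sklearn installed, using the stand-in models.
combined = weighted_average_proba(
    [(0.75, StubModel([0.2, 0.8])), (0.25, StubModel([0.6, 0.4]))],
    [[1.0], [2.0]],
)
```

With a fitted classifier, the call would be `weighted_average_proba(automl.get_models_with_weights(), X_train)`.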

Please let me know if that does not answer your question.

We do not have documentation for train_evaluator.py yet, but if you have any questions about it that we can help with, please let us know.

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs for the next 7 days. Thank you for your contributions.

github-actions[bot] commented 3 years ago

This issue has been automatically closed due to inactivity.