dataiku / dataiku-api-client-python

Python client for the DSS public API
https://doc.dataiku.com/dss/latest/api/public/

Ability to override the default threshold at model import / to use optimal at eval #267

Closed paulhenricalas-dku closed 2 years ago

paulhenricalas-dku commented 2 years ago

To be completed by the corresponding PR on DIP.

import_mlflow_version_from_managed_folder and import_mlflow_version_from_path take an optional binary_classification_threshold parameter. For binary classification models, it lets the user override the default threshold of 0.5.

evaluate takes an optional use_optimal_threshold parameter, which defaults to False. When set to True, the evaluation uses the optimal threshold computed at training time for the metric set in the saved model, rather than the threshold set at import.

For example:

settings = saved_model.get_settings()
settings.prediction_metrics_settings['thresholdOptimizationMetric'] = 'ACCURACY'
settings.save()

# ...
model_version = saved_model.import_mlflow_version_from_path("v0", model_dir, binary_classification_threshold=0.2)
# ...

# will evaluate using the optimal threshold according to accuracy
model_version.evaluate("DATASET", use_optimal_threshold=True) 

# will evaluate using a threshold set to 0.2
model_version.evaluate("DATASET") 
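
To illustrate the difference between the two evaluation modes, here is a minimal standalone sketch of what a fixed threshold and an accuracy-optimal threshold mean for a binary classifier. It does not use the DSS API and is not how DSS computes its optimal threshold; the helper names (apply_threshold, accuracy, optimal_threshold), the ">=" comparison, the candidate grid, and the sample data are all assumptions made for illustration.

```python
def apply_threshold(probas, threshold=0.5):
    """Predict 1 when the positive-class probability reaches the threshold.

    Assumption: ">=" is used here; the actual comparison in DSS may differ.
    """
    return [1 if p >= threshold else 0 for p in probas]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def optimal_threshold(y_true, probas, metric=accuracy, candidates=None):
    """Return the candidate threshold that maximizes the metric.

    Assumption: a simple grid search over 0.01..0.99; ties go to the
    lowest candidate because max() keeps the first maximum it sees.
    """
    if candidates is None:
        candidates = [i / 100 for i in range(1, 100)]
    return max(candidates, key=lambda t: metric(y_true, apply_threshold(probas, t)))


# Hypothetical labels and positive-class probabilities.
y_true = [0, 0, 1, 1, 1]
probas = [0.10, 0.25, 0.30, 0.60, 0.80]

# Fixed threshold, analogous to binary_classification_threshold=0.2.
fixed_accuracy = accuracy(y_true, apply_threshold(probas, 0.2))

# Threshold chosen to maximize accuracy, analogous to use_optimal_threshold=True.
best = optimal_threshold(y_true, probas)
best_accuracy = accuracy(y_true, apply_threshold(probas, best))

print(fixed_accuracy, best, best_accuracy)
```

With this sample data, the 0.2 threshold misclassifies the row at 0.25, while the accuracy-optimal threshold separates the two classes cleanly, which is the kind of gap use_optimal_threshold=True is meant to expose at evaluation time.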

shortcut-integration[bot] commented 2 years ago

This pull request has been linked to Shortcut Story #95506: Experiment Tracking: Allow to override the default threshold when importing a MLflow binary classification model.