elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[ML] Add new `top_class_threshold` parameter to classification analytics #48137

Open benwtrent opened 5 years ago

benwtrent commented 5 years ago

It is currently only possible to provide a static number for the number of top classes to return (num_top_classes).

It would be beneficial to have a top_class_threshold or something similar so that all the top classes that are above a given threshold are returned.

This feature would give some good indication (at a glance) around the distribution of probabilities for a given prediction. Additionally, if top_classes is empty on return, it could indicate a very low probability/confidence of the given predicted value.
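To make the difference concrete, here is a rough Python sketch of the two selection modes. It is not the Elasticsearch implementation: the function names, the sample probabilities, and the strict "greater than" comparison are all assumptions made for illustration, and `top_class_threshold` is only the parameter name proposed in this issue.

```python
from typing import Dict, List, Tuple

ClassProb = Tuple[str, float]


def rank_classes(probs: Dict[str, float]) -> List[ClassProb]:
    """Sort (class, probability) pairs from most to least probable."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)


def top_classes_by_count(probs: Dict[str, float], num_top_classes: int) -> List[ClassProb]:
    """Current behaviour as described above: always return a fixed number of classes."""
    return rank_classes(probs)[:num_top_classes]


def top_classes_by_threshold(probs: Dict[str, float], top_class_threshold: float) -> List[ClassProb]:
    """Proposed behaviour (hypothetical): return every class above the threshold.

    Whether the comparison should be strict or inclusive is an assumption here.
    """
    return [(c, p) for c, p in rank_classes(probs) if p > top_class_threshold]


# Made-up class probabilities for a single prediction.
probs = {"cat": 0.62, "dog": 0.33, "bird": 0.04, "fish": 0.01}

by_count = top_classes_by_count(probs, 3)
by_threshold = top_classes_by_threshold(probs, 0.1)
high_bar = top_classes_by_threshold(probs, 0.7)

print(by_count)      # includes "bird" even though its probability is very low
print(by_threshold)  # only the classes with probability above 0.1
print(high_bar)      # empty list: no class is confident enough
```

With a fixed count of 3, the low-probability `bird` class is still returned; with a threshold of 0.1 it is dropped, and a threshold of 0.7 yields an empty list, which is the low-confidence signal mentioned above.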

elasticmachine commented 5 years ago

Pinging @elastic/ml-core (:ml)

sophiec20 commented 5 years ago

A threshold would also give a user experience similar to outlier detection (i.e. feature_influence_threshold) and potentially more useful results, although without a hard limit on the number of classes returned.

num_top_classes allows a hard limit, but the classes it returns could have pointlessly low probabilities.

Also to note, having two ways to set this does feel like overkill.