elastic / kibana

Your window into the Elastic Stack
https://www.elastic.co/products/kibana

[ML] User-friendly experience for categorization of text fields #17997

Open elasticmachine opened 7 years ago

elasticmachine commented 7 years ago

Original comment by @droberts195:

This came out of a Slack chat with @peteharverson. It was also something that was brought up on the IRC channel during the recent ML webinar.

We would expect categorization to be applied to log messages, and we would expect people to be storing log messages in text fields, because that's what you have to do to make use of Elasticsearch's text search.

Additionally, the reverse search terms we generate as an output of categorization can only be used to efficiently search text fields.

However, at present we make it very hard for people to use a text field as their categorization_field_name when feeding a job with a datafeed. They have to set the obscure "_source": true setting in the JSON.

I propose the following:

  1. If a field of type text is selected as the categorization_field_name we automatically set "_source": true in the datafeed config
  2. If a field that is not of type text is selected as the categorization_field_name we warn people that it's unlikely to work well with categorization
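
A minimal sketch of the two rules above, in Python for illustration only. The function name `apply_categorization_field`, the plain-dict representation of the datafeed config, and the placement of `"_source"` as a top-level key are all assumptions made for this example; this is not the actual Kibana or ML implementation.

```python
from copy import deepcopy


def apply_categorization_field(datafeed_config, field_name, field_type):
    """Apply the two proposed rules to a datafeed config (hypothetical helper).

    datafeed_config: dict standing in for the datafeed JSON (assumed shape).
    field_name:      the field chosen as categorization_field_name.
    field_type:      the mapping type of that field, e.g. "text" or "keyword".

    Returns a tuple (updated_config, warnings); the input dict is not modified.
    """
    updated = deepcopy(datafeed_config)
    warnings = []

    if field_type == "text":
        # Rule 1: a text field was selected, so set "_source": true automatically.
        # Assumption: the setting is a top-level key of the datafeed config.
        updated["_source"] = True
    else:
        # Rule 2: anything other than text gets a warning rather than an error.
        warnings.append(
            f"Field '{field_name}' has type '{field_type}', not 'text'; "
            "it is unlikely to work well with categorization."
        )

    return updated, warnings
```

For example, calling `apply_categorization_field({}, "message", "text")` returns a config with `"_source"` set to `True` and no warnings, while passing `"keyword"` as the type leaves the config unchanged and returns one warning.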

elasticmachine commented 7 years ago

Original comment by @skearns64:

++, this will help the vast majority of users.

elasticmachine commented 7 years ago

Original comment by @Harvey-Maddocks:

I have created a snapshot dataset called it_ops_new_raw_snapshot of an index called it_ops_new_raw, which will help with testing this behaviour. It contains a type called logs (which is just the old it_ops_app_logs dataset), whose mapping for the message field has both a type text and a type keyword.