elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[ML] Support nested mapped fields in Data Frame Analytics jobs #69231

Open alvarezmelissa87 opened 3 years ago

alvarezmelissa87 commented 3 years ago

Currently, data frame analytics jobs do not support fields mapped with the nested type (https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html) in the analysis.

This restriction was in place because an index cannot both contain nested fields and be sorted. However, the destination index is no longer sorted, so this restriction can probably be removed.

cc @benwtrent

elasticmachine commented 3 years ago

Pinging @elastic/ml-core (Team:ML)

dimitris-athanasiou commented 3 years ago

I looked into this. It's non-trivial to support nested fields. However, it does not really make sense to map a field as nested unless it is used for arrays of objects. Those are not supported anyway, as we have no way of expanding the array into multiple rows and then merging the results back in a sensible way.
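To make the "expand into rows, then merge back" problem concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how data frame analytics actually processes documents: the functions `explode_nested` and `merge_back`, the example document, and the averaging merge rule are all hypothetical. A document with an array of nested objects becomes one row per object, so the analysis produces several results for what is a single source document, and some arbitrary rule is needed to collapse them again.

```python
from typing import Any, Dict, List


def explode_nested(doc: Dict[str, Any], nested_field: str) -> List[Dict[str, Any]]:
    """Hypothetical helper: produce one flat row per object in doc[nested_field].

    Top-level fields are copied into every row; the nested object's fields
    are prefixed with the nested field name (e.g. "items.price"). A document
    whose array is empty or missing yields a single row of top-level fields.
    """
    top_level = {k: v for k, v in doc.items() if k != nested_field}
    nested_objects = doc.get(nested_field) or []
    if not nested_objects:
        return [dict(top_level)]
    rows = []
    for obj in nested_objects:
        row = dict(top_level)
        for key, value in obj.items():
            row[f"{nested_field}.{key}"] = value
        rows.append(row)
    return rows


def merge_back(row_results: List[float]) -> float:
    """Hypothetical merge rule: average the per-row results.

    Nothing in the data says this is the right choice; max, min, or keeping
    the whole list are equally defensible, which is the core difficulty.
    """
    return sum(row_results) / len(row_results)


doc = {
    "customer": "a",
    "total": 30,
    "items": [{"sku": "x", "price": 10}, {"sku": "y", "price": 20}],
}
rows = explode_nested(doc, "items")
```

Here the single document yields two rows, and if the analysis scored them 0.25 and 0.75, `merge_back([0.25, 0.75])` would report 0.5 for the document. That is a number neither row actually produced.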

Nevertheless, looking into this revealed a bug with nested fields. The _explain API reports them as supported and includes them in the analysis (strictly speaking, not the nested field itself but its children), while the values for those fields are not actually included in the analysis. I'll work on a fix for this.
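One way such a fix could work is to find every path mapped as nested and drop any candidate field at or underneath one of those paths. The Python sketch below illustrates that idea only. It is not the code from the actual fix (which lives in the Elasticsearch Java codebase). The function names `nested_paths` and `exclude_nested` are hypothetical, and the mapping is assumed to be a plain dict shaped like the `properties` section of an index mapping.

```python
from typing import Any, Dict, List, Set


def nested_paths(properties: Dict[str, Any], prefix: str = "") -> Set[str]:
    """Return the full dotted paths of every field mapped with "type": "nested"."""
    paths: Set[str] = set()
    for name, spec in properties.items():
        path = f"{prefix}{name}"
        if spec.get("type") == "nested":
            paths.add(path)
        # Both object and nested fields keep their sub-fields under "properties",
        # so recurse to catch nested fields declared deeper in the tree.
        child_props = spec.get("properties")
        if child_props:
            paths |= nested_paths(child_props, prefix=path + ".")
    return paths


def exclude_nested(fields: List[str], nested: Set[str]) -> List[str]:
    """Drop nested fields themselves and any field whose path lies underneath one."""
    return [
        f for f in fields
        # Compare against "path." so a sibling like "items_total" is not excluded.
        if not any(f == p or f.startswith(p + ".") for p in nested)
    ]


mapping = {
    "properties": {
        "price": {"type": "double"},
        "items_total": {"type": "long"},
        "items": {
            "type": "nested",
            "properties": {"sku": {"type": "keyword"}, "qty": {"type": "long"}},
        },
    }
}
nested = nested_paths(mapping["properties"])
kept = exclude_nested(["price", "items_total", "items.sku", "items.qty"], nested)
```

With this mapping, `nested` is `{"items"}` and `kept` is `["price", "items_total"]`, so the children of the nested field are no longer reported as analyzable.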

dimitris-athanasiou commented 3 years ago

I have raised https://github.com/elastic/elasticsearch/pull/71400 to properly exclude nested fields from the analysis.