Closed (lkts closed this 1 month ago)
Pinging @elastic/es-storage-engine (Team:StorageEngine)
It looks like the flattened field type in general does not correctly support arrays in synthetic source. Synthetic source is generated from the doc_values of the keyed field (._keyed), which encode the path to a field inside an object and its value in one byte array. The problem is that during parsing all object fields are simply added to the keyed field, even if multiple objects are parsed in one document (e.g. with arrays, or with object arrays one level up in the document). During construction of synthetic source, doc_values are retrieved and written, resulting in one giant object containing the combined fields from all flattened values indexed.
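To make the merging concrete, here is a toy model of the behaviour described above. It is only a sketch, not the actual Elasticsearch code: the function names are hypothetical, the `\0` separator is an assumption, and a sorted Python set stands in for the keyed doc_values. It ignores edge cases such as keys that contain dots.

```python
SEPARATOR = "\0"  # assumed key/value separator for this toy model


def keyed_entries(value, path=""):
    """Yield 'path<SEPARATOR>value' strings for every leaf under value."""
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            yield from keyed_entries(child, child_path)
    elif isinstance(value, list):
        for item in value:
            # Arrays add nothing to the path, so element boundaries are lost here.
            yield from keyed_entries(item, path)
    else:
        yield f"{path}{SEPARATOR}{value}"


def index_document(field_value):
    """Model the keyed doc_values as a sorted, de-duplicated list of entries."""
    return sorted(set(keyed_entries(field_value)))


def rebuild(entries):
    """Rebuild a single object from the entries, as synthetic source would."""
    result = {}
    for entry in entries:
        path, value = entry.split(SEPARATOR, 1)
        *parents, leaf = path.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})
        if leaf in node:
            existing = node[leaf]
            node[leaf] = existing + [value] if isinstance(existing, list) else [existing, value]
        else:
            node[leaf] = value
    return result


doc = [{"a": "a"}, {"b": "b"}]
print(rebuild(index_document(doc)))  # {'a': 'a', 'b': 'b'}
```

Because nothing in an entry records which array element it came from, the two input objects cannot be told apart once they are indexed.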
Some options:
- [...] flattened values as we do for other field types. That seems quite strict, but it is not clear what the real usage of that is.
- Set synthetic_source_keep to arrays for flattened fields. That does not, however, solve the problem of arrays being present at a higher level in the document.
- [...] flattened. Not ideal for disk space given we have most of the data in doc_values already. If it's rare enough, it could be okay.

Example (with current state of code):
Expected:
{
  "field": [
    {
      "KOGtOnvgpw": "PHU",
      "gcvFjmPHFd": "KwbkSyLlC"
    },
    {
      "BYliNOBHKM": {
        "XVaROQmSKP": "dYfCP",
        "ZaApOr": [
          "1074156",
          "1129404",
          "1204799",
          "1348011",
          "1723590",
          "183559",
          "448895"
        ]
      },
      "CnxMeelQhJ": "P",
      "kPUVedTaPY": "CWOLm"
    }
  ]
}
but: was
{
  "field": {
    "BYliNOBHKM": {
      "XVaROQmSKP": "dYfCP",
      "ZaApOr": [
        "1074156",
        "1129404",
        "1204799",
        "1348011",
        "1723590",
        "183559",
        "448895"
      ]
    },
    "CnxMeelQhJ": "P",
    "KOGtOnvgpw": "PHU",
    "gcvFjmPHFd": "KwbkSyLlC",
    "kPUVedTaPY": "CWOLm"
  }
}
Note how KOGtOnvgpw is a field of a separate object but gets merged into a single object in synthetic source.
Or a repro:
PUT my-index
{
  "mappings": {
    "_source": { "mode": "synthetic" },
    "properties": {
      "f": {
        "type": "flattened"
      }
    }
  }
}
GET my-index
POST my-index/_bulk?refresh
{ "create": {} }
{ "f": [ { "a": "a" }, { "b": "b" } ] }
POST my-index/_search
--------------------------
"f": {
  "a": "a",
  "b": "b"
}
I think this is just the way synthetic source handles arrays of objects, which is "arrays are moved to leaves". See https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-source-field.html#synthetic-source-modifications-leaf-arrays
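For reference, the "arrays are moved to leaves" transformation can be sketched roughly as follows. This is an illustration of the documented behaviour under simplifying assumptions, not Elasticsearch code: the helper names are hypothetical, and the unwrapping of single-value arrays is a choice of this sketch, made to match the repro output.

```python
def merge_objects(objects):
    """Merge a list of objects into one, collecting values per key."""
    collected = {}
    for obj in objects:
        for key, val in obj.items():
            collected.setdefault(key, []).append(val)
    merged = {}
    for key, vals in collected.items():
        if all(isinstance(v, dict) for v in vals):
            # Nested objects are merged recursively.
            merged[key] = merge_objects(vals)
        else:
            flat = []
            for v in vals:
                flat.extend(v if isinstance(v, list) else [v])
            # Simplification in this sketch: a single value is not kept as an array.
            merged[key] = flat[0] if len(flat) == 1 else flat
    return merged


def move_arrays_to_leaves(value):
    """Turn arrays of objects into one object whose leaf fields hold arrays."""
    if isinstance(value, dict):
        return {k: move_arrays_to_leaves(v) for k, v in value.items()}
    if isinstance(value, list):
        items = [move_arrays_to_leaves(v) for v in value]
        if items and all(isinstance(i, dict) for i in items):
            return merge_objects(items)
        return items
    return value


print(move_arrays_to_leaves({"foo": [{"bar": 1}, {"bar": 2}]}))
# {'foo': {'bar': [1, 2]}}
print(move_arrays_to_leaves({"f": [{"a": "a"}, {"b": "b"}]}))
# {'f': {'a': 'a', 'b': 'b'}}
```

Under this model the repro output is what leaf-array handling would produce for any object array, not only for flattened fields.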
Elasticsearch Version
8.15
Installed Plugins
No response
Java Version
bundled
OS Version
x
Problem Description
Synthetic source is wrong for arrays of flattened fields containing fields longer than ignore_above.
Steps to Reproduce
Produces:
Logs (if relevant)
No response