Open dschneiter opened 2 years ago
I found a workaround that achieves what I want with only a limited amount of Painless, but it is still cumbersome; an optional configuration parameter would be a much nicer way to get the same result.
My workaround uses a foreach processor in the main pipeline:
{
  "foreach": {
    "field": "email",
    "processor": {
      "pipeline": {
        "name": "single_enrichment"
      }
    }
  }
}
The single_enrichment pipeline then does the following:
PUT _ingest/pipeline/single_enrichment
{
  "processors": [
    {
      "set": {
        "field": "tmp.email",
        "value": "{{{_ingest._value}}}"
      }
    },
    {
      "set": {
        "field": "tmp.name",
        "value": "Unknown Employee"
      }
    },
    {
      "enrich": {
        "field": "_ingest._value",
        "target_field": "tmp",
        "policy_name": "names_policy",
        "max_matches": 1,
        "ignore_missing": false,
        "override": true
      }
    },
    {
      "script": {
        "lang": "painless",
        "source": "if (ctx.enrichment == null) ctx.enrichment = []; ctx.enrichment.add(ctx.tmp)"
      }
    },
    {
      "remove": {
        "field": "tmp"
      }
    }
  ]
}
Comment on workaround approach
The tmp object is a temporary object that only "lives" during a single lookup; it is removed after every execution of this pipeline. It has the same shape as the object produced by a successful lookup, but its two fields are initialized with the default values I'd like to see when the lookup fails: the email address used for the lookup and a default value for "name". (Ideally, such a value or object could be specified as a no_match_value in the enrich processor.)
The workaround then performs the actual enrichment step, which, if successful, overwrites the tmp object with the values returned from the enrich index.
Finally, the whole tmp object is added to the target field enrichment. Painless was needed for this step because the append processor would have added a comma-separated string representation of the tmp object rather than the actual JSON object.
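One way to try the workaround without indexing anything is the ingest simulate API. The body below is a minimal sketch meant to be sent with POST _ingest/pipeline/_simulate. It assumes that the single_enrichment pipeline and the names_policy enrich policy from above already exist. The sample document, with an email array holding one address that is in the enrich index and one that is not, is an illustrative assumption.

```json
{
  "pipeline": {
    "processors": [
      {
        "foreach": {
          "field": "email",
          "processor": {
            "pipeline": {
              "name": "single_enrichment"
            }
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "email": [
          "knownemployee@elastic.co",
          "doesnotexist@elastic.co"
        ]
      }
    }
  ]
}
```

If the workaround behaves as described, the simulated document should end up with an enrichment array holding one object per email address, with the unmatched address carrying the "Unknown Employee" default.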
Pinging @elastic/es-data-management (Team:Data Management)
Description
When enriching documents based on a multi-value match field (with the max_matches parameter set to a value > 0), it is possible that some values have a matching entry in the lookup index while others do not. It would be convenient to be able to configure an optional no_match_value and have documents enriched with this value whenever no entry is found in the enrich index.
Use-case:
We have an entry for knownemployee@elastic.co in our enrich index, but not for doesnotexist@elastic.co
Current behavior:
Ideal behaviour:
It would be nice to be able to specify a default value to use for enrichment when the lookup in the enrich index is not successful. Without such an option, one needs to complement the enrich processor with a script processor that checks whether every single match value is present in the enrichment field and, where it is missing, adds it with a default value. That is quite a lot of effort, plus the complexity of dealing with Painless, for a scenario that is neither strange nor uncommon.
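To make the request concrete, here is a sketch of what such a processor configuration might look like. The no_match_value parameter is hypothetical: it is the option proposed in this issue and does not exist in the enrich processor today. The object given as its value and the max_matches setting are illustrative assumptions as well.

```json
{
  "enrich": {
    "field": "email",
    "target_field": "enrichment",
    "policy_name": "names_policy",
    "max_matches": 2,
    "no_match_value": {
      "name": "Unknown Employee"
    }
  }
}
```

With an option along these lines, the foreach processor, the temporary tmp object and the script processor from the workaround would no longer be needed.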