Closed Mekk closed 4 years ago
Pinging @elastic/es-core-infra
Hi! What version are you encountering this behavior on? Do you have an example of a scripted field definition that fails that you could share so we can reproduce this more easily?
Mostly 6.7 although some on 7.0 too.
My scripts aren't particularly smart (mostly attempts to concatenate a few simple fields with newlines to make more table-friendly item, or attempts to extract most important text part for the same purpose). Two most typical errors I faced
return doc['some.field'].value + doc['other.field'].value;
(crash if some.field or other.field is not present, this crash can be also reproduced using params._source in a similar way)
String x = doc['some.field'].value;
(crash on shards where some.field is text, not keyword, say because of mapping problem in the past)
I also saw some NullPointerExceptions, most easily just by trying to call some method (like .indexOf
) on field which turned missing.
And there were 6.7-7.0 changes (IIRC I had to replace some doc.containsKey(…) with doc[…].size()!-0). turned less
Example of full script of a kind (not very interesting, drop try/catch to get errors sometimes, all those if's were built by repeatable attempts until I got it mostly working, the commented out xml.basic_info is my last actual crash triggered when xml.basic_info turned out mapped as text, not keyword, on older shards)
try {
if(params._source.xml != null && params._source.xml.basic_info != null) {
return params._source.xml.basic_info;
}
/* if(doc['xml.basic_info'].size() != 0) {
return doc['xml.basic_info'].value;
} */
if(doc['xml.client'].size() != 0) {
String result = doc['xml.client'].value;
if(doc['xml.what'].size() != 0) {
result = result + '/' + doc['xml.what'].value;
}
if(doc['xml.msg_type'].size() != 0) {
result = result + '
<' + doc['xml.msg_type'].value + '>';
}
if(doc['xml.result'].size() != 0) {
result = result + '
' + doc['xml.result'].value;
}
if(doc['xml.err_code'].size() != 0) {
result = result + '
' + doc['xml.err_code'].value;
}
return result;
}
}
catch(Exception e) {
/* return "ERROR: " + e; */
}
Ah, btw, yet another way to crash is to
try {
/// trigger some problem
}
catch(Exception e) {
return e;
}
(this is my actual case, of course I suppose direct return of some non-string object would do too)
I don't think there is much that can be done here, but I will mark this for discussion by the rest of the team.
When a failure occurs in a script, it occurs in one to many shards, so there isn't necessarily one error that can be shown. Additionally, scripts can be used in many places within elasticsearch APIs, so there isn't a simple way to extract the fact a script failed, perhaps deep down within the request.
Finally, return the exception as you suggest isn't possible because each location scripts are used has a concrete return type. Exceptions are exceptional cases, and that is what throw
is for.
My suggestions are strictly about Kibana scripted fields (not about any painless usage cases). Those scripts are used to scriptlety-implement additional fields in query results.
If I understand correctly, they are actually implemented by appending some scripting clauses (filter/script or sth like that) to search queries. So I imagine solution like:
while defining scripted field in Kibana I can fill field like
In case of errors :
( ) fail shard processing
(*) mark record with tag [ _scripted_field_abcdef_failed ]
[x] and store error diagnostics in field [ _painless_error_message ]
depending on those settings kibana changes the query it executes (maybe it just wraps my script body with additional try/catch, or maybe it hints ElasticSearch to handle errors specifically…)
The latter idea (silently wrapping script text with additional try/catch) seems to be implementable in kibana solely, but I feel that some standarized ES backend would do better as it could be used also in other contexts.
The team discussed this and could not reason a convenient wrapping of such failures. It is common for some fields to be missing from documents, while present in others. It is best if there are NPE-guards on field accesses just in case.
It might help a ton to receive more information from Kibana about the shard failures: https://github.com/elastic/kibana/issues/5957.
instead of the simple message:
I have to agree.. I have a field that doesn't occur in ever document and where it doesn't exist for one day of logs, then I get a shard failure and the rest of the documents don't calculate the field.
So I have logs, and on Aug 31, logs come in that don't have one of the scripted field's fields. And so I get a 3 of 6 shards failed because of that missing data point. In a visualization, the data gets cut after Aug 31, and in the discover tab, I get on the top left the # of results returned, but I don't get any results actually appearing. Only when I look prior or after Aug 31 am I able to see the results.
My scripted field is:
if (doc['field1'].size() != 0 && doc['field2'].value > 0) {
return doc['field1'].value / doc['field2'].value
}else {
return null
}
I can remove this error with @Mekk suggestion:
try{
if (doc['meterEnergyConsumptionKwh'].size() != 0 && doc['meterEnergyConsumptionKwh'].value > 0) {
return doc['meterEnergyConsumptionCost'].value / doc['meterEnergyConsumptionKwh'].value
}else {
return null
}
}
catch(Exception e) {
/* return "ERROR: " + e; */
}
@Mekk
Kibana has enhanced it's error reporting for these kinds of errors. If I create an index pattern for several indicies and a scripted field uses fields missing from certain indicies in the index pattern, in discover I see the following on 7.5.2
:
(I am not 100% sure how much of this is related to ES and how much to kibana)
My main painful point of using kibana scripted fields is that of error handling. I forgot to check for field presence, I tried accessing text field via fields aray, I called non-existing, function,… → I get dreaded "N shards failed" message and have to hunt elasticsearch logs for java stacks. Brr.
It would be great if I could opt to just get the results instead, and have errors stored inside them (for example
as _painless_script_failed: "… error message …"
). Then the problem description would be easily available, and the data which triggers the error could be simply seen in kibanaIn fact I ended up doing that myself, by wrapping all my scripted fields with try/catch constructs doing exactly that – but such boilerplate code doesn't seem a good idea in light field possibly intended to be sometimes written or patched even by less technical users.