elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.19k stars 24.85k forks source link

Handle painless failures with data annotations, without shard failures #41393

Closed Mekk closed 4 years ago

Mekk commented 5 years ago

(I am not 100% sure how much of this is related to ES and how much to kibana)

My main painful point of using kibana scripted fields is that of error handling. I forgot to check for field presence, I tried accessing text field via fields aray, I called non-existing, function,… → I get dreaded "N shards failed" message and have to hunt elasticsearch logs for java stacks. Brr.

It would be great if I could opt to just get the results instead, and have errors stored inside them (for example as _painless_script_failed: "… error message …"). Then the problem description would be easily available, and the data which triggers the error could be simply seen in kibana

In fact I ended up doing that myself, by wrapping all my scripted fields with try/catch constructs doing exactly that – but such boilerplate code doesn't seem a good idea in light field possibly intended to be sometimes written or patched even by less technical users.

elasticmachine commented 5 years ago

Pinging @elastic/es-core-infra

gwbrown commented 5 years ago

Hi! What version are you encountering this behavior on? Do you have an example of a scripted field definition that fails that you could share so we can reproduce this more easily?

Mekk commented 5 years ago

Mostly 6.7 although some on 7.0 too.

My scripts aren't particularly smart (mostly attempts to concatenate a few simple fields with newlines to make more table-friendly item, or attempts to extract most important text part for the same purpose). Two most typical errors I faced

return doc['some.field'].value + doc['other.field'].value;     

(crash if some.field or other.field is not present, this crash can be also reproduced using params._source in a similar way)

String x = doc['some.field'].value;     

(crash on shards where some.field is text, not keyword, say because of mapping problem in the past)

I also saw some NullPointerExceptions, most easily just by trying to call some method (like .indexOf) on field which turned missing.

And there were 6.7-7.0 changes (IIRC I had to replace some doc.containsKey(…) with doc[…].size()!-0). turned less

Example of full script of a kind (not very interesting, drop try/catch to get errors sometimes, all those if's were built by repeatable attempts until I got it mostly working, the commented out xml.basic_info is my last actual crash triggered when xml.basic_info turned out mapped as text, not keyword, on older shards)

try {

    if(params._source.xml != null && params._source.xml.basic_info != null) {
        return params._source.xml.basic_info;
    }
    /* if(doc['xml.basic_info'].size() != 0) {
        return doc['xml.basic_info'].value;
    } */

    if(doc['xml.client'].size() != 0) {
        String result = doc['xml.client'].value;

        if(doc['xml.what'].size() != 0) {
            result = result + '/' + doc['xml.what'].value;
        }

        if(doc['xml.msg_type'].size() != 0) {
            result = result + '
<' + doc['xml.msg_type'].value + '>';
        }

        if(doc['xml.result'].size() != 0) {
            result = result + '
' + doc['xml.result'].value;
        }

        if(doc['xml.err_code'].size() != 0) {
                result = result + '
' + doc['xml.err_code'].value;
        }

        return result;
    }

}
catch(Exception e) {
    /* return "ERROR: " + e; */
}
Mekk commented 5 years ago

Ah, btw, yet another way to crash is to

try {
    /// trigger some problem
}
catch(Exception e) {
     return e;
}

(this is my actual case, of course I suppose direct return of some non-string object would do too)

rjernst commented 5 years ago

I don't think there is much that can be done here, but I will mark this for discussion by the rest of the team.

When a failure occurs in a script, it occurs in one to many shards, so there isn't necessarily one error that can be shown. Additionally, scripts can be used in many places within elasticsearch APIs, so there isn't a simple way to extract the fact a script failed, perhaps deep down within the request.

Finally, return the exception as you suggest isn't possible because each location scripts are used has a concrete return type. Exceptions are exceptional cases, and that is what throw is for.

Mekk commented 5 years ago

My suggestions are strictly about Kibana scripted fields (not about any painless usage cases). Those scripts are used to scriptlety-implement additional fields in query results.

If I understand correctly, they are actually implemented by appending some scripting clauses (filter/script or sth like that) to search queries. So I imagine solution like:

The latter idea (silently wrapping script text with additional try/catch) seems to be implementable in kibana solely, but I feel that some standarized ES backend would do better as it could be used also in other contexts.

talevy commented 5 years ago

The team discussed this and could not reason a convenient wrapping of such failures. It is common for some fields to be missing from documents, while present in others. It is best if there are NPE-guards on field accesses just in case.

It might help a ton to receive more information from Kibana about the shard failures: https://github.com/elastic/kibana/issues/5957.

instead of the simple message:

Screen Shot 2019-05-30 at 11 40 05 AM
thekofimensah commented 4 years ago

I have to agree.. I have a field that doesn't occur in ever document and where it doesn't exist for one day of logs, then I get a shard failure and the rest of the documents don't calculate the field.

So I have logs, and on Aug 31, logs come in that don't have one of the scripted field's fields. And so I get a 3 of 6 shards failed because of that missing data point. In a visualization, the data gets cut after Aug 31, and in the discover tab, I get on the top left the # of results returned, but I don't get any results actually appearing. Only when I look prior or after Aug 31 am I able to see the results.

My scripted field is:

if (doc['field1'].size() != 0 && doc['field2'].value > 0) {
    return doc['field1'].value / doc['field2'].value
}else {
    return null
}

I can remove this error with @Mekk suggestion:

try{
if (doc['meterEnergyConsumptionKwh'].size() != 0 && doc['meterEnergyConsumptionKwh'].value > 0) {
    return doc['meterEnergyConsumptionCost'].value / doc['meterEnergyConsumptionKwh'].value
}else {
    return null
}
}
catch(Exception e) {
    /* return "ERROR: " + e; */
}
stu-elastic commented 4 years ago

@Mekk Kibana has enhanced it's error reporting for these kinds of errors. If I create an index pattern for several indicies and a scripted field uses fields missing from certain indicies in the index pattern, in discover I see the following on 7.5.2: fail1 fail2 fail3