elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.07k stars 24.83k forks source link

ESQL: If query lists many many many many fields then the internal field caps action will fail #110845

Closed nik9000 closed 3 months ago

nik9000 commented 4 months ago

Description

This is what the error message looks like:

Caused by: org.elasticsearch.common.io.stream.NotSerializableExceptionWrapper: too_complex_to_determinize_exception: Determinizing automaton with 2254 states and 2347 transitions would require more than 10000 effort.
    at org.apache.lucene.util.automaton.Operations.determinize(Operations.java:735)
    at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:72)
    at org.apache.lucene.util.automaton.CharacterRunAutomaton.<init>(CharacterRunAutomaton.java:36)
    at org.apache.lucene.util.automaton.CharacterRunAutomaton.<init>(CharacterRunAutomaton.java:23)
    at org.elasticsearch.common.regex.Regex.simpleMatcher(Regex.java:128)
    at org.elasticsearch.action.fieldcaps.TransportFieldCapabilitiesAction$NodeTransportHandler.lambda$messageReceived$0(TransportFieldCapabilitiesAction.java:541)
    at org.elasticsearch.action.ActionListener.completeWith(ActionListener.java:348)
elasticsearchmachine commented 4 months ago

Pinging @elastic/es-analytical-engine (Team:Analytics)

nik9000 commented 4 months ago

I need to confirm this. I'm getting the error message from a log but it really look like someone did something like FROM * | KEEP a, b, c, d, ....., a1, b2,.....d1324.

nik9000 commented 4 months ago

This reproduces it:

curl -uelastic:password -XDELETE localhost:9200/test
curl -uelastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_doc -d'{"foo": "bar"}'

echo '{"query": "FROM test' > /tmp/query
for i in $(seq 1 300); do
    echo -n '| KEEP f0'$i >> /tmp/query
    for j in $(seq 1 300); do
        echo -n ', f'$j$i >> /tmp/query
    done
    echo >> /tmp/query 
done
echo '"}' >> /tmp/query
curl -uelastic:password -HContent-Type:application/json -XPOST localhost:9200/_query -d@/tmp/query
Determinizing automaton with 691104 states and 773832 transitions would require more than 10000 effort.
nik9000 commented 3 months ago

The numbers I used above are quite high to reproduce. This'll do it with much fewer fields:

curl -uelastic:password -XDELETE localhost:9200/test
curl -uelastic:password -HContent-Type:application/json -XPOST localhost:9200/test/_doc -d'{"foo": "bar"}'

echo '{"query": "FROM test' > /tmp/query
for i in $(seq 1 42); do
    echo -n '| KEEP f0'$i >> /tmp/query
    for j in $(seq 1 300); do
        echo -n ', f'$j$i >> /tmp/query
    done
    echo >> /tmp/query 
done
echo '"}' >> /tmp/query
curl -uelastic:password -HContent-Type:application/json -XPOST localhost:9200/_query -d@/tmp/query
nik9000 commented 3 months ago

We can make those examples work.

I think it's possible that there are other ways to cause this, so there's more to push here. I think.

nik9000 commented 3 months ago

I think we should do a few things: