elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.98k stars 24.75k forks source link

SQL: inconsistent handling of range queries with strings for datetime arguments #87268

Open bpintea opened 2 years ago

bpintea commented 2 years ago

The following query filtering with a conjunction of two comparisons works as expected: SELECT hire_date FROM employees WHERE 'now-24y' > hire_date and hire_date < '1999-03-01' ORDER BY hire_date DESC LIMIT 3 returns:

       hire_date        
------------------------
1997-05-19T00:00:00.000Z
1996-11-05T00:00:00.000Z
1995-12-15T00:00:00.000Z

Similarly (following #87151) this query having a field provided in a range query works too: SELECT hire_date FROM test_emp WHERE hire_date BETWEEN 'now-100y' AND '1999-03-01' ORDER BY hire_date DESC LIMIT 3 returns (same as above):

       hire_date        
------------------------
1997-05-19T00:00:00.000Z
1996-11-05T00:00:00.000Z
1995-12-15T00:00:00.000Z

However, having the field be part of the range fails: SELECT hire_date FROM test_emp WHERE 'now-24y' BETWEEN hire_date AND '1999-03-01' ORDER BY hire_date DESC LIMIT 3 returns nothing. Note: now-24y (currently) returns a timestamp rounded down to 1998-05-31.

Similarly, the following query using no date arithmetic will fail too: SELECT hire_date FROM test_emp WHERE '1998-05-31' BETWEEN hire_date AND '1999-03-01' ORDER BY hire_date DESC LIMIT 3 returns nothing.

But casting the strings will make the query work: SELECT hire_date FROM test_emp WHERE '1998-05-31'::TIMESTAMP BETWEEN hire_date AND '1999-03-01'::TIMESTAMP ORDER BY hire_date DESC LIMIT 3 returns (same as above):

       hire_date        
------------------------
1997-05-19T00:00:00.000Z
1996-11-05T00:00:00.000Z
1995-12-15T00:00:00.000Z
elasticmachine commented 2 years ago

Pinging @elastic/es-ql (Team:QL)

bpintea commented 2 years ago

Similar, but not identical to #77179, potentially linked to #77055.

luigidellaquila commented 2 years ago

This problem seems to be related to how the query is translated more than to #77179

The problematic query is handled as a Painless script

 "query": {
        "script": {
            "script": {
                "source": "InternalQlScriptUtils.nullSafeFilter(InternalQlScriptUtils.and(InternalQlScriptUtils.gte(params.v0, InternalQlScriptUtils.docValue(doc,params.v1)), InternalQlScriptUtils.lte(params.v2, params.v3)))",
                "lang": "painless",

while the valid one is translated as a normal range query

"query": {
        "range": {
            "hire_date": {
                "gte": "...",
                "lte": "...",
                "time_zone": "Z",
                "boost": 1.0
            }
        }

The range query handles correctly both automatic conversions and date maths; we should fix the painless logic to do the same.

In some cases, we could also rewrite the condition as two inequalities, having the field value as the first argument, eg.

SELECT hire_date FROM test_emp WHERE '1998-05-31' BETWEEN hire_date AND '1999-03-01' ORDER BY hire_date DESC LIMIT 3

would become

SELECT hire_date FROM test_emp WHERE  hire_date < '1998-05-31' AND '1998-05-31' < '1999-03-01' ORDER BY hire_date DESC LIMIT 3

In both cases, date maths logic has to be re-implemented in SQL.

bpintea commented 2 years ago

In the failing range SQL query (3rd above), implicit casting of the literals to field's type would already be an improvement. Rewriting the range to a conjunction of inequalities would allow the optimiser to then even fold the literals comparison indeed (if not containing date math) and execution would be that of a range query (which would also allow pushing down the date math).

elasticsearchmachine commented 9 months ago

Pinging @elastic/es-analytical-engine (Team:Analytics)