elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.49k stars 24.6k forks source link

[ES|QL] NOT IN may return wrong results if there is null in the inlist #112065

Closed fang-xing-esql closed 2 weeks ago

fang-xing-esql commented 3 weeks ago

When there is null in the inlist, NOT IN may return empty/wrong resultset, instead of qualified results.

+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "FROM books | sort author "
}
'
     author      | author.keyword  |       name       |   name.keyword   |  page_count   |      release_date      
-----------------+-----------------+------------------+------------------+---------------+------------------------
Alastair Reynolds|Alastair Reynolds|Revelation Space  |Revelation Space  |585            |2000-03-15T00:00:00.000Z
Aldous Huxley    |Aldous Huxley    |Brave New World   |Brave New World   |268            |1932-06-01T00:00:00.000Z
George Orwell    |George Orwell    |1984              |1984              |328            |1985-06-01T00:00:00.000Z
Margaret Atwood  |Margaret Atwood  |The Handmaids Tale|The Handmaids Tale|311            |1985-06-01T00:00:00.000Z
Neal Stephenson  |Neal Stephenson  |Snow Crash        |Snow Crash        |470            |1992-06-01T00:00:00.000Z
Ray Bradbury     |Ray Bradbury     |Fahrenheit 451    |Fahrenheit 451    |227            |1953-10-15T00:00:00.000Z

IN returns correct results without null.

+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "FROM books | where author in (\"Neal Stephenson\", \"Ray Bradbury\") | sort author "
}
'
    author     |author.keyword |     name      | name.keyword  |  page_count   |      release_date      
---------------+---------------+---------------+---------------+---------------+------------------------
Neal Stephenson|Neal Stephenson|Snow Crash     |Snow Crash     |470            |1992-06-01T00:00:00.000Z
Ray Bradbury   |Ray Bradbury   |Fahrenheit 451 |Fahrenheit 451 |227            |1953-10-15T00:00:00.000Z

IN returns correct results with null.

+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "FROM books | where author in (\"Neal Stephenson\", \"Ray Bradbury\", null) | sort author "
}
'

    author     |author.keyword |     name      | name.keyword  |  page_count   |      release_date      
---------------+---------------+---------------+---------------+---------------+------------------------
Neal Stephenson|Neal Stephenson|Snow Crash     |Snow Crash     |470            |1992-06-01T00:00:00.000Z
Ray Bradbury   |Ray Bradbury   |Fahrenheit 451 |Fahrenheit 451 |227            |1953-10-15T00:00:00.000Z

NOT IN returns correct results without null.

+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "FROM books | where author not in (\"Neal Stephenson\", \"Ray Bradbury\") | sort author "
}
'
     author      | author.keyword  |       name       |   name.keyword   |  page_count   |      release_date      
-----------------+-----------------+------------------+------------------+---------------+------------------------
Alastair Reynolds|Alastair Reynolds|Revelation Space  |Revelation Space  |585            |2000-03-15T00:00:00.000Z
Aldous Huxley    |Aldous Huxley    |Brave New World   |Brave New World   |268            |1932-06-01T00:00:00.000Z
George Orwell    |George Orwell    |1984              |1984              |328            |1985-06-01T00:00:00.000Z
Margaret Atwood  |Margaret Atwood  |The Handmaids Tale|The Handmaids Tale|311            |1985-06-01T00:00:00.000Z

NOT IN returns empty/wrong result with null.

+ curl -u elastic:password -v -X POST 'localhost:9200/_query?format=txt&pretty' -H 'Content-Type: application/json' '-d
{
  "query": "FROM books | where author not in (\"Neal Stephenson\", \"Ray Bradbury\", null) | sort author "
}
'
    author     |author.keyword |     name      | name.keyword  |  page_count   | release_date  
---------------+---------------+---------------+---------------+---------------+---------------
elasticsearchmachine commented 3 weeks ago

Pinging @elastic/es-analytical-engine (Team:Analytics)

alex-spies commented 3 weeks ago

Nice find!

I wonder if this is really a bug though.

where author not in (\"Neal Stephenson\", \"Ray Bradbury\", null)

should be equivalent to

where not (author == \"Neal Stephenson\") and not (author == \"Ray Bradbury\") and not (author == null)

The problem is: author == null is not the same as author is null. author == null is always null, because, I think, null is conceptually missing info; so we cannot determine if the left hand side is equal to the right hand side.

For reference, where author ==null and where author != null always yield empty results, even if for some documents the author is null (or not).

Maybe we should just add a note to the docs, as this is unexpected (but consistent)?

fang-xing-esql commented 3 weeks ago

Nice find!

I wonder if this is really a bug though.

You are right! It is not difficult to be tricked by nulls. Let me do some more experiments and most likely this is as expected.