elixir-ecto / ecto_sql

SQL-based adapters for Ecto and database migrations
https://hexdocs.pm/ecto_sql
Apache License 2.0
579 stars 312 forks source link

Optimize `IN` expression in PostgreSQL #582

Closed KiKoS0 closed 11 months ago

KiKoS0 commented 11 months ago

Elixir version

Elixir 1.15.4 (compiled with Erlang/OTP 24)

Database and Version

PostgreSQL 14.6

Ecto Versions

ecto_sql 3.10.2

Database Adapter and Versions (postgrex, myxql, etc)

postgrex 0.17.4

Current behavior

👋

It's not technically a bug but in PostgreSQL, pinning an empty array in a where in clause yields different SQL from directly passing an empty array.

    q1 = from(u in "users", where: u.id in [], select: [:id])
    q2 = from(u in "users", where: u.id in ^[], select: [:id])

    iex(13)> Ecto.Adapters.SQL.to_sql(:all, Repo, q1)
    {"SELECT u0.\"id\" FROM \"users\" AS u0 WHERE (false)", []}

    iex(14)> Ecto.Adapters.SQL.to_sql(:all, Repo, q2)
    {"SELECT u0.\"id\" FROM \"users\" AS u0 WHERE (u0.\"id\" = ANY($1))", [[]]}

https://github.com/elixir-ecto/ecto_sql/blob/da2b3b5f42186d70395464f19fabf50e7d9aa1b5/lib/ecto/adapters/myxql/connection.ex#L661-L663

It's handled correctly in MySQL though so can I add the clause in the Postgrex adapter as well? Since PostgreSQL query optimizer doesn't noop with ANY('{}'):

> explain analyze select id from users where id = any('{}')

QUERY PLAN                                                                                                            |
----------------------------------------------------------------------------------------------------------------------+
Index Only Scan using users_pkey on users  (cost=0.14..7.97 rows=1 width=16) (actual time=0.087..0.087 rows=0 loops=1)|
  Index Cond: (id = ANY ('{}'::uuid[]))                                                                               |
  Heap Fetches: 0                                                                                                     |
Planning Time: 0.121 ms                                                                                               |
Execution Time: 0.288 ms                                                                                              |

> explain analyze select id from users where false

QUERY PLAN                                                                        |
----------------------------------------------------------------------------------+
Result  (cost=0.00..0.00 rows=0 width=0) (actual time=0.004..0.005 rows=0 loops=1)|
  One-Time Filter: false                                                          |
Planning Time: 0.075 ms                                                           |
Execution Time: 0.026 ms                                                          |

Expected behavior

Generate the same SQL for the two expressions:

    q1 = from(u in "users", where: u.id in [], select: [:id])
    q2 = from(u in "users", where: u.id in ^[], select: [:id])

    iex(13)> Ecto.Adapters.SQL.to_sql(:all, Repo, q1)
    {"SELECT u0.\"id\" FROM \"users\" AS u0 WHERE (false)", []}

    iex(14)> Ecto.Adapters.SQL.to_sql(:all, Repo, q2)
    {"SELECT u0.\"id\" FROM \"users\" AS u0 WHERE (false)", []}
josevalim commented 11 months ago

In Postgres, thanks to array support, all x in ^param compiles to the same query, which allows us to reuse it as prepared statements. Such optimization is not possible in MySQL, so every time you call it with a different amount of parameter, it is a different query, which we cannot cache. That's the reason why they are different.

If you care about this, you could check param == [] yourself, but the PG behaviour is consistent within itself. :)