duffelhq / paginator

Cursor-based pagination for Elixir Ecto
MIT License
750 stars 90 forks source link

The query is slow if `cursor_fields` contains the primary key #198

Open cytim opened 6 months ago

cytim commented 6 months ago

Problem

Hi there :) I found a problem that the pagination query is slow if cursor_fields contains the primary key. Or more specifically, it is slow if cursor_fields hits an index which null values are excluded.

Investigation

Take the below query as an example.

from(
  u in User,
  order_by: [asc: :id]
)
|> Repo.paginate(
  cursor_fields: [id: :asc],
  after: "g3QAAAABdwJpZGIAAYag",
  limit: 1
)

It will generate the following SQL.

SELECT ...
  FROM "users" AS u0
  WHERE ((u0."id" > 100000) OR (u0."id" IS NULL))
  ORDER BY u0."id"
  LIMIT 10

This is the Query Plan for the SQL.

Limit  (cost=0.42..20.12 rows=10 width=2022) (actual time=1942.708..1942.741 rows=10 loops=1)
  ->  Index Scan using users_pkey on users u0  (cost=0.42..60184.00 rows=30541 width=2022) (actual time=1942.706..1942.738 rows=10 loops=1)
        Filter: ((id > 100000) OR (id IS NULL))
        Rows Removed by Filter: 89971
Planning Time: 0.599 ms
Execution Time: 1942.772 ms

From the Query Plan, the users_pkey index is used, but Filter is applied instead of Index Cond.

I customised the SQL by removing id IS NULL from it. Here's the new Query Plan.

Limit  (cost=0.42..12.14 rows=10 width=2022) (actual time=0.065..0.106 rows=10 loops=1)
  ->  Index Scan using users_pkey on users u0  (cost=0.42..35789.25 rows=30541 width=2022) (actual time=0.064..0.094 rows=10 loops=1)
        Index Cond: (id > 100000)
Planning Time: 0.588 ms
Execution Time: 0.129 ms

This time Index Cond is applied and the Execution Time improves by a lot.

I guess that the primary index does not index (and expect) any null values. Therefore, the null-checking will not hit the index condition and have to be done by filtering, which is slow.

Suggestion

One possible fix is to allow skipping the null-checking conditionally. For example, we can add a nullable? option to cursor_fields.

from(
  u in User,
  order_by: [asc: :id]
)
|> Repo.paginate(
  cursor_fields: [id: [order: :asc, nullable?: false]],
  after: "g3QAAAABdwJpZGIAAYag",
  limit: 1
)
bwan-nan commented 5 months ago

Hello!

I had the same issue and if like me you know you can't have null values in your cursor fields, try to use the option sort_direction: :asc_nulls_first to avoid making unnecessary queries. If this option is not passed, it defaults to :asc_nulls_last.

What you should try

from(
  u in User,
  order_by: [asc: :id]
)
|> Repo.paginate(
  cursor_fields: [:id],
  sort_direction: :asc_nulls_first,
  after: "g3QAAAABdwJpZGIAAYag",
  limit: 1
)

Difference on the same function between AscNullsLast and AscNullsFirst modules

defmodule Paginator.Ecto.Query.AscNullsLast do
  ...
  def build_dynamic_filter(args = %{direction: :after}) do
    dynamic(
      [{query, args.entity_position}],
      (^field_or_expr_equal(args) and ^args.next_filters) or
        ^field_or_expr_greater(args) or
        ^field_or_expr_is_nil(args)
    )
  end

  ...
end

defmodule Paginator.Ecto.Query.AscNullsFirst do
  ...
  def build_dynamic_filter(args = %{direction: :after}) do
    dynamic(
      [{query, args.entity_position}],
      (^field_or_expr_equal(args) and ^args.next_filters) or
        ^field_or_expr_greater(args)
    )
  end
  ...
end