ash-project / ash_json_api_wrapper

MIT License
15 stars 5 forks source link

Pagination using limit and offset query parameters #5

Open peterhartman-90poe opened 1 year ago

peterhartman-90poe commented 1 year ago

I have an API that achieves pagination using query parameters in the URL. The maximum items returned is 50 so an offset/skip is needed to retrieve the full dataset

Is this possible using the library as-is? I have hacked it for now by adding a "skip" parameter to the resource that gets passed as a filter

read(:list_companies) do
    argument :skip, :integer do
        allow_nil? true
        default 0
    end

    filter expr(skip == ^arg(:skip))
end

The limit is working nicely as-is:

limit_with {:param, "limit"}
zachdaniel commented 1 year ago

You should be able to do this with the "paginator" logic.

You would define an AshJsonApiWrapper.Paginator, and then use that in an endpoint, i.e paginator YourPaginator. The idea is that you'd specify offset and limit as normal and it would page through until it had as many as you've requested. Then, you could do something like this in the action:

read(:list_companies) do
  pagination do
    offset? true
    required? true
    default_limit 50
  end
end

And then you could use Ash's builtin pagination which would map back to the API's pagination.

peterhartman-90poe commented 1 year ago

Unfortunately in the process of trying to build my own paginator I keep getting

** (KeyError) key :runtime_sort? not found in: nil

Attached is a livebook that recreates the problem: paginate-20231024.livemd.zip

I have hit a wall trying to debug this myself unfortunately. Here is the output of dbg() around sort() in the datalayer

[(ash_json_api_wrapper 0.1.0) lib/data_layer/data_layer.ex:323: AshJsonApiWrapper.DataLayer.sort/3]
query #=> %AshJsonApiWrapper.DataLayer.Query{
  api: Api,
  context: %{
    private: %{authorize?: false, in_before_action?: true},
    action: %Ash.Resource.Actions.Read{
      arguments: [],
      description: nil,
      filter: nil,
      get_by: [],
      get?: false,
      manual: nil,
      metadata: [],
      modify_query: nil,
      name: :list_users,
      pagination: %Ash.Resource.Actions.Read.Pagination{
        default_limit: 50,
        max_page_size: 250,
        countable: false,
        required?: true,
        keyset?: false,
        offset?: true
      },
      preparations: [],
      primary?: true,
      touches_resources: [],
      transaction?: false,
      type: :read
    },
    authorize?: false,
    actor: nil,
    query_opts: [
      verbose?: false,
      actor: nil,
      authorize?: false,
      page: nil,
      return_query?: false
    ],
    page_opts: [limit: 50],
    initial_query: #Ash.Query<resource: Users, sort: [id: :asc], limit: 251>,
    filter_requests: [],
    initial_limit: nil,
    initial_offset: 0
  },
  headers: [],
  action: %Ash.Resource.Actions.Read{
    arguments: [],
    description: nil,
    filter: nil,
    get_by: [],
    get?: false,
    manual: nil,
    metadata: [],
    modify_query: nil,
    name: :list_users,
    pagination: %Ash.Resource.Actions.Read.Pagination{
      default_limit: 50,
      max_page_size: 250,
      countable: false,
      required?: true,
      keyset?: false,
      offset?: true
    },
    preparations: [],
    primary?: true,
    touches_resources: [],
    transaction?: false,
    type: :read
  },
  limit: nil,
  offset: 0,
  filter: nil,
  runtime_filter: nil,
  path: "https://65383945a543859d1bb1528e.mockapi.io/api/v1",
  query_params: %{},
  body: nil,
  sort: nil,
  endpoint: nil,
  templates: nil,
  override_results: nil
}
zachdaniel commented 1 year ago

I haven't looked into it too deeply, but I believe what we need to do is set an endpoint on the query in the set_context callback. Something like this:

endpoint: AshJsonApiWrapper.DataLayer.Info.endpoint(resource, action.name),
zachdaniel commented 1 year ago

The sort callback is expecting that to be there, but nothing is setting it into the query currently.

peterhartman commented 1 year ago

I think I have resolved :runtime_sort? not found in: nil issue with https://github.com/ash-project/ash_json_api_wrapper/pull/6

I am still forced to add :runtime_sort? true to the endpoint if I add pagination. Is that expected? Without it I get * Sorting is not supported

It will mean that I always make a request for all pages even if the limit is much lower which will be expensive.

zachdaniel commented 1 year ago

Hmm...It depends on if we've added support for mapping sort parameters or not. If we have, then we can do as much of the sorting on the endpoint as possible. If not we'll need to add that first.

peterhartman commented 1 year ago

To check my understanding, Ash itself is forcing sorting (of some kind) when you paginate to ensure the same record doesn't appear in 2 different pages? eg because data is added in between calls to the data source

zachdaniel commented 1 year ago

yep. I think we could potentially eliminate that requirement though. Sorting while paginating is only required in certain contexts, and technically limit/offset pagination does not require sorting. It just leads to potentially inconsistent results later based on arbitrary changes, instead of something reasonable.

zachdaniel commented 1 year ago

This logic:

  defp do_paginate(query, pagination, opts) do
    # We want to make 100% sure that there is a stable sort at the end
    # of the sort for pagination
    query =
      if Ash.Actions.Sort.sorting_on_identity?(query) do
        query
      else
        Ash.Query.sort(query, Ash.Resource.Info.primary_key(query.resource))
      end

    paginated =
      cond do
        opts[:page][:before] || opts[:page][:after] ->
          keyset_pagination(query, pagination, opts[:page])

        opts[:page][:offset] ->
          limit_offset_pagination(query, pagination, opts[:page])

        pagination.offset? && pagination.keyset? ->
          keyset_pagination(query, pagination, opts[:page])

        pagination.offset? ->
          limit_offset_pagination(query, pagination, opts[:page])

        true ->
          keyset_pagination(query, pagination, opts[:page])
      end

    case paginated do
      {:ok, initial_query, query} ->
        if opts[:page][:filter] do
          {:ok, Ash.Query.filter(initial_query, ^opts[:page][:filter]),
           Ash.Query.filter(query, ^opts[:page][:filter])}
        else
          {:ok, initial_query, query}
        end

      {:error, error} ->
        {:error, error}
    end
  end

in Ash.Actions.Read is where we are doing this. My first thought was to ask the data layer if it supports sorting, but in this case it does support sorting, it just has a significant cost 😆. We could ask the data layer if "sorting on primary key is cheap" like it is with postgres? Something like Ash.DataLayer.can?(resource, :cheap_primary_key_sort? ). We'd need to add that capability to each existing data layer though...

peterhartman commented 1 year ago

I think we also need a new callback for the paginator behaviour that allows the initial page to be requested, something like:

defmodule AshJsonApiWrapper.Paginator do
  @moduledoc """
  Behavior for scanning pages of a paginated endpoint.
  """

  @type ref :: {module, Keyword.t()}

  defmacro __using__(_) do
    quote do
      @behaviour AshJsonApiWrapper.Paginator
    end
  end

# vvvvvv
  @callback start(
              opts :: Keyword.t()
            ) :: {:ok, %{optional(:params) => map, optional(:headers) => map}}
# ^^^^^^

  @callback continue(
              response :: term,
              entities :: [Ash.Resource.record()],
              opts :: Keyword.t()
            ) :: {:ok, %{optional(:params) => map, optional(:headers) => map}} | :halt
end
zachdaniel commented 1 year ago

Yeah, that makes sense 👍

peterhartman commented 1 year ago

While testing I would expect the following to pass:

users =
      Users
      |> Ash.Query.for_read(:list_users)
      |> Ash.Query.limit(2)
      |> Api.read!()

users2 =
      Users
      |> Ash.Query.for_read(:list_users)
      |> Api.read!(page: [limit: 2, offset: 1])

users_count = users.results |> Enum.count()
users2_count = users2.results |> Enum.count()

assert(users_count == users2_count)

but it fails (possibly/probably be cause of this: https://github.com/ash-project/ash/blob/780eae8d69b750992515470d95003ad76c540956/lib/ash/actions/read.ex#L2778C1-L2779C1)

Assertion with == failed
     code:  assert users_count == users2_count
     left:  3
     right: 2

Why do we add one to the limit in the first case?

zachdaniel commented 1 year ago

🤔 something is interesting there. So we do add one to the limit when paginating so we can tell you if there are more results. But we should also remove that extra one.

zachdaniel commented 1 year ago

Okay, I believe I've fixed the issue you found in core where we are including the extra result in the set. We still need to do something to remove the requirement on sorting for pagination on a case by case basis I believe.