absinthe-graphql / dataloader

DataLoader for Elixir
MIT License

Multiple resolutions resulting in duplicate DB calls #160

Closed djeusette closed 1 year ago

djeusette commented 1 year ago

Hi there,

First of all, thank you for your amazing work. Dataloader is definitely a game changer!

For some GraphQL queries, Absinthe.Middleware.Dataloader calls Dataloader.run multiple times (via the before_resolution function, I guess), potentially triggering multiple identical Ecto queries.

Here is a GraphQL query example:

query Activity($id: UUID!, $after: String, $before: String, $first: Int, $last: Int) {
      activity(id: $id) {
        ...Activity
        organizerName
        organizer {
          id
          lastName
          firstName    
        }
        attendances(after: $after, before: $before, first: $first, last: $last) {
          pageInfo {        
            endCursor
            hasNextPage
            hasPreviousPage
            startCursor
          }
          count
          edges {
            node {
              id
              attendanceConfirmed
              attendee {
                avatar {
                  url
                  name
                }
                id
                firstName
              }
            }
          }
        }
      }
    }
    fragment Activity on Activity {
      hasPendingAvailableSeatAlert
      price {
        amount
        currency
      }
      picture {
        id
        url
      }
      categories {
        id
        name
      }
      type
      name
      maxAttendees
      location  
      id
      headline
      startTime
      duration
      description
      registered
      scene
      mobileExclusive
      dynamicLink
    }

In the above query, both organizer and attendee resolve to Colette.Accounts.User structs. In both cases, I rely on Dataloader to fetch the data.

I use the documented way to load batches, run the loader and get the results. Here is the piece of code:

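def load_and_get(
      %Dataloader{} = loader,
      source,
      batch_key,
      item_key,
      block,
      opts \\ []
    )
    when is_function(block, 3) do
  loader
  # Queue the item in its batch; nothing is fetched until the loader runs.
  |> Dataloader.load(source, batch_key, item_key)
  |> IO.inspect(label: "loader")
  # on_load/2 (presumably Absinthe.Resolution.Helpers.on_load/2) defers the
  # callback until the pending batches have been run.
  |> on_load(fn loader ->
    # Called with a single argument, so this prints the keyword list itself:
    # [label: "AFTER ON LOAD"]
    IO.inspect(label: "AFTER ON LOAD")

    records = Dataloader.get(loader, source, batch_key, item_key)

    block.(records, extract_args_from_batch_key(batch_key), opts)
  end)
end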

Here are the interesting logs when resolving the aforementioned GraphQL query:

1/ The loader with several batches before the first run. Please note the batch that fetches users: {:queryable, #PID<0.2009.0>, Colette.Accounts.User, :one, :id, %{}}

loader: %Dataloader{
  sources: %{
    ColetteWeb.Dataloaders.GenericEcto => %Dataloader.Ecto{
      repo: Colette.Repo,
      query: #Function<0.115027113/2 in ColetteWeb.Dataloaders.GenericEcto.query>,
      run_batch: #Function<1.115027113/5 in ColetteWeb.Dataloaders.GenericEcto.run_batch>,
      repo_opts: [],
      batches: %{
        {:assoc, Colette.Communities.Activity, #PID<0.2009.0>, :categories,
         Colette.Communities.Category,
         %{}} => MapSet.new([
          {["69da3613-ba71-449b-b432-45538de62909"],
           %Colette.Communities.Activity{
             ...
           }}
        ]),
        {:queryable, #PID<0.2009.0>, Colette.Accounts.User, :one, :id, %{}} => MapSet.new([
          {"03d7f3ce-f308-41c6-abe4-ffa13174ec77",
           "03d7f3ce-f308-41c6-abe4-ffa13174ec77"}
        ]),
        {:queryable, #PID<0.2009.0>, Colette.Communities.ActivityAttendance,
         :many, :activity_id,
         %{
           query_fun: {&Colette.Communities.ActivityAttendances.Query.apply/2,
            %{filters: %{limit: 51, offset: 0}, order: [desc: :inserted_at]}}
         }} => MapSet.new([
          {"69da3613-ba71-449b-b432-45538de62909",
           "69da3613-ba71-449b-b432-45538de62909"}
        ])
      },
      results: %{},
      default_params: %{},
      options: []
    },
    ColetteWeb.Dataloaders.GenericKv => %Dataloader.KV{
      load_function: #Function<0.39766397/2 in ColetteWeb.Dataloaders.GenericKv.load>,
      opts: [max_concurrency: 24, timeout: 30000],
      batches: %{},
      results: %{}
    }
  },
  options: [get_policy: :raise_on_error]
}
[label: "AFTER ON LOAD"]
[label: "AFTER ON LOAD"]
[label: "AFTER ON LOAD"]
[label: "AFTER ON LOAD"]

2/ The loader right before the next run, with new user batches:

loader: %Dataloader{
  sources: %{
    ColetteWeb.Dataloaders.GenericEcto => %Dataloader.Ecto{
      repo: Colette.Repo,
      query: #Function<0.115027113/2 in ColetteWeb.Dataloaders.GenericEcto.query>,
      run_batch: #Function<1.115027113/5 in ColetteWeb.Dataloaders.GenericEcto.run_batch>,
      repo_opts: [],
      batches: %{
        {:queryable, #PID<0.2009.0>, Colette.Accounts.User, :one, :id, %{}} => MapSet.new([
          {"0e81b4f6-d80f-48a4-947d-0c0dbdd6e4db",
           "0e81b4f6-d80f-48a4-947d-0c0dbdd6e4db"},
          {"a5cc29e2-8778-4306-951f-ea96461849a0",
           "a5cc29e2-8778-4306-951f-ea96461849a0"},
          {"d435e205-a058-4b6c-bea1-30188cf2e907",
           "d435e205-a058-4b6c-bea1-30188cf2e907"}
        ]),
        {:queryable, #PID<0.2009.0>, Colette.Communities.ActivityAttendance,
         :many, {:activity_id, :count},
         %{
           query_fun: {&Colette.Communities.ActivityAttendances.Query.filtered_by/2,
            %{}}
         }} => MapSet.new([
          {"69da3613-ba71-449b-b432-45538de62909",
           "69da3613-ba71-449b-b432-45538de62909"}
        ])
      },
      results: %{
        {:assoc, Colette.Communities.Activity, #PID<0.2009.0>, :categories,
         Colette.Communities.Category,
         %{}} => {:ok,
         %{
           ["69da3613-ba71-449b-b432-45538de62909"] => [
             %Colette.Communities.Category{
               ...
             },
             %Colette.Communities.Category{
               ...
             }
           ]
         }},
        {:queryable, #PID<0.2009.0>, Colette.Accounts.User, :one, :id, %{}} => {:ok,
         %{
           "03d7f3ce-f308-41c6-abe4-ffa13174ec77" => #Colette.Accounts.User<
             ...
           >
         }},
        {:queryable, #PID<0.2009.0>, Colette.Communities.ActivityAttendance,
         :many, :activity_id,
         %{
           query_fun: {&Colette.Communities.ActivityAttendances.Query.apply/2,
            %{filters: %{limit: 51, offset: 0}, order: [desc: :inserted_at]}}
         }} => {:ok,
         %{
           "69da3613-ba71-449b-b432-45538de62909" => [
             %Colette.Communities.ActivityAttendance{
               ...
             },
             %Colette.Communities.ActivityAttendance{
               ...
             },
             %Colette.Communities.ActivityAttendance{
               ...
             }
           ]
         }}
      },
      default_params: %{},
      options: []
    },
    ColetteWeb.Dataloaders.GenericKv => %Dataloader.KV{
      load_function: #Function<0.39766397/2 in ColetteWeb.Dataloaders.GenericKv.load>,
      opts: [max_concurrency: 24, timeout: 30000],
      batches: %{},
      results: %{}
    }
  },
  options: [get_policy: :raise_on_error]
}
[label: "AFTER ON LOAD"]
[label: "AFTER ON LOAD"]

Multiple other runs happen right after.

All in all, this results in the same Ecto query being executed twice, each time with different IDs:

SELECT u0."id", ... FROM "users" AS u0 WHERE (u0."id" = ANY($1))

Considering that the organizer is not needed to fetch the attendances and then the attendees, is there any way to prevent separate runs for some batches?

Thanks!

benwilson512 commented 1 year ago

Right, so this just means that the batches are accumulated in different passes over the document due to some prior async loading. In your specific case, simplifying your query, you have:

query Activity($id: UUID!, $after: String, $before: String, $first: Int, $last: Int) {
      activity(id: $id) {
        # This happens in pass 1
        organizer {
          id 
        }
        # This happens in pass 1
        attendances(after: $after, before: $before, first: $first, last: $last) {
          edges {
            node {
              # This happens in pass 2
              attendee {

Based on what I see in your logs, it looks like both the organizer field and the attendances field use Dataloader. Absinthe runs through the document in its first pass, accumulates batches for those two fields, and then executes them. It can't possibly add attendee to the first pass because attendances hasn't resolved yet.

After that first pass, Absinthe goes back through, fills in the loaded data, and can proceed to accumulate any new batches further down the tree.
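
To make the pass behaviour concrete, here is a toy model in plain Elixir. It is only a sketch of the idea described above, not Absinthe's or Dataloader's actual implementation: the PassSketch module, the in-memory db map, and the ids "u1".."u4" and "a1" are all hypothetical. Each pending field names a batch and a key, plus a function that returns the child fields that become resolvable once its value is loaded.

```elixir
defmodule PassSketch do
  @moduledoc "Toy model of pass-based batching. Not Absinthe or Dataloader code."

  # pending: list of {batch, key, continue} tuples, where continue.(value)
  # returns the child fields that become resolvable once value is loaded.
  # Returns the batches accumulated in each pass, in order.
  def run(pending, db, log \\ [])

  def run([], _db, log), do: Enum.reverse(log)

  def run(pending, db, log) do
    # 1. Accumulate: group the keys of every currently resolvable field by batch.
    batches =
      Enum.reduce(pending, %{}, fn {batch, key, _continue}, acc ->
        Map.update(acc, batch, MapSet.new([key]), &MapSet.put(&1, key))
      end)

    # 2. Execute: one lookup per batch (this stands in for one SQL query).
    results =
      Map.new(batches, fn {batch, keys} ->
        {batch, Map.take(Map.fetch!(db, batch), MapSet.to_list(keys))}
      end)

    # 3. Fill in the loaded values and discover fields further down the tree.
    next_pending =
      Enum.flat_map(pending, fn {batch, key, continue} ->
        continue.(results[batch][key])
      end)

    run(next_pending, db, [batches | log])
  end
end

# Hypothetical data standing in for the users and attendances tables.
db = %{
  users: Map.new(["u1", "u2", "u3", "u4"], fn id -> {id, %{id: id}} end),
  attendances_by_activity: %{
    "a1" => [%{attendee_id: "u2"}, %{attendee_id: "u3"}, %{attendee_id: "u4"}]
  }
}

# A field with no children that need loading.
leaf = fn _value -> [] end

pending = [
  # organizer: its key is known as soon as the activity is resolved.
  {:users, "u1", leaf},
  # attendances: the attendee ids are only known after this batch has run.
  {:attendances_by_activity, "a1",
   fn attendances ->
     Enum.map(attendances, fn a -> {:users, a.attendee_id, leaf} end)
   end}
]

passes = PassSketch.run(pending, db)
IO.inspect(passes, label: "batches per pass")
```

In this model the :users batch shows up in both passes, but with different keys: the organizer's id in the first pass and the attendee ids in the second. That mirrors the two user queries with different IDs reported above, so no user is fetched twice.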