absinthe-graphql / dataloader

DataLoader for Elixir
MIT License

Allow queries without batch keys #170

Open arnodirlam opened 5 months ago

arnodirlam commented 5 months ago

I have a use case where a query should run in dataloader without batching, together with all other (possibly batched) queries.

A few examples (see tests added in this PR):

```elixir
# all entries in the table
Dataloader.load(Test, {:many, Post}, [])

# first entry in the entire table
Dataloader.load(Test, {:one, Post, limit: 1, order_by: [asc: :id]}, [])
```

The changes needed are minimal.

I'm happy to make another PR for the v1 branch, if and when this gets merged.

Thanks a lot for your consideration! 🙏

benwilson512 commented 5 months ago

Hi @arnodirlam can you elaborate what this accomplishes beyond calling Repo.all in your resolver?

arnodirlam commented 5 months ago

> Hi @arnodirlam can you elaborate what this accomplishes beyond calling Repo.all in your resolver?

Hi @benwilson512 thanks for your quick reply!

For context, I'm using dataloader outside of GraphQL as a dependency for my library dx.

The idea here is a Repo.all-style query that keeps the advantages dataloader already brings to the table.

I'll try to make the argument in the context of running in a GraphQL resolver, though:

For example, say you have a resolver that needs to load a database table containing a global set of data. It could be some kind of app-wide config; in the app I'm working on, for example, we have global tables for user roles, and those could be exposed via GraphQL.

In that case, there is no batch key, because the data is global. But calling Repo.all in the resolver without dataloader is a sequential ("blocking") call, i.e. it won't run concurrently with the queries from other resolvers. Also, depending on the GraphQL schema, it might be executed multiple times, for example if the same data is also used to evaluate further authorization logic.
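To sketch the difference (module and source names here are hypothetical, and `on_load/2` is meant as Absinthe's `Absinthe.Resolution.Helpers.on_load/2`):

```elixir
# Without dataloader: a plain, sequential query on every resolver invocation.
def global_roles_plain(_parent, _args, _resolution) do
  {:ok, MyApp.Repo.all(MyApp.Role)}
end

# With the proposed empty batch key: the query is queued alongside all other
# pending batches, runs concurrently with them, and the result is cached for
# the rest of the document.
def global_roles_batched(_parent, _args, %{context: %{loader: loader}}) do
  loader
  |> Dataloader.load(MySource, {:many, MyApp.Role}, [])
  |> on_load(fn loader ->
    {:ok, Dataloader.get(loader, MySource, {:many, MyApp.Role}, [])}
  end)
end
```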

Does that make sense?

benwilson512 commented 5 months ago

It does. My only thought from an API design standpoint is that maybe we should require that users pass in a specific special value like :root or :global or :all instead of []. My concern is that [] might just be a mistake where someone forgot to put in a batch key.

arnodirlam commented 5 months ago

I see. I might be biased here, but to me [] makes the most sense.

To me, it's a list of filters, where [] means not filtered.

I see a point in helping prevent user mistakes by using :all or similar instead. However,

  1. I think [] is already quite "far away" from [color: "red"] etc.: the latter is either passed in as a trailing keyword list, such as `..., color: "red")`, in which case you'd have to replace all of that with [] to make a mistake; or it's generated programmatically, in which case [] makes more sense, so users don't have to make an extra pass to replace [] with :all or similar.
  2. I don't see a particular need to protect users from this kind of mistake, which they could make anywhere else, really. Think of crafting an Ecto query, or calling a function, and forgetting to pass in a filter. That would lead to errors in most other contexts as well, and this one is not fundamentally different, in my eyes.

On another note, I'm considering crafting a PR to enable multiple filters in that list, internally detecting the most efficient key to batch on and executing the queries accordingly. For example, given [color: "red", type: "special"], it'd determine which key has the highest cardinality, use that for batching, and generate queries to satisfy the other conditions. I've already implemented this in my library, and could extract it and move it upstream to dataloader, if you're interested. Then the "list of filters/conditions" paradigm would make even more sense.
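To illustrate the idea (this is a purely hypothetical API, not implemented in dataloader):

```elixir
# Hypothetical: multiple conditions in the batch key.
Dataloader.load(loader, Test, {:many, Post}, color: "red", type: "special")
Dataloader.load(loader, Test, {:many, Post}, color: "blue", type: "special")

# If :color turns out to have the higher cardinality across the pending
# loads, it would be chosen as the batching key, roughly yielding:
#
#   SELECT * FROM posts
#   WHERE color = ANY('{red,blue}') AND type = 'special'
```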

benwilson512 commented 5 months ago

I had not really conceptualized the batch key as a filter; rather, that's the job of the middle arg. The params in the middle argument are filters that apply to the entire batch to be loaded. The rightmost arg is what uniquely identifies this particular item within the batch. In general you want a single value there so that the SQL is ultimately some form of WHERE $col = ANY($ids). You can do that with multiple columns, but it gets a lot uglier.

EDIT: Put another way, the argument you're calling the "batch key" is the "key of this item in the batch" rather than the key of the batch as a whole. The key of the batch as a whole within dataloader is the middle arg.
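A sketch of that convention as I read it (source and schema names hypothetical):

```elixir
# Middle arg = the key of the batch as a whole, plus batch-wide filters.
# Rightmost arg = the key of this particular item within the batch.
loader
|> Dataloader.load(Source, {:many, Post, status: "published"}, author_id: 1)
|> Dataloader.load(Source, {:many, Post, status: "published"}, author_id: 2)

# Both loads share one batch, so they collapse into roughly:
#
#   SELECT * FROM posts
#   WHERE status = 'published' AND author_id = ANY('{1,2}')
```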

arnodirlam commented 1 month ago

Hi there, sorry for letting this linger. I'd still be interested in getting it merged. Just rebased my branch on main.

> My only thought from an API design standpoint is that maybe we should require that users pass in a specific special value like :root or :global or :all instead of []. My concern is that [] might just be a mistake where someone forgot to put in a batch key.

I'd prefer either [] or :all, but I'm happy with any solution and will modify my PR accordingly. Up to you!

Thanks! ❤️