Open DaddyWarbucks opened 3 years ago
This looks great! π π Some quick thoughts:
Any specifics on why there is only one load
method vs. mirroring the service get(id, params)
and find(params)
? This may also help with specifying which elements to include when serializing the key, with [ 'query', 'authentication' ]
being the default but allowing to include additional keys with something like loader('users').get(id, params, [ 'user' ])
.
I could update the transport adapters to allow instantiating a loader per request on params
. Then a global hook could be used to redirect get
and find
calls to the loader. That way - as long as you pass params
- you may even never have to load anything explicitly:
app.hooks([
async context => {
const {
id,
method,
path,
// Pull loader and remaining params
params: { loader, ...params },
} = context;
if (loader && method === 'get') {
context.result = await loader(path).get(id, params);
}
if (loader && method === 'find') {
context.result = await loader(path).find(params);
}
}
});
Thanks for the response!
There is quite a bit of context and history on why I chose the load()
method. Mirroring the get()
and find()
methods was definitely an initial goal. Mapping load()
to get()
is pretty intuitive, but find()
is more difficult because a dataloader requires a localKey/foreignKey relationship and find does not have one. Some other things to consider:
load(id)
and loadMany([...ids])
. The loadMany
method is a simple Promise.all(load) around the array of ids.getResultsByKey
function. This type argument is generally set to '!'
(a required single object) or [!]
(an array of required results). This is abstracted in the loaderFactory
to an argument multi: boolean
indicating if the loader should return an array or object.const singleLoader = DataLoader.loaderFactory(
app.service("service"),
"id",
false // false === '!' === required sing object
);
const multiLoader = DataLoader.loaderFactory(
app.service("service"),
"id",
true // true === [!] === array of required fields
);
// We have four methods to fetch data
// singleLoaders are generally used when this record holds the foreignKey and
// generally map to the `get()` method.
singleLoader.load(post.user_id); // returns 1 required object
singleLoader.loadMany(post.user_ids); // returns array of required objects, **one object per id
// multi loaders are mostly used when the other record holds the foreignKey and
// are analogous to the `find()` method...sorta, its a find with a keyed relationship
multiLoader.load(post.id); // returns an array of objects matching this id
multiLoader.loadMany(post.categories); // returns and array of arrays, **one array per id
So there was some confusion which methods to use and it was also tedious to setup loaders for each scenario. I too thought about some way to map these methods and make sense of get, find, load, loadMany, loadManyMulti...
// load takes an id or an array of ids. It replaces both methods
// from the singleLoader
loader.load(id) // return one record
loader.load([...ids]) // return an array, one record per id
// loadMany takes an id or an array of ids.
// It replaces both methods form the multiLoader
loader.loadMany(id); // returns an array of records
loader.loadMany([...ids]); // return an array of arrays, one array per id;
That means that load()
is still analogous to get()
in the sense that you get one record per id, but you can actually get an array of results, one per id. And loadMany
is now more analogous to find()
, where you get an array of results matching the key.
Heres a more realistic example
// Common, straightforward load a "parent". Probably most used
userLoader.load({ id: post.user_id });
// Less common, but definitely still used, particularly
// with document databases
categoryLoader.load({ id: post.category_ids });
// Common, load many "children"
starsLoader.loadMany({ post_id: post.id });
// This one is weird...can't really think of a use case. Its generally
// not an id relationship but some other single property
topicsLoader.loadMany({ topic_code: post.topics });
So with the id object things, get/load returning an array, loadMany/find needing a keyed relationship...I thought it would be more confusing to try to map them to get/find and let them be totally separate. Its also important to note that there is a 1-1 map for both get(id, params, extraParams)
and find(params, extraParams)
included. These methods are cached by their key/prams but just don't go through the dataloader wizardry.
I would like to also consider actual relationship names, but I had trouble reasoning about it really. It confused me. Maybe someone has some ideas
// userLoader.load({ id: post.user_id });
userLoader.belongsTo({ id: post.user_id });
// or is it the other way around?
userLoader.hasOne({ id: post.user_id });
// I think this has potential. But its not quite right...
// the relationships used do not truly map to these either
userLoader.oneToOne();
userLoader.oneToMany();
userLoader.manyToMany();
userLoader.manyToOne();
So that where we are with names...I landed on load
and loadMany
where load === ! === on record per id
and loadMany === [!] === array of records per id
. But I am open to suggestions and its not a huge code change to change method names.
I like the suggestion of passing a "whitelist" array to the methods as third argument. That can clean some things up. I will think on that for sure.
You can also now checkout the code on the v2 branch
I would love some feedback from @marshallswain , @DesignByOnyx, @mrfrase3, @KidkArolis, @fratzinger, @robbyphillips. And of course anyone else! Just pinging ya'll to bring some awareness to this post.
Unfortunately, I don't have much to add as I do population on the client-side. But keep up the good work!
@mrfrase3 This totally works client side too π. Checkout test-feathers-client-joins which is a little test bed and thought experiment on the matter. Its a bit dated, but shows that dataloaders can be used on the client too.
I have created an example app showing how all of this works.
After some further consideration and further use in my own production apps. I have really come around on @daffl suggestion of removing the the extraParams
argument and adding a paramWhitelist
argument.
I have found myself doing this a lot:
context.params.loader('comments').load(post.id, null, { loader: context.params.loader });
// and/or
context.params.loader('comments').load(post.id, null, { transaction: context.params.transaction });
Note that we have to pass null
as the second argument because the "params to stringify" cannot contain functions, classes, etc. So whenever we are dealing with loaders, transactions, or any other function/class things get ugly. Its not SO ugly when you are actually passing some other params like this
context.params.loader('comments').load(post.id, { query: {...} }, { loader: context.params.loader });
But I just hate having to pass null
to "skip" arguments. The Dataloader is most advantageous when we pass it along to other service calls, so this problem is in direct opposition to the whole point of using Dataloader.
Instead, I am going to update the code such that it takes some config that is a "whitelist" of params to stringify into the cache key. This would probably be set to ['query']
or maybe ['query', 'user']
by default. So the above code would just be
// the stringified cache key is just the post.id
context.params.loader('comments').load(post.id, { loader: context.params.loader });
// the stringified cache key is the post.id and query
context.params.loader('comments').load(post.id, {
query: { ...the query },
loader: context.params.loader
});
// can also pass in a whitelist as third param to override default
context.params.loader('comments').load(post.id, {
query: { ...the query },
myParam: { ...my custom param }
loader: context.params.loader
},
['query', 'myParam']
);
I will like update the ServiceLoader
and LazyLoader
classes to accept this config as well. Maybe something like this
const usersLoader = new ServiceLoader(app.service('users'), {
whitelist: ['query', 'myParam'],
...other dataloader config like maxBatchSize, etc
})
const lazyLoader = new LazyLoader({
users: {
whitelist: ['query', 'myParam'],
...other dataloader config like maxBatchSize, etc
},
comments: {
whitelist: ['query', 'myParam', 'anotherParam'],
...other dataloader config like maxBatchSize, etc
}
})
Another random thought, which I think is quite exciting. I have been thinking of this mainly as a loader is created at the beginning of a service call (probably via app:before:all hook) and then manually passed to all "nested" service calls. That loader lives and dies in the course of that single request and that is an important aspect because it makes a loader different than a cache (which has to be invalidated). That is the most natural way to think of this and what most of my examples and tests reflect. But, there is no reason that multiple service call's can't use the same loader. For example,
const lazyLoader = new LazyLoader();
const loader = lazyLoader.loader;
await Promise.all([
app.service('users').get(1, { loader }),
app.service('posts').find({ query: { published: false }, loader }),
app.service('posts').find({ query: { published: true }, loader }),
]);
There are times when a developer knows that multiple "requests" can share a loader and can manually create one and pass it to all of them. This is going to offer even more performance benefit.
This would also pair well with feathers-batch
. Generally, we wouldn't use feathers batch when using a socket client because the benefits just aren't as great as using rest. We also don't use it server side (I don't think it really supports server side use at the moment). But, if server side /batch
endpoint created a loader and passed it to all of its services that its running...it would be beneficial to use it via sockets or even server side.
This looks incredible! I use current batch loader all over the place in our production app (thank you very much for it, along with fastJoin
it decreased api response times in orders of magnitude). Can't wait to try v2 out when it gets released. However, I'm wary if something of the current functionality could be gone? Also, would it be a drop-in replacement for current version batch loader, so that I could refactor my code gradually? Or at least require some small-ish changes like just changing a require
? I can't really refactor everything at once...
@1valdis The underlying BatchLoader
class will not change, except possibly be renamed to DataLoader
. So the upgrade path should be very easy. All of this new functionality is added via two new classes called ServiceLoader
and LazyLoader
. Perhaps ContextLoader
is actually a better name for LazyLoader
.
Currently, the ServiceLoader
has a method called loadMany
but it is different than the BatchLoader
method of the same name. That method actually works like a BatchLoader
that is configured to use "multi" (aka [!]
config). Previously you could not have oneBatchLoader
that handled both "single" and "multi" loading. The ServiceLoader
actually has two BatchLoader
s under the hood where load(1)
and load([1])
both call the "single" loader and loadMany(1)
and loadMany([1])
both call the "multi" loader. But I am afraid that will be confusing...so I was considering removing loadMany
from the BatchLoader
totally because it is really just syntactic sugar over Promise.all(keys.map(key => this.load(key)))
. My other thought is to change the method names on the ServiceLoader
to load
and loadMulti
. I have also considered changing them to something like hasOne
and hasMany
to better align with names we use when talking about DB relationships.
Considering your comment about the upgrade path I will not remove loadMany
from BatchLoader
and will rename loadMany
on the ServiceLoader
to something else. As someone who has lots of experience with this tool, I would really appreciate any feedback you have on it! I am using it in a production app now and it cleaned up lots of config and works great. I am confident everything works well...but am hung up on naming things. So any input is greatly appreciated!
@DaddyWarbucks as for our usage, we have abstracted batch loader creation into a factory, because, as you've mentioned, it requires quite a few settings to create (I don't say it's a bad thing! I'd rather have something I can easily customize instead of something so dumb it can do only a simple job).
Then, we currently have 58 batch loader factories on top of that one, most of which are just one factory per service, some however doing things like count+groupBy. Each request is wrapped in a transaction so all calls pass it around as well. In total I found 267 calls to .load()
method in our codebase, all in fastJoin
hooks. The project is over 3 years old but the amount of calls counts up as more functionality is added even now.
We use loadMany()
just two times, and probably won't use in any other place in future.
We also use additional query
like everywhere, and in the very basic factory we spread it like this:
{
[key]: { $in: getUniqueKeys(keys) },
...query,
}
context.params.sequelize
and paginate: false
are always in there too.
As for the questions asked in the starting post.
id
object. I think most of our usage would be through the object form anyway. It would also simplify TypeScript typing I guess.I wonder if instead of having the array of "whitelisted" params, you could have users to pass a function to generate the cache key? Remember those getResultsByKey
and getUniqueKeys
functions you have, they're as convenient as it gets. So you might take inspiration from this and do something like... (excuse me if there's something dumb)
// by default
context
.loader('users', { getCacheKey: stringifyIdAndParams(['query']) })
.load({ id: 1 })
// which is equal to
context
.loader('users')
.load({ id: 1 })
// but if we want maximum customization
context
.loader('users', { getCacheKey: (context) => { /* stringify however you want, called on load call (I suppose?) */ } })
.load({ id: 1 })
Now hypothetically you could pass an id, and a params object containing both query and everything else needed for a service call. With this sorted out to 2 arguments (id and params), you might also have a third optional argument containing batch loader specific options - including overriding getCacheKey
. Example:
const loader = await context.loader('users', {
maxBatchSize: 20,
getCacheKey: stringifyIdAndParams(['query', 'myWhitelistedParam']),
});
// cached key takes id, query, and myWhitelistedParam
const user = await loader.load(
{ id: 1 },
{
myWhitelistedParam: 'hello world',
query: { deletedAt: null },
transaction: context.params.transaction,
},
);
// key caching function is overridden so it only takes id and query
const anotherUser = await loader.load(
{ id: 2 },
{
myWhitelistedParam: 'hello world',
query: { deletedAt: null },
transaction: context.params.transaction,
},
{
getCacheKey: stringifyIdAndParams(['query']),
}
);
To me it just feels weird I can't pass params like I usually would and would have to split them. The example above doesn't require splitting things out of params, as well as passing an array each time you call load
. You'd only have to do it when you need to specifically override the cache key generator (besides, you could override something else too?).
Hopefully that helps :)
I have updated this as per some of our discussions in this thread.
extraParams
argument, the third argument is now a filterParams
argument. This can be an array of string properties or a function, with it defaulting to ['query', 'user', 'authentication']
. This is similar to how both @daffl and @1valdis recommended. For example,// The following would only keep `query`
loader('posts').load(1, { query: { published: true }, myProp: true } });
// The following would keep `query` and `myProp`
loader('posts').load(1, { query: { published: true }, myProp: true } }, ['query', 'myProp']);
// The following would aslo keep `query` and `myProp`
loader('posts').load(1, { query: { published: true }, myProp: true } }, params => {
return { query: params.query, myProp: query.myProp }
});
LazyLoader.loader()
method has been renamed to LazyLoader.service()
to better align with naming conventions. This means most of the examples above would change. The LazyLoader
is actually called ContextLoader
too. It has always been that way in code, but I called it LazyLoader
for some reason in this thread.const contextLoader = new ContextLoader(context);
contextLoader.service('users').load(1)
The ServiceLoader.loadMany()
method has been changed to ServiceLoader.loadMulti()
. I have gone back and forth on what to name this method...and there is already a BatchLoader.loadMany()
method that does a different thing. The loadMulti()
method loads multiple records per id (as per the [!]
result type)
Cache clearing has been streamlined. I am still open to some feedback on this
const contextLoader = new ContextLoader(context);
contextLoader.service('users').load(1);
contextLoader.service('users').loadMulti(1);
contextLoader.service('users').get(1);
contextLoader.service('users').find();
contextLoader.service('users').clear(); // clears all loader types
contextLoader.service('users').clear(1); // clears all loader types with id = 1
contextLoader.service('users').clear(1, { ...params }); // clears all loader types with id = 1 AND matching params
There were a couple of other little things that changed under the hood. But the most important piece of this new rewrite is the ServiceLoader
which will most often be used via a ContextLoader
. That basic API of a ServiceLoader
looks like this
class ServiceLoader {
load(idObj, params, filterParams) {
// Load 1 item per id in idObj
// Basically a hasOne relationship
}
loadMulti(idObj, params, filterParams) {
// Load N items per id in idObj
// Basically a hasMany relationship
}
get(id, params, filterParams) {
// Get cached item at id/params
}
find(params, filterParams) {
// Find cached items at params
}
clear(idObj, params, filterParams) {
// If no idObj or params provided, clear all caches
// If no idObj but params provided, clear all caches matching params
// Else if idObj and no params, clear all ids and all finds
// If both passed, clear all matching id/params and all finds
}
}
I need to add a bunch of tests and types. But I think this is generally complete. I am using a forked version of this in a production application that is also using AsyncLocalStorage
to seamlessly pass loaders to services and it is π― .
I am using a forked version of this in a production application that is also using AsyncLocalStorage to seamlessly pass loaders to services and it is π― .
Sounds neat, I'd greatly appreciate the code example π
@1valdis See: https://github.com/feathersjs/feathers/issues/2074#issuecomment-897025361 for a quick example of this paired with AsyncLocalStorage
Hey all!
This library has moved to feathers-ecosystem. Along with that move, I would like to make some updates and improvements. The library is amazing, but hasn't received the adoption I believe it deserves. I have used it in many production apps with a few different patterns and have landed on what I believe to be some nice additions to the library.
Before reading further, you should be familiar with DataLoader (also see this great video about the source code). You can also checkout the docs directory for some detailed examples of how the current implementation works and its benefits.
First things first, this package should be renamed to
feathers-dataloader
orfeathers-loader
. There is already a complimentary (yes, they are complimentary and not competitors) package feathers-batch. So to avoid confusion, this package should be renamed. For the remainder of this blurb, I will refer to the "BatchLoader" as "DataLoader" to reflect this change. Long story shortBatchLoader === DataLoader
.The Problem
Using dataloaders requires the developer to instantiate a new class every time they want to use a loader, and these classes require lots of config. This causes the developer to duplicate configuration (mainly local/foreign key relationships), It also causes the developer to know ahead of time which loaders are going to be used in any given request. For example,
This may not seem painful at first. Its explicit and can offer some great performance benefits. But, as you use more loaders and your loaders get more complicated the problem starts to get unwieldy. Because calls to
load()
are batched, params must be static. You cannot pass different params to each load. For example,Bummer, we had to create two loaders because params must be static within the class instance. This problems grows even further when you want to have variable params per
load()
call. This is not currently possible.The Solution
Create a syntax and pattern that makes using dataloaders easier and more approachable. Specifically, dataloaders should be lazily created as needed. This solution introduces two new classes
ServiceLoader
andLazyLoader
.The
ServiceLoader
lazily createsDataLoaders
for a particular service. It creates a newDataLoader
for any given set of id and params. This allows the developer to create one loader for a service, and then use any combination of local/foreign key relationships and params.The
LazyLoader
is a super simple class that lazily constructsServiceLoaders
.This accomplishes a very convenient, natural, service-like interface. The developer really doesn't need to know anything about ServiceLoaders, DataLoaders, etc, etc. If the developer knows how to use a service, they basically know how to use a loader.
Thats the general gist! I wanted to make dataloaders as convenient and approachable as possible and I think this mostly accomplishes that. I recently replaced my older implementations of loaders in a production app with this new version and it was a drop in replacement that went really well!
So some things I need feedback on
Should I remove the default key and only allow passing an "id object"? In an attempt to make this as "service like" as possible, I allowed the
load()
method to take two different shapes. You can pass just an id, or an "id object".ServiceLoader
uses a deterministic stringify function to create keys for the underlying cache map. This also means that the developer cannot put functions into the params. I want to keep the experience as service-like as possible, but it is certainly different to split your params into two arguments. For example,// This will throw an error if you pass functions, models, transactions, etc loader.load({ id: 1}, { query: { $select: ['name', 'email'] }, user, transaction, loaders, etc });
// So instead, the
load
method takes a third argumentextraParams
. These params are not // strinigified into the key but are passed to the service. This could be error prone...it could // return different results on each call even though the key is the same. The developer has // to be aware of this. In 99% of my use cases I haven't had to use it and when I have it works // as expected, but there is margin for error there. loader.load({ id: 1}, { query: { $select: ['name', 'email'] } }, { user, transaction, loaders, etc });I have found use cases for these. The
get()
method is almost always easily replaced withload()
, but there are some cases where you want to throw an error if id not found, or if the request must be aget
and not afind
(which load does under the hood) for some hooks/auth reason. The cachedfind()
definitely has value because there are cases where you simply don't have a local/foreign key relationship, but can still benefit from cachingBonus thoughts
// Full config const user = await context .loader('users', { matchBaxSize: 20, cache: false }) // you can actually pass low level DataLoader config here .load({ id: 1}, { query: { $select: ['name', 'email'] } }, { user, transaction, loaders, etc })
const user = context.params.loader('users').load(post.user_id) const comments = context.params.loader('comments').load(post.id, null, { loader: context.params.loader }); // We have already loaded some users, so when passing the loader to the comments service // which will load on their own user_id, the loader is already primed and gets some for free