cksac / dataloader-rs

Rust implementation of Facebook's DataLoader using async-await.
Apache License 2.0
261 stars 23 forks source link

Specify fields in key, or allow separate load/cache keys #31

Closed oeed closed 2 years ago

oeed commented 3 years ago

I am wanting to only load requested fields from the database. For example, take this GraphQL type:

type Person {
  id
  name
  birthday
}

With the following request:

person {
  id
  name
}

Rather than doing SELECT * FROM people, I want to only get the id and name fields; not birthday.

It seems that in the official Node based module they recommend using separate cache and load keys. This means BatchFn might use (PersonID, Vec<String>) as its ID, but Loader would use PersonID. Multiple load keys map to one cache key; so it'd be up to the BatchFn implementation to dedup the person/fields combinations.

cksac commented 3 years ago

I have few questions unresolved,

  1. In case the graphql query contains multiple segments each require different set of fields, e.g. (PersonId(1), vec!["id"]), (PersonId(1), vec!["id", "name"]) both will return rust struct Person which container all the fields, how batchFn to be implemented?
    
    pub struct Person {
    id: String,
    name: String,
    birthday: String
    }

impl BatchFn<(PersonID, Vec), Person> for PersonBatcher { ... }


2. If (PersonId(1), vec!["id"]), (PersonId(1), vec!["id", "name"]) will resolved to same cache, then it will fire two queries to DB?
e.g. select id from person,  select id, name from person
there has multiple queries for same PersonId, and merged result into single cache, in case there have update in between the queries, the cache may have inconsistency 
oeed commented 3 years ago

I think you'd somehow want to merge the requested fields, so if you have a request for PersonId(1) with id and birthday, and another PersonId(1) with id and name you'd merge those in to one request PersonId(1) with id, name and birthday. You'd definitely want it to only query once for the merged request.

So, this requested PersonId(1) with id, name and birthday would fill the cache for PersonId(1) with this data - somewhat like Apollo client does. Then both requests would get the same data back with all 3 fields (i.e. you'd be guaranteed that your requested fields would be populated, but fields you didn't request might also be populated).

Because you might have partial data you'd probably need to implement Default on the struct to fill in missing fields (such as making non-requested fields None).


As I've been think about this the past few days I've come to realise this adds a large degree of complexity for (in most cases) fairly minimal performance gains. Therefore I think the feature is probably unnecessary (at least for my use cases) in hindsight, thanks for your thoughts either way though.

Alternatively, it might be a feature that'd be better suited to a different crate. Although for such a crate to be possible I believe changing the constraint on loader C: Cache<Key = K, Val = V> to use a different key (which by default would = K) would be necessary.