Fix limit of records allowed to be fetched in a single query / request

asishallab commented 4 years ago

The problem

Imagine our classical Person has Dogs has Fleas models, a global maximum limit of 10,000 (10 k) records, and a query that fetches 10 k persons, each of which has 10 k dogs, each of which in turn has 10 k fleas. Assuming that a record without associations is roughly 20 bytes in size, the 10 k × 10 k × 10 k = 10^12 fleas alone would require approximately 20 terabytes of memory:

(1e4^3 * 20) / 1024^3 = 18626.45 GiB (about 18.2 TiB)
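The estimate can be reproduced with a few lines of JavaScript. The constants are the assumptions stated above; the persons and dogs themselves (10^4 + 10^8 records) are ignored because the 10^12 fleas dominate.

```javascript
// Worst-case memory estimate for the Person -> Dog -> Flea example above.
// All numbers are the ones assumed in the text, not measurements.
const RECORD_LIMIT = 10000;   // global maximum number of records per fetch
const BYTES_PER_RECORD = 20;  // rough size of one record without associations
const NESTING_LEVELS = 3;     // persons, dogs per person, fleas per dog

// Number of records at the deepest level (fleas): 10k * 10k * 10k = 1e12.
const deepestRecords = RECORD_LIMIT ** NESTING_LEVELS;
const bytes = deepestRecords * BYTES_PER_RECORD; // 2e13 bytes = 20 TB (decimal)
const gib = bytes / 1024 ** 3;                   // binary gibibytes

console.log(`${gib.toFixed(2)} GiB`); // 18626.45 GiB
```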

This is an open door for malicious clients, and we need to close it.

First, extend our GraphQL `context` as follows:

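app.use('/graphql', cors(), graphqlHTTP((req) => ({
  schema: Schema,
  rootValue: resolvers,
  context: {
    // graphqlHTTP calls this options function for every request, so each
    // request starts with its own fresh budget of records it may fetch.
    recordsLimit: globals.RECORD_LIMIT,
    // ...
  },
})));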

Then, in each resolver, implement the following behavior:

1. Count the records matching the current search arguments.
   a. If the count exceeds context.recordsLimit, throw an error.
   b. Otherwise, fetch the matching records into, say, resultRecords.
2. To account for the parallel resolver problem (see below), compare the number of fetched records (nr) with the current context.recordsLimit once more.
   a. If nr exceeds context.recordsLimit, throw an error.
   b. Otherwise, subtract nr from context.recordsLimit and return resultRecords.
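A minimal sketch of the per-resolver behavior described above (count, fetch, re-check, subtract). The in-memory `dogRows` table, the `dogModel.countRecords` / `dogModel.readAll` methods and the `resolveDogs(person, context)` signature are hypothetical stand-ins for illustration, not an existing API.

```javascript
// Hypothetical in-memory table standing in for the database.
const dogRows = [
  { name: 'Rex', age: 3, ownerId: 1 },
  { name: 'Bello', age: 5, ownerId: 1 },
  { name: 'Fido', age: 2, ownerId: 2 },
];

// Hypothetical model layer; a real one would run COUNT / SELECT queries.
const dogModel = {
  async countRecords(ownerId) {
    return dogRows.filter((d) => d.ownerId === ownerId).length;
  },
  async readAll(ownerId) {
    return dogRows.filter((d) => d.ownerId === ownerId);
  },
};

// Hypothetical resolver for the `dogs` field of a person.
async function resolveDogs(person, context) {
  // 1. Count the matching records before fetching any of them.
  const count = await dogModel.countRecords(person.id);
  if (count > context.recordsLimit) {
    throw new Error(`Record limit exceeded: ${count} > ${context.recordsLimit}`);
  }
  const resultRecords = await dogModel.readAll(person.id);

  // 2. Re-check against the *current* limit, which parallel resolvers may have
  //    lowered while this one was awaiting. No await separates this check from
  //    the subtraction, so no other resolver can run in between.
  const nr = resultRecords.length;
  if (nr > context.recordsLimit) {
    throw new Error(`Record limit exceeded: ${nr} > ${context.recordsLimit}`);
  }
  context.recordsLimit -= nr;
  return resultRecords;
}

// Usage: a request with a budget of 3 records.
(async () => {
  const context = { recordsLimit: 3 };
  const dogs = await resolveDogs({ id: 1 }, context); // 2 dogs, budget now 1
  console.log(dogs.map((d) => d.name), context.recordsLimit);
  try {
    await resolveDogs({ id: 1 }, context); // needs 2, only 1 left
  } catch (err) {
    console.log(err.message); // Record limit exceeded: 2 > 1
  }
})();
```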

Verify whether the above behavior can be implemented in the resolver rather than in the model layer. If not, pass context.recordsLimit as an argument to the model layer and implement the above behavior there.
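If the behavior has to move into the model layer, one detail matters: JavaScript passes numbers by value, so a model that only receives the number context.recordsLimit can check it but cannot decrease the budget shared by the other resolvers of the same request. The sketch below therefore assumes the model receives the whole `context` object. The `Parrot` class, its `readAllWithinLimit` method and `resolveParrots` are hypothetical names for illustration.

```javascript
// Hypothetical in-memory table standing in for the database.
const parrotRows = [
  { name: 'Polly', species: 'macaw', ownerId: 1 },
  { name: 'Kiwi', species: 'budgerigar', ownerId: 2 },
];

class Parrot {
  // Hypothetical model method performing the count / fetch / re-check /
  // subtract steps. It receives the context object (a reference), so the
  // subtraction is visible to every resolver sharing this request.
  static async readAllWithinLimit(ownerId, context) {
    const matches = (p) => p.ownerId === ownerId;

    // Count first (a real model would issue a COUNT query here).
    const count = parrotRows.filter(matches).length;
    if (count > context.recordsLimit) {
      throw new Error(`Record limit exceeded: ${count} > ${context.recordsLimit}`);
    }

    // Simulated asynchronous fetch from the database.
    const resultRecords = await Promise.resolve(parrotRows.filter(matches));

    // Re-check and subtract with no await in between.
    const nr = resultRecords.length;
    if (nr > context.recordsLimit) {
      throw new Error(`Record limit exceeded: ${nr} > ${context.recordsLimit}`);
    }
    context.recordsLimit -= nr;
    return resultRecords;
  }
}

// The resolver then only delegates to the model.
async function resolveParrots(person, context) {
  return Parrot.readAllWithinLimit(person.id, context);
}
```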

The parallel resolver problem

Imagine the following schema: a Person has Dogs and also has Parrots. Now consider the following GraphQL query:

{
  persons {
    name
    dogs {
      name
      age
    }
    parrots {
      name
      species
    }
  }
}

GraphQL invokes three resolvers separately. First, the root resolver `persons` is invoked. Next, the `dogs` and `parrots` resolvers of each person object are invoked asynchronously, i.e. "to some degree in parallel". We cannot know in which order the two asynchronous processes run, when they actually read context.recordsLimit, and in particular when they decrease it by the number of fetched records. On the other hand, we do not want to enforce sequential execution (and hack GraphQL).

Thus, in theory, the following scenario is possible: both `dogs` and `parrots` read the current value of context.recordsLimit before either of them has decreased it, both conclude that they may return their fetched dogs and parrots, and only then do both decrease context.recordsLimit. In this rare case the limit does not actually hold. However, because any subsequent fetch will exceed the remaining (possibly negative) limit and throw an error, this scenario most likely does not pose a threat to efficiency. Hence this implementation seems the right way to go.
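Because Node.js runs synchronous code to completion, the race above can only occur when something asynchronous (an `await`) sits between reading context.recordsLimit and decreasing it. The following sketch simulates `dogs` and `parrots` running in parallel on a shared context with a budget of 10 and 6 records each. `fetchRecords` is a hypothetical stand-in for a database call, and the argument `n` stands in for the result of the count query.

```javascript
// Hypothetical async fetch: resolves with `n` dummy records on a later tick.
function fetchRecords(n) {
  return new Promise((resolve) =>
    setImmediate(() => resolve(Array.from({ length: n }, (_, i) => i))));
}

// Variant A: check, await, subtract. The check and the decrement are
// separated by an await, so both parallel resolvers can pass the check.
async function resolveCheckThenAwait(n, context) {
  if (n > context.recordsLimit) throw new Error('limit exceeded');
  const records = await fetchRecords(n);
  context.recordsLimit -= records.length;
  return records;
}

// Variant B: count check, fetch, then re-check and subtract synchronously,
// so no other resolver can interleave between the re-check and the decrement.
async function resolveWithRecheck(n, context) {
  if (n > context.recordsLimit) throw new Error('limit exceeded');
  const records = await fetchRecords(n);
  if (records.length > context.recordsLimit) throw new Error('limit exceeded');
  context.recordsLimit -= records.length;
  return records;
}

// Run two resolvers ("dogs" and "parrots", 6 records each) in parallel
// on one shared context with a budget of 10 records.
async function simulate(resolver) {
  const context = { recordsLimit: 10 };
  const results = await Promise.allSettled([resolver(6, context), resolver(6, context)]);
  return { remaining: context.recordsLimit, statuses: results.map((r) => r.status) };
}

(async () => {
  console.log(await simulate(resolveCheckThenAwait)); // remaining -2, both fulfilled
  console.log(await simulate(resolveWithRecheck));    // remaining 4, second rejected
})();
```

In Variant A both resolvers succeed and the budget ends at -2, which is the scenario described above. In Variant B the second resolver sees the already decreased budget of 4 and is rejected.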