Zendro-dev / graphql-server-model-codegen

Command line utility to auto-generate the structure files for a graphql server
MIT License
1 stars 2 forks source link

Fix limit of records allowed to be fetched in a single query / request #88

Open asishallab opened 4 years ago

asishallab commented 4 years ago

The problem

Imagine we have our classical Person has Dogs has Fleas models, the global constant maximum limit of records is 10,000 (10 k), and we fetch 10 k persons where each of them has 10 k dogs where in turn each dog has 10 k fleas. Assuming that a records without associations is roughly 20 Bytes in size we would need approximately 20 Terrabyte of memory to process this.

(1e4 ^ 3 * 20) / (1024^3) = 18626.45

We need to fix this possible open door for malicious clients.

First extend our GraphQL context as follows

app.use('/graphql', cors(), graphqlHTTP((req) => ({
  schema: Schema,
  rootValue: resolvers,
  context: {
    recordsLimit: globals.RECORD_LIMIT
 // ...

Then in each resolver implement the following behavior

  1. Count the records matching the current search arguments. 1.a If the count exceeds context.recordsLimit throw an error 1.b Else fetch the matching records into say resultRecords
  2. In order to take into account the parallel resolver problem (see below) compare again the number of fetched records (nr) and the current context.recordsLimit 2.a If nr exceeds context.recordsLimit throw an error 2.b Else subtract from context.recordsLimit nr and return the resultRecords

Verify, if the above behavior is possible to implement in the resolver and not the model layer. If not, pass context.recordsLimit as an argument to the model layer, and implement the above behavior there.

The parallel resolver problem

Imagine we have the following schema: A Person has Dogs and also has Parrots. When using the following GraphQL Query:

{ persons {
  name
  dogs {
    name
    age
  }
  parrots {
    name
    species
  }
}

GraphQL invokes three resolvers separately. First the root resolver persons is invoked. Next, the functions dogs and parrots inside each person object are invoked asynchronously, i.e. "to some degree in parallel". We cannot know in what order both async processes are carried out, and when they actually access context.recordsLimit and especially when they diminish its value by the number of fetched records. But on the other hand we do not want to enforce sequential execution (and hack GraphQL). Thus in theory the following scenario is possible. Both dogs and parrots check the current value of context.recordsLimit before diminishing it, thus find that they can return their respective fetched dogs and parrots, and then diminish context.recordsLimit. In this rare case the recordsLimit will actually not hold. However, because any subsequent fetch will blow up the limit, this scenario most likely does not pose a thread to efficiency. Hence the implementation seems the right way to go.

asishallab commented 4 years ago

For some inspiring experiments see this repo

asishallab commented 4 years ago

Note that the above Issue must be implemented in the following data model storage-types:

This must include adapters in the context of distributed data models:

tmvoe commented 4 years ago

New branch for the graphql-server: https://github.com/ScienceDb/graphql-server/tree/issue88-record-limit

tmvoe commented 4 years ago

Branch in the GraphQL Server Code Generator: https://github.com/ScienceDb/graphql-server-model-codegen/tree/issue88-record-limit

Add count check to the root resolvers: https://github.com/ScienceDb/graphql-server-model-codegen/commit/604a1eecc6120a2bc81efc06763ee8e21de6891b

Test for the record limit (set at 25): https://github.com/ScienceDb/graphql-server-model-codegen/commit/6cfae763e8a508af6f47115150b65130befc5979

Test adapted for the record limit 10000: https://github.com/ScienceDb/graphql-server-model-codegen/commit/536f16e02654e20136bea58113c02ab2ae702dc3

tmvoe commented 4 years ago

Functionality is now in the master branch (for both the GraphQL Server Code Generator - https://github.com/ScienceDb/graphql-server-model-codegen/commit/ecd6213ad43c53151ff69dc6efeb8e7d65ad0c2d and the Skeleton Server - https://github.com/ScienceDb/graphql-server/commit/6d84670fc2453eb0fab04662ab07d00e886134be) and in case of the StarterPack in the branch https://github.com/ScienceDb/ScienceDbStarterPack/tree/i25-starter-pack-react with commit https://github.com/ScienceDb/ScienceDbStarterPack/commit/8e6f879591347c41a46ee6d10d6b3df603fc72e2

tmvoe commented 4 years ago

A large error regarding efficiency and a case of a functionally wrong implementation was handled in https://github.com/ScienceDb/graphql-server-model-codegen/issues/122