asishallab opened this issue 4 years ago
For some inspiring experiments, see this repo.
Note that the above issue must be implemented in the following data model storage types:

- `sql`
- `cenz_server`

This must include the adapters in the context of distributed data models:

- `sql-adapter`
- `DDM-adapter`
New branch for the graphql-server: https://github.com/ScienceDb/graphql-server/tree/issue88-record-limit
Branch in the GraphQL Server Code Generator: https://github.com/ScienceDb/graphql-server-model-codegen/tree/issue88-record-limit
Add count check to the root resolvers: https://github.com/ScienceDb/graphql-server-model-codegen/commit/604a1eecc6120a2bc81efc06763ee8e21de6891b
Test for the record limit (set at 25): https://github.com/ScienceDb/graphql-server-model-codegen/commit/6cfae763e8a508af6f47115150b65130befc5979
Test adapted for the record limit 10000: https://github.com/ScienceDb/graphql-server-model-codegen/commit/536f16e02654e20136bea58113c02ab2ae702dc3
Functionality is now in the master branch of both the GraphQL Server Code Generator (https://github.com/ScienceDb/graphql-server-model-codegen/commit/ecd6213ad43c53151ff69dc6efeb8e7d65ad0c2d) and the Skeleton Server (https://github.com/ScienceDb/graphql-server/commit/6d84670fc2453eb0fab04662ab07d00e886134be). For the StarterPack it is in the branch https://github.com/ScienceDb/ScienceDbStarterPack/tree/i25-starter-pack-react, commit https://github.com/ScienceDb/ScienceDbStarterPack/commit/8e6f879591347c41a46ee6d10d6b3df603fc72e2.
A major efficiency error, and a case of a functionally wrong implementation, were handled in https://github.com/ScienceDb/graphql-server-model-codegen/issues/122.
The problem
Imagine we have our classical Person-has-Dogs-has-Fleas models, the global constant maximum limit of records is 10,000 (10k), and we fetch 10k persons, each of whom has 10k dogs, where in turn each dog has 10k fleas. Assuming that a record without associations is roughly 20 bytes in size, we would need approximately 18 terabytes of memory to process this:
(1e4 ^ 3 * 20) / (1024 ^ 3) = 18626.45 GB ≈ 18.2 TB
We need to close this possible open door for malicious clients.
First extend our GraphQL `context` to carry the global record limit. Then in each resolver implement the following behavior:

1. Compare the number of matching records with the current `context.recordsLimit`.
   - 1.a If it exceeds `context.recordsLimit`, throw an error.
   - 1.b Else fetch the matching records into, say, `resultRecords`.
2. Compare the number of actually fetched records (say `nr`) and the current `context.recordsLimit`.
   - 2.a If `nr` exceeds `context.recordsLimit`, throw an error.
   - 2.b Else subtract `nr` from `context.recordsLimit` and return the `resultRecords`.
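A minimal sketch of this behavior, assuming a per-request context object and hypothetical model-layer helpers (`countPersons`, `findPersons`, the limit of 10,000, and the error message are all illustrative, not the actual ScienceDb code):

```javascript
// Hypothetical per-request context factory carrying the global record limit.
function buildContext() {
  return { recordsLimit: 10000 };
}

// Sketch of a root resolver implementing the check-then-fetch-then-subtract steps.
async function persons(search, pagination, context, model) {
  // Step 1: compare the number of matching records with the remaining limit.
  const nr = await model.countPersons(search); // hypothetical model call
  if (nr > context.recordsLimit) {
    // Steps 1.a / 2.a: refuse to fetch more records than the limit allows.
    throw new Error(
      `Record limit exceeded: ${nr} matching, ${context.recordsLimit} remaining`
    );
  }
  // Step 1.b: fetch the matching records.
  const resultRecords = await model.findPersons(search, pagination); // hypothetical
  // Step 2.b: subtract the fetched count and return the records.
  context.recordsLimit -= nr;
  return resultRecords;
}

// Tiny in-memory stand-in for the model layer, for demonstration only.
const model = {
  async countPersons() { return 3; },
  async findPersons() { return ['p1', 'p2', 'p3']; },
};
```

Because the limit lives on the shared `context`, every resolver invoked during the same request draws from the same budget.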
Verify whether the above behavior can be implemented in the resolver layer rather than in the model layer. If not, pass `context.recordsLimit` as an argument to the model layer and implement the above behavior there.

The parallel resolver problem
Imagine we have the following schema: a Person has Dogs and also has Parrots. When using the following GraphQL query:
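The query itself is missing from the original issue text; a query of the following shape (field names illustrative) matches the resolver description:

```graphql
{
  persons {
    name
    dogs { name }
    parrots { name }
  }
}
```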
GraphQL invokes three resolvers separately. First the root resolver `persons` is invoked. Next, the functions `dogs` and `parrots` inside each `person` object are invoked asynchronously, i.e. "to some degree in parallel". We cannot know in which order the two async processes are carried out, when they actually access `context.recordsLimit`, and especially when they diminish its value by the number of fetched records. On the other hand, we do not want to enforce sequential execution (and hack GraphQL). Thus, in theory, the following scenario is possible: both `dogs` and `parrots` check the current value of `context.recordsLimit` before diminishing it, each finds that it can return its respective fetched dogs or parrots, and only then do both diminish `context.recordsLimit`. In this rare case the `recordsLimit` will not actually hold. However, because any subsequent fetch will blow up the limit, this scenario most likely does not pose a threat to efficiency. Hence the implementation seems the right way to go.
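The check-then-act race can be made concrete with a small sketch (plain Node.js, not ScienceDb code; the counts and the 10,000 limit are illustrative). Both sub-resolvers read `context.recordsLimit` before either of them subtracts from it:

```javascript
// Hypothetical shared GraphQL context carrying the global record limit.
const context = { recordsLimit: 10000 };

// "Check" phase: the dogs and parrots resolvers both inspect the limit
// while it still holds its initial value -- the interleaving that
// asynchronous field execution can produce.
const dogsAllowed = 6000 <= context.recordsLimit;    // true
const parrotsAllowed = 6000 <= context.recordsLimit; // true, limit not yet reduced

// "Act" phase: only now does each resolver subtract its fetched count.
if (dogsAllowed) context.recordsLimit -= 6000;
if (parrotsAllowed) context.recordsLimit -= 6000;

// 12,000 records were returned although the limit was 10,000.
console.log(context.recordsLimit); // -2000
```

As the issue argues, the overshoot is bounded by one round of parallel sibling resolvers: the limit is now negative, so any subsequent fetch in the same request fails immediately.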