Provide a new lifecycle method to hook on via an ApolloServerPlugin to pass a GraphQL schema on-the-fly to the ApolloServer.

andreloeffelmann commented 1 year ago

Our situation:

We have one single ApolloServer and a single database backing the server. So, Apollo Router or Federation Gateway is not an option for us.

The usecase:

Although we have just one database, we have different 'domains' or 'namespaces' within our GraphQL schema. Not every user is allowed to see / execute everything. We do NOT want to disable introspection in production environment because it is a good way to learn what the API provides for externals. Also, code generation for clients should be possible according to the introspection result. Each API consumer has to authorize by provding e.g. a token within each request. Depending on that, we decide what he is allowed to see / execute from our GraphQL schema.

So, we need to change the used schema of the ApolloServer on each request depending on the requesting user.

Currently we do this within the context() function by calling:

// Set up our Express middleware to handle CORS, body parsing,
// and our expressMiddleware function.
app.use(
    contextPath,
    cors(),
    bodyParser.json(),
    // expressMiddleware accepts the same arguments:
    // an Apollo Server instance and optional configuration options
    expressMiddleware(apolloServer, {
      context: async ({ req }) => {
         let newSchema = // get new schema here
         apolloServer.internals.state.schemaManager.processSchemaLoadOrUpdateEvent({apiSchema:newSchema }, ApolloServer.generateSchemaDerivedData(newSchema));
      }
    })
}

But this is clearly a bad hack and not intended to do.

Our request:

Provide a new lifecycle method to hook on to pass a schema on-the-fly to the ApolloServer. This plugin method could be called right before runPotentiallyBatchedHttpQuery is called within ApolloServer.ts. At this point the schema is passed as an argument to this method by invoking runningServerState.schemaManager.getSchemaDerivedData().

Maybe there is another good option / solution we didn't think of?

trevor-scheer commented 1 year ago

We have some recommended approaches for authz in the docs, have you explored these options? Is there an additional constraint in your project that's led you to want to change the schema on the fly?

Before we actually dig in to whether this is a good idea or not I want to make sure I fully understand the scope of your problem and if we can't address it in a more conventional way.

My initial reaction is that swapping out N schemas on a per request basis seems heavy-handed even with a proper API for it (I definitely recommend against the approach you've provided but I understand for now it's a means to an end, i.e. do you really want to run validateSchema() on every request?).

andreloeffelmann commented 1 year ago

I explored the several approaches for auth in the docs, yes. So far, the auth directive is really great to add fine-granular restriction to types and fields.

However, none of these approaches target introspection queries (and results). An example: following the type definitions here only a user having the role 'ADMIN' is able to query the field 'banned' from the 'User' type. But: ALL users will see that there is this field 'banned' on the 'User' type when invoking an introspection query - even without the 'ADMIN' role. And this is not just for fields. Imagine you have a "secret" type 'ServerHealthStats' which only admins are allowed to query. Every user invoking an introspection query would see that there is this type. Not everyone could query it, but the sole fact that everyone can see that the type is existing is bad. Same goes for mutations. I do not want users to see how or which types / fields can be altered with mutations if they are not allowed to execute mutations at all.

We want to have a transparent API to several user groups. Some may see everything, some may see only a subset of the schema. Some may be allowed to alter stuff (and thus, see mutations), some may not.

The only way I was able to achieve that requirement is the solution I provided in the first post: generating n schemas having exactly those types/fields/mutations for a specific role and changing the used schema at runtime for each request depending on the incoming user's rights/roles before the request pipeline chain from apollo server kicks in. By doing this, the introspection results show exactly that stuff, a specifc user is allowed to see.

I absolutely agree that this solution is dirty. But I could not find a better one so far.

Regarding the performance: schema switching currently takes 4-8ms which is OK for us (for having the problem solved in exchange). I do not want to run validateSchema() on every request. But since there seems to be no API in apollo server to change the schema (without router/gateway usage), this:

apolloServer.internals.state.schemaManager.processSchemaLoadOrUpdateEvent({apiSchema:newSchema }, ApolloServer.generateSchemaDerivedData(newSchema));

was the only way I was able to change it (figured that out by digging deep into the apollo code).

I would be totaly fine with validating/loading n schemas at server startup and just swapping them out without re-validation on each request.

eugene1g commented 1 year ago

@andreloeffelmann I'm looking at our options for managing runtime schemas, and found your take on this. However, it seems to me that there is a serious risk of a race condition inherent in mutating a global shared value within the Apollo Server singleton instance. For example, if two HTTP requests arrive from two different users in close proximity, it's possible that Apollo runs context for User1, then for User2, and only then gets to the GraphQL execution step for User1 (as there are many awaits in the request handling flow). In this case execution for User1 will user the schema state defined for User2. Am I missing something about how Apollo isolates requests, or maybe apolloServer.internals.state is not actually a shared state?

andreloeffelmann commented 1 year ago

@eugene1g this a concern I was thinking about also. As far as I understand the code, apolloServer.internals.state is indeed shared for all requests. But: whilst debugging I got to this conclusion:

in ApolloServer.js the context() function is called.
it is indeed called async but I am switching the schema before immediately returning the function
then, as a next step, this is called: return await runPotentiallyBatchedHttpQuery(this, httpGraphQLRequest, contextValue, runningServerState.schemaManager.getSchemaDerivedData(), this.internals);
from now the request chain has its "own" schema, given via runningServerState.schemaManager.getSchemaDerivedData()
there is no access to apolloServer.internals.state anymore further down (as far as I could see).
so there are no blocking or long-running tasks between the step where I switch the schema and where the schema is accessed from the state and passed to the request chain.

Here is an excerpt of the code:

let contextValue;
try {
    contextValue = await context(); // <- this is where I switch the schema
}
catch (maybeError) {
    const error = ensureError(maybeError);
    try {
        await Promise.all(this.internals.plugins.map(async (plugin) => plugin.contextCreationDidFail?.({
            error,
        })));
    }
    catch (pluginError) {
        this.logger.error(`contextCreationDidFail hook threw: ${pluginError}`);
    }
    return this.errorResponse(ensureGraphQLError(error, 'Context creation failed: '), httpGraphQLRequest);
}
return await runPotentiallyBatchedHttpQuery(this, httpGraphQLRequest, contextValue, runningServerState.schemaManager.getSchemaDerivedData(), this.internals); // <- this is where the schema is accessed

So I guess there shouldn't be any concurrency issues or am I missing something?

trevor-scheer commented 1 year ago

@andreloeffelmann thanks for the additional details.

I know you led with "Federation is not an option since we only have one service", so it sounds like you might already be aware of our Contracts feature in GraphOS? For what it's worth, we have plenty of customers running a single subgraph behind a router in order to enjoy the benefits of the Federation ecosystem (with more features coming to Router).

This really sounds like you'd want to have a dedicated AS instance running per schema. What about solving this problem at the routing layer by routing to various AS instances? I can only imagine it'd be a simpler approach.

Quite frankly, I don't think this is a great idea to begin with and I'm really not keen on introducing additional complexity to the request lifecycle to support this.

trevor-scheer commented 1 year ago

I'm going to close this as won't fix but happy to continue discussing options here.

apollographql / apollo-server