apollographql / apollo-server

🌍  Spec-compliant and production ready JavaScript GraphQL server that lets you develop in a schema-first way. Built for Express, Connect, Hapi, Koa, and more.
https://www.apollographql.com/docs/apollo-server/
MIT License

Serverless Subscriptions #2129

Closed schickling closed 3 years ago

schickling commented 5 years ago

During the last AWS re:Invent event, AWS announced WebSocket support for API Gateway in combination with AWS Lambda. (See example using the Serverless framework.)

This opens the door for Serverless Subscriptions™️ which means instead of running stateful long-running servers, you can use Lambda functions to handle connections and process messages. Making this possible will require us to rethink how subscriptions are being implemented and deployed.

I'd like to open up a discussion and brainstorm API design ideas. 🚀

ErickWendel commented 5 years ago

Great idea!

michalkvasnicak commented 5 years ago

I just want to share my solution here, which could give you a base to start implementing :) https://github.com/michalkvasnicak/aws-lambda-graphql

Also, @schickling had a good question about the need for storage: https://spectrum.chat/graphql/general/graphql-subscriptions-over-websockets-with-aws-api-gateway-v2-and-aws-lambda~4cf780af-9891-4dcc-b4ed-1202216f59fa

We could limit the need for storage to subscriptions only, which we need to store somewhere along with the connection info. Events don't need to be processed in another Lambda instance; they can be fired from the same Lambda that is publishing them. So some sort of in-memory PubSub could be implemented.

For example it could be a hybrid of PubSub and event processor:

At https://github.com/michalkvasnicak/aws-lambda-graphql/blob/master/packages/aws-lambda-graphql/src/PubSub.ts#L47 there could be code to fetch all registered subscriptions, so basically you just need to move this code https://github.com/michalkvasnicak/aws-lambda-graphql/blob/master/packages/aws-lambda-graphql/src/createDynamoDBEventProcessor.ts#L74 into the publish method.

So the implementation can be simplified quite a bit :)
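
The idea above, keeping only subscriptions and connection info in storage and delivering events from the same Lambda that publishes them, could look roughly like this. It is a minimal sketch, not the code from aws-lambda-graphql: `SubscriptionRecord`, `SubscriptionStore`, `publish`, and the `send` callback are hypothetical names, and the in-memory array stands in for a real table.

```typescript
// Hypothetical shape of one stored subscription: which connection asked for
// which event, plus the operation text to run when the event fires.
interface SubscriptionRecord {
  connectionId: string;
  eventName: string;
  operation: string;
}

// Stand-in for persistent storage; a real deployment would use a database.
class SubscriptionStore {
  private records: SubscriptionRecord[] = [];

  add(record: SubscriptionRecord): void {
    this.records.push(record);
  }

  // Drop everything that belongs to a connection, e.g. on disconnect.
  removeConnection(connectionId: string): void {
    this.records = this.records.filter((r) => r.connectionId !== connectionId);
  }

  forEvent(eventName: string): SubscriptionRecord[] {
    return this.records.filter((r) => r.eventName === eventName);
  }
}

// Assumed callback that delivers a payload to one open connection.
type Send = (connectionId: string, payload: unknown) => void;

// Publish from the current function: look up the registered subscriptions and
// deliver directly, instead of handing the event to a second function.
function publish(
  store: SubscriptionStore,
  eventName: string,
  payload: unknown,
  send: Send
): number {
  const targets = store.forEvent(eventName);
  for (const target of targets) {
    send(target.connectionId, { operation: target.operation, data: payload });
  }
  return targets.length; // number of deliveries attempted
}
```

A mutation resolver would call `publish` right after writing its data, so only the subscription records need to outlive the invocation.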

stephenhandley commented 5 years ago

That's awesome to learn about WebSocket support on gateway/lambda, sounds like a great option.

In my case, I'm running apollo-server on Google Cloud Functions, and so I'm looking for a more general solution to handling subscriptions that isn't coupled to a specific provider.

I'd like to be able to continue running queries and mutations via the serverless function (whether that be GCF, AWS Lambda, etc.) and have a long-running server (e.g. App Engine) handle the WebSocket connections and function as a proxy for the subscription operations, and then use split from apollo-link in my client to route subscriptions there and queries/mutations to the serverless function. Communication between the two would use some pub/sub service (e.g. Google Cloud Pub/Sub), where the subscription server would subscribe to relevant topics and be notified via publish events triggered after mutations on the serverless function.
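
The client-side routing part can be sketched without any Apollo dependency. The predicate below is the kind of test one would hand to split; `DocumentLike` is a simplified assumption modeled on the `kind` and `operation` fields of graphql-js definition nodes, and `isSubscription` and `pickTransport` are hypothetical names.

```typescript
// Simplified stand-in for a parsed GraphQL document (assumption: mirrors the
// `kind` / `operation` fields found on graphql-js definition nodes).
interface DefinitionLike {
  kind: string; // e.g. "OperationDefinition" or "FragmentDefinition"
  operation?: "query" | "mutation" | "subscription";
}

interface DocumentLike {
  definitions: DefinitionLike[];
}

// True when the first operation in the document is a subscription.
function isSubscription(doc: DocumentLike): boolean {
  for (const def of doc.definitions) {
    if (def.kind === "OperationDefinition") {
      return def.operation === "subscription";
    }
  }
  return false; // no operation found, treat as a regular request
}

type Transport = "websocket-server" | "serverless-http";

// Subscriptions go to the long-running WebSocket server; queries and
// mutations go to the serverless function over HTTP.
function pickTransport(doc: DocumentLike): Transport {
  return isSubscription(doc) ? "websocket-server" : "serverless-http";
}
```

With apollo-link, the same predicate would be the first argument to split, with the WebSocket link second and the HTTP link third.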

It sounds like this general approach is discussed in the spec:

In large scale subscription systems, the Subscribe() and ExecuteSubscriptionEvent() algorithms may be run on separate services to maintain predictable scaling properties. See the section below on Supporting Subscriptions at Scale.

...

Supporting subscriptions is a significant change for any GraphQL service. Query and mutation operations are stateless, allowing scaling via cloning of GraphQL server instances. Subscriptions, by contrast, are stateful and require maintaining the GraphQL document, variables, and other context over the lifetime of the subscription.

Consider the behavior of your system when state is lost due to the failure of a single machine in a service. Durability and availability may be improved by having separate dedicated services for managing subscription state and client connectivity.

I'm wondering how involved the subscription server needs to be, specifically, whether it needs full knowledge of the schema or if it could just handle the client connection state and proxy operations to the serverless function. If not, I'm assuming I'm essentially going to have to run the full GraphQL server on AppEngine as well and just ensure operation splitting happens in our clients.

If anyone has done anything along these lines, I'd appreciate any help/suggestions!

michalkvasnicak commented 5 years ago

@stephenhandley Yes, it needs to be aware of the schema, because on each event you run the operation that was used to subscribe against the schema.

So basically it works like this: you have subscription XYZ {} stored somewhere on a server, and when an event arrives from PubSub you run it against the schema. https://github.com/michalkvasnicak/aws-lambda-graphql/blob/master/packages/aws-lambda-graphql/src/createMemoryEventProcessor.ts#L71

I have a few interfaces there that you could use to implement subscriptions for Google Cloud Functions, but you will need to somehow create a layer that emits events for WebSocket connections in a similar manner to AWS API Gateway v2.

--

I'd like to try implementing subscriptions in ApolloServer by adding subscription support to apollo-server-lambda. Maybe if it weren't coupled to subscriptions-transport-ws, it would be easier to support different connection/subscription schemas.

stephenhandley commented 5 years ago

@michalkvasnicak awesome, thank you. I have been looking at your repo and it's a great jumping-off point. Will let you know once I make some progress.

nateq314 commented 5 years ago

@stephenhandley how did it go? I'm looking to do the same thing with a repo I've been working on, and running into the same issue. It's Apollo Server deployed on GCF. Was thinking along the same lines as you, that lambdas are a perfect fit for traditional request-response APIs, but it seems like they don't make much sense for subscriptions over websockets because the connections are stateful and potentially long-lived. How do you prevent the function (lambda) from shutting down prematurely, cutting off the connection? You would have to make it self-aware in that regard, only shutting down when it's sure there are no subscriptions. Right? Does that make sense? If so, how would I even start to go about that? Is this a problem that's already been figured out and a solution exists somewhere?

Otherwise, would be great to split the two services up as you suggest, with the traditional API served over the lambda and subscriptions on a long-lived server, with the two communicating somehow. Was wondering if you or anyone else had made any progress there.

stephenhandley commented 5 years ago

@nateq314 sorry for the delay responding. I actually ended up punting on this because we were only needing subscriptions for a small section of our app, and so we just used an off the shelf push service to handle that outside graphql as a stopgap solution.

That said, I'm still interested in figuring this out, I just haven't had the bandwidth. I don't think what you've outlined in the first paragraph would be viable from a cost perspective. At that point, it might make more sense to just run the whole thing on App Engine or some other long-lived runtime.

I'll check back in once I've had a chance to properly dig into this. Another thing I've been meaning to do, which you might be interested in, is looking into how AWS AppSync appears to handle a similar process using subscriptions along with a serverless deployment of GraphQL.
https://docs.aws.amazon.com/appsync/latest/devguide/system-overview-and-architecture.html

jkarneges commented 5 years ago

Ta-da https://github.com/fanout/apollo-serverless-demo

superandrew213 commented 5 years ago

@nateq314 connections are not managed by Lambda but by API Gateway.

hakimio commented 5 years ago

@jkarneges Why not use API Gateway Websockets (tutorial) instead of Fanout?

jkarneges commented 5 years ago

Hi @hakimio, the main benefit of Fanout is that it's pub/sub-based, so you can send data to multiple WebSocket clients with a single publish API call. For example, if you wanted to broadcast a sports score to thousands of people, Fanout is more optimized for that.

Fanout could also be useful for anyone not using AWS, for example people deploying to something like Zeit instead of Lambda. Our implementation currently relies on DynamoDB for state but that could be substituted.

Finally, it's worth noting that Fanout's proxy is open source (https://github.com/fanout/pushpin). So you can run it locally, or with OpenFaaS, etc.

hakimio commented 5 years ago

@michalkvasnicak made a similar example using AWS API Gateway v2 Websockets, lambda and DynamoDB -> aws-lambda-graphql. If you are already using so many Amazon services, why not go all the way.

jkarneges commented 5 years ago

Yup if you're already on AWS then using API Gateway WebSockets is certainly a fine way to go. Just mentioning another option.

bublig737 commented 5 years ago

@hakimio aws-lambda-graphql requires a non-standard Apollo client, which is why it is only suitable for JavaScript clients.

AlpacaGoesCrazy commented 5 years ago

Here is my example implementation of AWS API Gateway WebSocket subscriptions: https://github.com/AlpacaGoesCrazy/serverless-graphql-subscriptions It uses the regular apollo-client.

michalkvasnicak commented 5 years ago

From what I see, I can remove my own WebSocket implementation. Back when I was implementing it, the Apollo subscription link could not communicate with AWS API Gateway because the WebSocket protocol was not supported. Now it can be changed, so I should probably investigate that :)

AlpacaGoesCrazy commented 5 years ago

@michalkvasnicak Hey, I tried updating the WS protocol in your library so it can work with apollo-link-ws and subscriptions-transport-ws. Seems to work just fine. https://github.com/AlpacaGoesCrazy/aws-lambda-graphql/tree/change-to-apollo-client Maybe you can tell me what else needs to be done for this solution to be integrated into your library?

Maxwell2022 commented 5 years ago

does it have to be tied to API Gateway? I guess some of us use an Application Load Balancer

hakimio commented 5 years ago

@Maxwell2022 API Gateway V2 in this case is only needed to provide WebSocket server, not to handle http requests. You can use whatever you want for http.

Maxwell2022 commented 5 years ago

What I was trying to say is that ALB is also supporting websocket connections, so do we need to add a dependency on the API Gateway to support websockets? I'm guessing there is a cost involved with websocket connection on API Gateway?

MathiasKoch commented 5 years ago

I do agree with @Maxwell2022. WebSocket support through either API GW or ALB would be awesome! We are running systems on both, and it would be nice to have options, even in a serverless environment.

danprince commented 5 years ago

@Maxwell2022 @smilykoch I'm pretty sure ALB websockets won't work with lambda because the instances are not long running and ALB doesn't "manage" the connection between lambda restarts.

API gateway assigns an id to each connection and keeps the socket open, then lambdas post messages to that socket via API gateway, using the connection id (rather than having direct write access).
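
That flow can be sketched as a single handler. The `requestContext.routeKey` and `requestContext.connectionId` fields follow the API Gateway WebSocket event format; the `ConnectionTable` class and the `PostToConnection` callback are hypothetical stand-ins for a real store and for the gateway's management API call, injected so the sketch stays dependency-free.

```typescript
// Minimal slice of an API Gateway WebSocket event.
interface WebSocketEvent {
  requestContext: { routeKey: string; connectionId: string };
  body?: string;
}

// Assumed wrapper around the gateway call that writes to an open socket.
type PostToConnection = (connectionId: string, data: string) => void;

// Hypothetical in-memory table of open connections; a real setup needs
// shared storage because each invocation may run in a fresh instance.
class ConnectionTable {
  private ids: string[] = [];

  add(id: string): void {
    if (this.ids.indexOf(id) === -1) this.ids.push(id);
  }

  remove(id: string): void {
    this.ids = this.ids.filter((x) => x !== id);
  }

  all(): string[] {
    return this.ids.slice();
  }
}

// The function never holds the socket. It only records connection ids and
// asks the gateway to deliver data to a given id.
function handleSocketEvent(
  event: WebSocketEvent,
  table: ConnectionTable,
  post: PostToConnection
): { statusCode: number } {
  const { routeKey, connectionId } = event.requestContext;
  switch (routeKey) {
    case "$connect":
      table.add(connectionId);
      break;
    case "$disconnect":
      table.remove(connectionId);
      break;
    default:
      // Any other route: relay the message to every other open connection.
      for (const id of table.all()) {
        if (id !== connectionId) post(id, event.body || "");
      }
  }
  return { statusCode: 200 };
}
```

Because the gateway owns the sockets, the handler can exit after every event without dropping a connection.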

mskg commented 5 years ago

@Maxwell2022 @smilykoch I'm pretty sure ALB websockets won't work with lambda because the instances are not long running and ALB doesn't "manage" the connection between lambda restarts.

Correct. ALB can call Lambda, but not for WebSocket connections. That does not really make sense from a short-lived-process conceptual point of view.

MathiasKoch commented 5 years ago

Ahh, alright. I was under the impression that the ALB implementation for websockets was the same as the one in API GW. I agree that it doesn't make sense if it cannot keep state.

fullStackDataSolutions commented 5 years ago

Hey all, I'm starting to implement Subscriptions for an ECS container running behind an API Gateway. I'm using Cognito for Auth, GraphQL Yoga, TypeScript, Nexus and Prisma.

I'm thinking that the best solution is to store the connection data in a dynamo db table. And then use that to track user state. But I'm not quite sure how to tie this all together with the above tools.

hakimio commented 5 years ago

@blazestudios23 There are quite a few projects on GitHub you can take a look at:

Anyway, I would suggest starting small and, only once you have something working, trying to integrate all the frameworks and libraries you can think of.

arizonatribe commented 5 years ago

The issue I see with many of these solutions (e.g., the AlpacaGoesCrazy example that has been linked in this thread many times) is how you make it so the subscriber can still select certain fields in the subscription response. GraphQL has the benefit of preventing over-fetching by forcing the caller to select only the fields they want back. With these kinds of lambda-based solutions, the original GraphQL lambda shuts down as soon as a query or mutation has finished, so obviously we have to emit some kind of microservice event in the mutation resolver to trigger something else downstream that handles the logic of deciding who and what to send to any eagerly awaiting subscribers. Something like SNS is probably insufficient because it caps the message size fairly low, so you can't necessarily send a huge payload from the originating mutation resolver to a downstream lambda.

The downstream lambda would be responsible for checking the dynamodb cache (or whatever persistence mechanism you want to use) and pushing the payload through API Gateway websocket connections. That sounds simple enough, but how does this kind of non-GraphQL lambda make sure the payload is fulfilled in a very "GraphQL manner"?

What I mean is subscribers usually specify a selection set of fields in the response of the subscription that they want back; preventing over-fetching is a big reason for using GraphQL. Without a way for the downstream lambda to apply the user's original subscription query's selection set of fields they want back, it's going to send the whole payload that the mutation forwarded on to the last lambda in the chain. So that lambda would need to be able to apply a GraphQL query to the payload that corresponds to the subscriber's original request.

To make that work you might have to have the downstream lambda get a hold (somehow) of the original selection set, manually parse the AST and execute it on the full payload that it responded to, and send to the end user only the fields from that payload they selected in their original subscription query.
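
As a rough illustration of that last step, the sketch below applies a stored selection set to a full payload so that only the requested fields go out. It assumes the selection has already been reduced to a plain nested object (the `Selection` type and `project` function are hypothetical) rather than a real GraphQL AST, and it ignores arguments, aliases, fragments, and resolvers, all of which a real executor would handle.

```typescript
// Hypothetical, simplified selection set: `true` selects a leaf field,
// a nested object selects sub-fields.
interface Selection {
  [field: string]: true | Selection;
}

// Copy only the selected fields from `payload`, recursing into objects
// and mapping over arrays.
function project(payload: unknown, selection: Selection): unknown {
  if (Array.isArray(payload)) {
    return payload.map((item) => project(item, selection));
  }
  if (payload === null || typeof payload !== "object") {
    return payload; // scalars and null pass through unchanged
  }
  const source = payload as { [key: string]: unknown };
  const result: { [key: string]: unknown } = {};
  for (const field of Object.keys(selection)) {
    const sub = selection[field];
    // Fields missing from the payload are returned as null.
    const value = field in source ? source[field] : null;
    result[field] = sub === true ? value : project(value, sub);
  }
  return result;
}
```

The downstream lambda would load the subscriber's stored selection alongside the connection ID and run each event payload through something like `project` before sending it.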

I think it speaks volumes that after almost 10 months this issue is still in open discussion and we're not seeing anyone from the Apollo group weighing in on these suggestions so far. In my opinion it's a bit irresponsible to publish something like apollo-server-lambda - knowing it's going to encourage AWS devs to go in the serverless direction they already love - but not have any solution, suggestion or acknowledgement of handling the major missing feature: subscriptions. I think that package needs to acknowledge directly on the README and/or the API docs that there isn't currently a solution for subscriptions or a recommended way to do it. That way it doesn't waste precious time for devs that need to get a fully featured GraphQL server working.

In my opinion, serverless causes more trouble than it's worth for subscriptions. apollo-server-lambda works great for queries/mutations, but once you introduce subscriptions into the mix you have to ask yourself what real scaling benefit do you even get that you haven't nullified through all the other pieces you've had to take on just to make it work?

jkarneges commented 5 years ago

To make that work you might have to have the downstream lambda get a hold (somehow) of the original selection set, manually parse the AST and execute it on the full payload that it responded to, and send to the end user only the fields from that payload they selected in their original subscription query.

This is sort of what the earlier-mentioned fanout-graphql-tools does, though it's a work in progress.

after almost 10 months this issue is still in open discussion and we're not seeing anyone from the Apollo group weighing in

I reached out to @gschmidt a few months ago and he didn't consider this issue to be high priority. But maybe that could change.

davidalekna commented 5 years ago

Why not AppSync? It has Apollo Server base

khaledosman commented 5 years ago

Why not AppSync? It has Apollo Server base

I personally don't like AppSync for the following reasons:

  1. It replaces your entire backend and creates everything for you, which works fine if you just want to build another CRUD app, but if you want custom resolvers fetching data from external data sources, it gets annoying and you have to configure custom Lambdas for each resolver. Plus, I want to build my own custom backend; I could've just used graphcool or GraphCMS instead if I didn't want to build a backend.
  2. No JS resolvers; instead it uses ugly / weird template mappings.
  3. It forces you to use AWS services like Cognito for authentication and DynamoDB as a database, which I'm not a fan of due to its terrible syntax, so I prefer to use a MongoDB Atlas database instead. Don't want to use DynamoDB with AppSync? You have to create other Lambdas to pipe responses to your external database.
  4. It's more difficult to debug, and you lose good monitoring solutions like Apollo Engine.
  5. It forces you to learn the AWS way of doing things instead of a normal GraphQL setup, and creates vendor lock-in.
  6. AWS services come with hidden prices.
  7. All the extra round trips between AWS services (authentication Lambda, custom resolver Lambdas, Lambdas piping to an external database, Cognito, AppSync, database, API Gateway WebSockets, Lambdas listening to DynamoDB updates to fetch connected users and send realtime updates, etc.) just create unnecessary added latency and complexity.
  8. How do you set up shared caching across your Lambdas/apollo-servers, like Redis or Memcached? AWS offers ElastiCache, which can only be deployed inside a VPC, so it requires you to put your Lambdas inside a VPC, which is really bad for performance because VPCs increase the cold-start time of your Lambdas dramatically, by 10+ seconds.
  9. AppSync subscriptions use API Gateway WebSockets, which have their own limitations:
     a. Maximum connection duration of 2 hours, then timeout.
     b. Custom @aws_subscribe directive in your GraphQL schema <-- vendor lock-in.
     c. 10-minute idle connection timeout.
     d. There's no way to broadcast messages to all connected clients with one API call. You need to make an API call to AWS for each connection you want to send a message to. Publishing a single update to 1m clients requires fetching all 1m connection IDs out of the database and making 1m API calls, which is slow and unscalable, unless you use SQS and Lambdas to defer the logic, which adds even more latency due to the extra round trips.
     e. Connection metadata needs to be stored in a database. This means that for every connection and disconnection, a Lambda needs to run to store information about who connects/disconnects in a database, which makes it stateful, adds extra round trips/latency, and makes it less scalable for a large number of users, as it can easily hit Lambda/API Gateway execution limits.
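
The fan-out cost in item d is easy to put numbers on. Since each message targets a single connection ID, a broadcast costs one call per client; the sketch below only splits the IDs into batches that could be handed to separate workers (for example via a queue). `chunk`, `planBroadcast`, and the batch size used in the example are illustrative assumptions, not AWS limits.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// One API call per connection is unavoidable in this model; batching only
// decides how many workers share that work.
function planBroadcast(
  connectionIds: string[],
  batchSize: number
): { apiCalls: number; batches: number } {
  return {
    apiCalls: connectionIds.length,
    batches: chunk(connectionIds, batchSize).length,
  };
}
```

For 1m connections and a batch size of 500, that is still 1m API calls, spread over 2,000 batches.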

I believe the only way to do a proper scalable GraphQL server setup in AWS is to create your own WebSocket server via ECS or use PubNub as a replacement for subscriptions, and use an external Redis cluster like Redis Labs (or create your own via ECS) for caching. You can then use Lambdas for the apollo-server setup with a database of your choice and manually connect/publish to your WebSocket server for subscriptions. The AWS solutions are not really well thought through IMO.

davidalekna commented 5 years ago

@khaledosman that's a great response mate! I was planning to start using AppSync because of subscriptions, as I'm unable to do them on my lambdas at the moment, but it seems like it might not be the way forward 😅

hakimio commented 5 years ago

People looking for managed graphql service might want to try 8base (8base.com). While it's not the fastest API, imho it's still better in many aspects than AppSync.

MathiasKoch commented 5 years ago

I fully agree with @khaledosman on all his concerns around AppSync. At my workplace we are running a fully serverless web application utilizing as much of AWS's infrastructure as possible (so we don't really care about vendor lock-in, as we are very much locked in already), but the way AppSync is set up made us go with a serverless Apollo server on Lambda. Although it has a lot of downsides as well (ranging from cold starts to missing subscriptions), it still gives us a much better foundation for doing custom logic in the gateway, custom server directives and schema stitching (now federation).

We would love a managed gateway though! Just not in the way AppSync works. It's too much of a tradeoff.

We would love a great way of implementing subscription support through API GW, though. Seems like we should have a look at @AlpacaGoesCrazy's implementation, as we already use DynamoDB for everything and use streams extensively.

davidalekna commented 5 years ago

I've just realised you don't have JS in the resolvers of AppSync, just some ugly mapping templates 😬 this issue has now been open for almost a year. Is there a way to contribute to speed things up?

AlpacaGoesCrazy commented 5 years ago

@smilykoch You can already try this in aws-lambda-graphql library as it is now compatible with standard apollo clients

MathiasKoch commented 5 years ago

@AlpacaGoesCrazy Yep, I already saw. I just need to get some more urgent tasks off my back to make room for playing with potential techs. ;) Thank you.

ptimson commented 5 years ago

Hey there, this thread has been great and I've used @AlpacaGoesCrazy's example as a base.

However, I'm really struggling to build in authentication. I can't seem to get anything sent down in the first $connect request. I have tried to set headers and payload using connectionParams and a link middleware, but they aren't sent on the $connect request, only on the second request, connection_init. The only way I can think of to do it is to add the token to the request URL.

Has anyone managed to do it / got a working example?

hakimio commented 5 years ago

@ptimson you can't have the token in headers, it has to be sent as a query parameter.

EDIT: or maybe you can with an authorizer function. Take a look at the Serverless docs.
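
A minimal sketch of the query-parameter approach, assuming the token travels as a hypothetical `token` parameter: the client appends it to the WebSocket URL, and the $connect handler reads it from `queryStringParameters` on the event. `buildSocketUrl`, `readToken`, and the example URL are illustrative names, not part of any library.

```typescript
// Client side: append the token to the WebSocket endpoint URL.
function buildSocketUrl(baseUrl: string, token: string): string {
  const separator = baseUrl.indexOf("?") === -1 ? "?" : "&";
  return baseUrl + separator + "token=" + encodeURIComponent(token);
}

// Minimal slice of the $connect event that carries the query string.
interface ConnectEvent {
  queryStringParameters?: { [name: string]: string | undefined } | null;
}

// Server side: read the token back out, or null when it is absent.
function readToken(event: ConnectEvent): string | null {
  const params = event.queryStringParameters;
  if (!params) return null;
  const token = params["token"];
  return token ? token : null;
}

// Example: the URL a client would open (host is a placeholder).
const exampleUrl = buildSocketUrl("wss://example.invalid/dev", "my-token");
```

Since the URL is fixed once the socket is opened, a client that gets a new token has to build a new URL and reconnect.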

ptimson commented 5 years ago

@hakimio thanks for the super speedy response! Good to know, as I spent all night on this! How would you go about updating the client's URL to set the query parameter from a React component?

EDIT: The same issue applies to getting it into the auth function from the SubscriptionClient, though!

davegariepy commented 4 years ago

Hi, just curious of authentication considerations. Do any of the examples allow authenticating with aws_iam / identity pool using Sigv4 signed request?

praisegeek commented 4 years ago

Hi, just curious of authentication considerations. Do any of the examples allow authenticating with aws_iam / identity pool using Sigv4 signed request?

The way I got away with Auth is by using custom lambda authorizers through the underlying API gateway.
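
For reference, a Lambda authorizer answers with a principal and an IAM policy that allows or denies `execute-api:Invoke` on the requested ARN. The sketch below builds that response; `isValidToken` is a hypothetical placeholder for real verification, and reading the token from a `token` query parameter is an assumption rather than how the setup above necessarily works.

```typescript
// Minimal slice of the event passed to a request-based authorizer.
interface AuthorizerEvent {
  methodArn: string;
  queryStringParameters?: { [name: string]: string | undefined } | null;
}

interface AuthorizerResponse {
  principalId: string;
  policyDocument: {
    Version: "2012-10-17";
    Statement: Array<{
      Action: "execute-api:Invoke";
      Effect: "Allow" | "Deny";
      Resource: string;
    }>;
  };
}

// Build the IAM policy document that the gateway expects back.
function buildPolicy(
  principalId: string,
  effect: "Allow" | "Deny",
  resource: string
): AuthorizerResponse {
  return {
    principalId,
    policyDocument: {
      Version: "2012-10-17",
      Statement: [{ Action: "execute-api:Invoke", Effect: effect, Resource: resource }],
    },
  };
}

// Hypothetical check; a real authorizer would verify a signed token here.
function isValidToken(token: string | undefined): boolean {
  return token === "let-me-in";
}

// Allow the connection only when the (assumed) `token` parameter is valid.
function authorize(event: AuthorizerEvent): AuthorizerResponse {
  const params: { [name: string]: string | undefined } =
    event.queryStringParameters || {};
  const effect = isValidToken(params["token"]) ? "Allow" : "Deny";
  const principal = effect === "Allow" ? "user" : "anonymous";
  return buildPolicy(principal, effect, event.methodArn);
}
```

Attached to the $connect route, a function like this lets the gateway reject unauthenticated sockets before any subscription code runs.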

hemal-digiflux commented 4 years ago

Hey all, I'm starting to implement Subscriptions for an ECS container running behind an API Gateway. I'm using Cognito for Auth, GraphQL Yoga, TypeScript, Nexus and Prisma.

I'm thinking that the best solution is to store the connection data in a dynamo db table. And then use that to track user state. But I'm not quite sure how to tie this all together with the above tools.

Did you succeed in using AppSync along with Nexus, with a quick and easy deployment strategy?

MentalGear commented 3 years ago

Looking for a solution to this that can run subscriptions on a serverless instance but is independent of which client is used.

pescoboza commented 3 years ago

Is there a way to implement subscriptions on Google Cloud Functions? How could I get this working with Firestore events?

jthegedus commented 3 years ago

Is there a way to implement subscriptions on Google Cloud Functions? How could I get this working with Firestore events?

@pescoboza You can probably get subscriptions working with Cloud Run as it supports websockets. GraphQL subscriptions are stateful across server instances, as messages need to propagate between running server instances. Firestore doesn't really fit into this model, but someone has built the GCP PubSub integration - https://github.com/axelspringer/graphql-google-pubsub

See this very old thread on the topic - https://github.com/apollographql/graphql-subscriptions/issues/53

boredland commented 3 years ago

Is there a way to implement subscriptions on Google Cloud Functions? How could I get this working with Firestore events?

@pescoboza You can probably get subscriptions working with Cloud Run as it supports websockets. GraphQL subscriptions are stateful across server instances, as messages need to propagate between running server instances. Firestore doesn't really fit into this model, but someone has built the GCP PubSub integration - https://github.com/axelspringer/graphql-google-pubsub

See this very old thread on the topic - https://github.com/apollographql/graphql-subscriptions/issues/53

Yeah, using graphql-google-pubsub works fine on cloud run!

glasser commented 3 years ago

I'm going to close this issue because, for the time being, subscriptions are not a supported part of Apollo Server (though we hope to reintegrate subscriptions later). Feel free to continue to discuss the approaches you are trying, though!