andyrichardson / subscriptionless

GraphQL subscriptions (and more) on serverless infrastructure
MIT License

Socket idleness detection #3

Closed andyrichardson closed 3 years ago

andyrichardson commented 3 years ago

About

Ping/Pong

For whatever reason, AWS API Gateway does not support WebSocket protocol-level ping/pong. This means early detection of unclean client disconnects is near impossible.

(graphql-ws will not implement subprotocol-level ping/pong, which is understandable.)

Socket idleness

API Gateway considers an idle connection to be one where no messages have been sent on the socket for a fixed duration (currently 10 minutes).

Again, the WebSocket spec has support for detecting idle connections (ping/pong) but API Gateway doesn't use it. This means that when both parties are connected but no message is sent on the socket in either direction for the defined duration, API Gateway will close the socket.

A quick fix for this is to set up immediate reconnection on the client side.
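As a rough illustration of that quick fix, here is a minimal sketch of immediate reconnection. The `SocketLike` shape, the `keepConnected` helper, and the choice to skip reconnection on close code 1000 are all assumptions made for this example; they are not part of subscriptionless or graphql-ws.

```typescript
// Minimal socket shape used by this sketch (hypothetical, not a library type).
interface SocketLike {
  onclose: ((event: { code: number }) => void) | null;
  close(): void;
}

// Opens a socket and immediately reopens it whenever it closes unexpectedly.
// `open` is a caller-supplied factory. By assumption, close code 1000
// (normal closure) is treated as intentional and is not retried.
function keepConnected(
  open: () => SocketLike,
  shouldReconnect: (code: number) => boolean = (code) => code !== 1000,
): { stop: () => void } {
  let stopped = false;
  let current: SocketLike | undefined;

  const connect = (): void => {
    const socket = open();
    current = socket;
    socket.onclose = (event) => {
      // Reconnect straight away unless the caller asked us to stop.
      if (!stopped && shouldReconnect(event.code)) connect();
    };
  };

  connect();
  return {
    stop: () => {
      stopped = true;
      current?.close();
    },
  };
}
```

A real client would likely also need to re-send its active subscriptions after each reconnect, and may want a delay between attempts to avoid hammering the endpoint.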

Feature request on AWS Discussion forums

enisdenjo commented 3 years ago

Hey there! Quick update, I feel like both problems can be solved with https://github.com/enisdenjo/graphql-ws/issues/117#issuecomment-805256188. 😄

andyrichardson commented 3 years ago

Thanks for taking the time to look at this @enisdenjo 🔥

Your suggestion of adding client->server ping/pong on the GraphQL layer is a good shout! Any interval-based messaging should work around the naive idleness detection but, if it needs to be done, that's probably one of the better ways!
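To make the interval-based idea concrete, here is a small sketch of a keep-alive that only sends a message when the socket has been quiet for a while. The `Sender` shape, the `createKeepAlive` helper, and the `{ "type": "ping" }` payload are assumptions for illustration (the payload mirrors the ping message that graphql-ws later added in v5).

```typescript
// Anything that can send a text frame (hypothetical minimal shape).
interface Sender {
  send(data: string): void;
}

// Tracks the last time any message crossed the socket and sends a ping once
// the connection has been quiet for at least `quietLimitMs`.
function createKeepAlive(socket: Sender, quietLimitMs: number, startMs: number) {
  let lastActivityMs = startMs;
  return {
    // Call whenever a message is sent or received; idleness is direction agnostic.
    noteActivity(nowMs: number): void {
      lastActivityMs = nowMs;
    },
    // Call periodically (for example from setInterval); returns true if a ping was sent.
    tick(nowMs: number): boolean {
      if (nowMs - lastActivityMs < quietLimitMs) return false;
      socket.send(JSON.stringify({ type: "ping" }));
      lastActivityMs = nowMs;
      return true;
    },
  };
}
```

Picking a `quietLimitMs` comfortably below the 10 minute idle window should keep API Gateway from treating the connection as idle, while avoiding extra traffic on sockets that are already busy.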

I'll keep this open for now as there's still no resolution for server->client ping/pong.

I suspect the solution for this needs to be baked into AWS because, let's say server->client ping/pong does get baked into the graphql-ws subprotocol, we'd still need to:

The problem of server side idleness detection has already been solved in the WebSocket spec through control frames so I'm unsure why AWS's API Gateway implementation is using data frames to detect idleness.

Other than the feature request above, I know @dabit3 has hit up some of the folks working on serverless at AWS - so fingers crossed we can shine some light on this 💡

enisdenjo commented 3 years ago

Heads up https://github.com/enisdenjo/graphql-ws/pull/201. As of graphql-ws@v5.0.0, subprotocol ping and pong messages are supported.
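For anyone wiring this up by hand, a minimal sketch of answering a subprotocol ping could look like the following. The `replyToPing` helper is hypothetical and only covers the `ping`/`pong` message types (optional payloads are ignored); how the reply is actually delivered (for example through the API Gateway management API) is left out.

```typescript
// Returns the reply to send for an incoming text frame, or null when the
// frame is not a ping message (or is not valid JSON).
function replyToPing(raw: string): string | null {
  let message: unknown;
  try {
    message = JSON.parse(raw);
  } catch {
    return null; // Not JSON, so not a subprotocol message we can answer.
  }
  if (typeof message !== "object" || message === null) return null;
  if ((message as { type?: unknown }).type !== "ping") return null;
  // Assumption: a bare pong without payload is an acceptable reply.
  return JSON.stringify({ type: "pong" });
}
```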

andyrichardson commented 3 years ago

Thanks for the heads up @enisdenjo!

Gutted to see we're having to add support for this on the sub-protocol layer (via data frames) when the lower level WS protocol itself (via control frames) is supposed to be responsible for "communicat[ing] state about the WebSocket".

I wonder if there's a way we can provide feedback to teams working on the lower level protocols (or more likely clients) to prevent other sub-protocols from having to do this work too. This works around some issues but I don't think a GraphQL specific sub-protocol should have ever had to consider "transport layer" logic.

From what I can tell, some of the issues that have led to this becoming necessary are:

Anywhoo - thanks so much for adding support for this while we work within the current constraints. I'll look into getting full support for the new protocol additions soon so that this can (hopefully) be something I can actually recommend to folks for production usage.

enisdenjo commented 3 years ago

I doubt we can make a change there. It has been like this since the beginning. All major browsers have these limitations and bug reports open since forever (you can Google to find them), no progress there. The arguments are straightforward: not deemed necessary, preserve battery, the OS/browser should detect idleness; and most importantly, you can design your keep-alives on the subprotocol level.

Additionally, cited from https://github.com/enisdenjo/graphql-ws/issues/117#issuecomment-856334403:

Seems like the general tenor of browser implementors is to not touch the WebSocket API anymore and instead push WebTransport forward (which is probably 1-2 years away from broad adoption):

andyrichardson commented 3 years ago

Quick update for anyone keeping tabs on this issue.

I've spent a few hours today working on getting graphql-ws v5 support sorted. The easy stuff, such as responding to ping messages, is done.

After some research, it looks like a state machine is the way to go for scheduling ping dispatch and ping timeout logic. Assuming this isn't a limitation in AWS itself, the main roadblock is a lack of support for step function websocket event invocation in serverless-step-functions. Subscriptionless works just fine without the Serverless Framework, but it's the easiest route for most people getting started.
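To show the shape of the logic being described, here is a tiny state machine sketch for ping dispatch and ping timeout. It is not the subscriptionless implementation or a Step Functions definition; the state, event, and action names are all made up for illustration.

```typescript
// Hypothetical states, events, and actions for one connection's ping cycle.
type PingState = "waiting" | "awaitingPong" | "closed";
type PingEvent = "pingDue" | "pongReceived" | "pongTimedOut";
type PingAction = "sendPing" | "schedulePing" | "closeConnection" | "none";

// Pure transition function: given the current state and an event, returns
// the next state plus the side effect the caller should perform.
function transition(
  state: PingState,
  event: PingEvent,
): { state: PingState; action: PingAction } {
  if (state === "waiting" && event === "pingDue") {
    return { state: "awaitingPong", action: "sendPing" };
  }
  if (state === "awaitingPong" && event === "pongReceived") {
    return { state: "waiting", action: "schedulePing" };
  }
  if (state === "awaitingPong" && event === "pongTimedOut") {
    return { state: "closed", action: "closeConnection" };
  }
  // Any other combination is ignored and leaves the state unchanged.
  return { state, action: "none" };
}
```

In a Step Functions setup, the waits between `pingDue` and `pongTimedOut` would presumably be handled by the state machine's own timers rather than by a long-running process.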