enisdenjo / graphql-ws

Coherent, zero-dependency, lazy, simple, GraphQL over WebSocket Protocol compliant server and client.
https://the-guild.dev/graphql/ws
MIT License
1.75k stars 162 forks source link

[ReactNative] Slow refresh subscription causes websocket close with {"isTrusted": false, "message": null} #494

Closed XChikuX closed 1 year ago

XChikuX commented 1 year ago

Screenshot Visualising is always helpful.

Working subscription. This one doesn't authenticate in the backend and seems to work fine. image

Wheras one that does authenticate:

image

As you can see causes ERROR {"isTrusted": false, "message": null} issue. However, the initial graphql subsciption was successful. Giving me back Authenticated data! It's only the new ones that fail.

Expected Behaviour I expected it to work in react-native. Since my code seems to work quite well in a regular environment,

Actual Behaviour but instead it did ERROR {"isTrusted": false, "message": null}

Debug Information I don't see any error logs on my backend. Everything seems A OK back there. The only other difference between the two subscriptions in the backend are that the "Hello" one responds every second. While the heartBeat takes 10 seconds.

Further Information I recreated my scenario on a Sandbox that supports react-native. Of course this isn't the exact environmment. But it works great!

I use expo to compile to Android. Not doing anything too fancy outside the codesandbox example.

XChikuX commented 1 year ago

When I host my subscription locally and hit it with ws:// everything seems to work fine.

I re-tried using the apollo-client as well, with no luck.

I've boiled this down to some issue while using wss://

I have 2 subscriptions. One works every second to return Hello. While the other works to refresh user information every 10 seonds.

As you can see below, after the initial the connection (Which means, it indeed does work end-2-end) it errors out, closing the socket.

LOG {"heartBeat": {"typename": "User", "age": 27, "auth": null, "bio": null, "dob": "1995-10-17T00:00:00+00:00", "email": "89ou1oqzsr@example.com", "fname": "Test1", "gender": "male", "height": 160, "images": null, "interests": null, "lname": "23", "location": {"typename": "Coordinates", "latitude": 47, "longitude": -56, "timestamp": [Object]}, "mname": null,"phone": {"__typename": "Phone", "countryCode": "1", "number": 9465768976}}} LOG {"check": "Hello"} LOG {"check": "Hello"} LOG {"check": "Hello"} LOG {"check": "Hello"} ERROR [Error: Socket closed]

Please help, this happens only in a react-native environment. "graphql": "16.6.0", "graphql-request": "6.1.0", "graphql-ws": "5.14.0", "react": "^18.2.0", "react-dom": "^18.2.0", "react-native": "0.72.0", "expo": "^48.0.19",

@enisdenjo Would love it if you could pitch in.

XChikuX commented 1 year ago

Ok, I figured it out....

The socket auto closes in a react-native environment when the subscription only updates after a certain number of seconds.

My initial subscription had a 10 second timer. I changed it to 2.5 second refresh and everything works fine.

Unless this is intentional (which I don't think it is), it is most definitely a bug.

Probably some timeout value is misplaced in the code..

enisdenjo commented 1 year ago

When I host my subscription locally and hit it with ws:// everything seems to work fine.

wss (with an extra "s") is WebSocket over TLS, like HTTPS. If you're running the server locally, you most probably dont have TLS set up - hence the fail. wss is for production environments where the WebSocket is served behind a TLS enabled server.

If the server indeed has TLS enabled, it could then be 2 things: server is misconfigured, or a react-native issue. (Because WS over TLS is a native feature of the browser/environment - there's nothing graphql-ws can change to fix the native behaviour).

My initial subscription had a 10 second timer. I changed it to 2.5 second refresh and everything works fine. Probably some timeout value is misplaced in the code..

The only built-in keep-alive in code is the 12 second WebSocket ping on the server. There's no other default keep-alive mechanism, you're responsible for adapting the keep-alive to your requirements.

On this note, maybe we can reduce the default 12 seconds (to like 6 seconds). Would you be so kind @XChikuX to try that and report whether it fixes the issue? Thanks in advance.

import { WebSocketServer } from 'ws'; // yarn add ws
import { useServer } from 'graphql-ws/lib/use/ws';
import { schema } from './my-graphql-schema';

const server = new WebSocketServer({
  port: 4000,
  path: '/graphql',
});

useServer(
  { schema },
  server,
+ 6_000
);

console.log('Listening to port 4000');

Oh and, you're also more than welcome to open a PR here if the fix works - in case you want to contribute. šŸ˜„

XChikuX commented 1 year ago

" you most probably dont have TLS set up" <-- I actually do.

The wss:// works fine when I try it from here

In an original react-native environment however, the backend refresh rate when supplied anything over 10 seconds seems to fail.

It's at 7.5 seconds at the moment.

enisdenjo commented 1 year ago

The wss:// works fine when I try it from here

Then it is a react-native issue (I've edited my original comment).

In an original react-native environment however, the backend refresh rate when supplied anything over 10 seconds seems to fail.

So changing the default keep-alive on the server to 6 seconds worked?

XChikuX commented 1 year ago

@enisdenjo Yes!

Edit: My server runs strawberry-graphql using python. I edited it there

enisdenjo commented 1 year ago

Cool!

Do you want to open a PR, or do you want me to? I don't mind either way - was just wondering if you want to push a fix since you've discovered it. šŸ˜„

XChikuX commented 1 year ago

I'm not sure how this keepAlive works.

Considering that my issue was fixed after making the server refresh rate faster on the backend; I'm not sure decreasing the keepAlive is the way to go.

I'll experiment a bit and get back to you.

XChikuX commented 1 year ago

Can I edit the keepAlive time in this syntax?

  client.subscribe(
    {
      query: AUTH_SUBSCRIPTION,
      variables: { pk: user_id }
    },
    {
      next(data) {
        console.log(data.data.heartBeat);
      },
      error(err) {
        console.error(err);
      },
      complete() {
        console.log("Subscription completed");
      }
    },
    6_000
  );
}
enisdenjo commented 1 year ago

Considering that my issue was fixed after making the server refresh rate faster on the backend; I'm not sure decreasing the keepAlive is the way to go.

Decreasing the keep-alive means faster issuing of pings from the server.

Can I edit the keepAlive time in this syntax?

That's the client, I was talking about the server. If you want to add keep-alive to the client, you should check the "client usage with ping/pong timeout and latency metrics" recipe.

If you don't care about what happens when the server does not respond to a ping (or latency metrics), you can simply set the keepAlive option on the client like this:

import { createClient } from 'graphql-ws';

const client = createClient({
  url: "wss://thedating.club/graphql",
  connectionParams: async () => ({
    authorization: token
  }),
+ keepAlive: 6_000,
});

Hope this helps!

XChikuX commented 1 year ago

@enisdenjo Modifying the keepAlive on the client allows me to keep arbitrary timings on the server without a socket timeout on the client side. This is the way it should work.

I believe we should work with strong defaults. Now that we know react-native (or the underlying android environment) websockets are more time sensitive. Let's modify the default keepAlive on the client side as well as the server? Unless there is some engineering reason to not set a default keepAlive on the client side ie.

If you agree.. What files do I modify for the client side to be set? I'll push a PR once I have this information.

enisdenjo commented 1 year ago

I believe we should work with strong defaults. Now that we know react-native (or the underlying android environment) websockets are more time sensitive. Let's modify the default keepAlive on the client side as well as the server? Unless there is some engineering reason to not set a default keepAlive on the client side ie.

This can be stated in the documentation, but I still wouldn't change the client defaults.

The reason is that the server is already performing the keep-alive and it should always be the server doing so. There was a long discussion at #117 where I was even reluctant to add the client-side pings. You can read my comments there, you really don't need client side pings for most of the real-life scenarios.

Moreover, I've run your CodeSandbox with the graphql-ws/lib/use/ws server from this library - without any client-side keep-alives - the WebSocket connection stays alive idling for hours. So this is definitely some issue with strawberry-graphql itself. I'd suggest opening an issue there to get to the bottom of this.

XChikuX commented 1 year ago

The codesandbox is not an accurate representation of a react-native environment running on a real android/ios device @enisdenjo

You simply can't use that as an example to justify correct working. The bug on my end still remains that the websocket on the client side (react-native on a real android device) closes without a keepAlive below 8 seconds by default. I completey agree with you here "you really don't need client side pings for most of the real-life scenarios."

If the fix has to be on the server side I understand, like what I suggested--That is, an engineering reason for not doing it on the client.

I'll ping the strawberry folks asking for a resolution. But before I do..

Could you please elaborate why you think "WebSocket connection stays alive idling for hours" is a bug? My Subscriptions are meant to stay alive until the client is closed. I might have a gap in my fundamental understanding of subscriptions.

enisdenjo commented 1 year ago

The codesandbox is not an accurate representation of a react-native environment running on a real android/ios device

Sure, but your example still didn't work with strawberry-graphql - and it does with the graphql-ws server. So that should mean something.

The bug on my end still remains that the websocket on the client side (react-native on a real android device) closes without a keepAlive below 8 seconds by default.

What I am saying is that the server is not configured properly for WebSockets - it does not perform keep-alives and most likely itself terminates connections (not the client). And a fix for you client is to simply add one line: keepAlive: 6000.

Please understand, I am not against fixes - bugs are of the highest priority here! What I am trying to say is that I am not convinced we should change the sane defaults following the aforementioned arguments.

Could you please elaborate why you think "WebSocket connection stays alive idling for hours" is a bug?

It is not a bug. It's the expected and correct behaviour. The WebSocket connection stays alive for hours without any message activity because the graphql-ws server implements the WebSocket level keep-alive mechanism.

https://github.com/enisdenjo/graphql-ws/blob/04c92b18db4aecfb75dd5c4fc7390382bd7f8135/src/use/ws.ts#L107-L130

My Subscriptions are meant to stay alive until the client is closed. I might have a gap in my fundamental understanding of subscriptions.

You can of course close the connection whenever you want! I was just pointing out that a server implementing the WebSocket level keep-alive mechanism shouldnt have you implement client-side keep-alives.

XChikuX commented 1 year ago

Thank you for your clarifications. I completely agree...

I wasn't aware the subscription websocket spec requires mandatory keepAlives on the server side. I've reached out in the strawberry discord for further action.

enisdenjo commented 1 year ago

You're very welcome! If you have any more questions, I am more than happy to help out.

I wasn't aware the subscription websocket spec requires mandatory keepAlives on the server side.

It's not mandatory per se, its just a conventionalised and spec-level approach to keeping connections alive.

XChikuX commented 1 year ago

I think the docs should make it mandatory, no?

If the client needs to keep the socket connection open for recovery, like you mentioned in #117

In other words.. . . What scenarios would a default mandatory server side keepAlive be a bad idea? (Of course the value should be configurable if needed as well and even allowed to be turned off.)

I'm not able to think of any. Standardizing best practises should include thwarting failure cases for customers/devs who don't want to get too deep into the spec.

enisdenjo commented 1 year ago

I think the docs should make it mandatory, no?

I don't think so. Also, we don't know whats cutting the connection off. Maybe strawberry has something implemented specifically to disconnect after 10 seconds, maybe the underlying Python WS library has a bug, etc.

What scenarios would a default mandatory server side keepAlive be a bad idea? (Of course the value should be configurable if needed as well and even allowed to be turned off.)

No scenario. Having a keep-alive is always good.

Standardizing best practises should include thwarting failure cases for customers/devs who don't want to get too deep into the spec.

The GraphQL over WebSocket spec is a transport spec, it talks exclusively about transporting GraphQL and how that works within the WebSocket. It's not a WebSocket spec. I don't want to convolute it with lower level WS specific things, implementors should have the knowhow about WS to begin with.