Closed planninpoker closed 1 year ago
Hello, does it reproduce with a local Centrifugo server? Can you provide steps to reproduce, including server configuration? Which transport are you using?
In general this means that the server did not receive the first frame from the client over WebSocket (or another transport) within 10 secs. But I'm not sure which conditions may lead to this in a normal situation, hence the questions above.
Hey @FZambia
I'm not able to reproduce it locally. It works for 99% of clients, but I've watched at least 2 sessions lately where it just won't connect, and these 3502 errors are being thrown.
{
  "token_hmac_secret_key": "",
  "admin": true,
  "admin_password": "",
  "admin_secret": "",
  "api_key": "",
  "allowed_origins": [""],
  "http_stream": true,
  "sse": true,
  "emulation": true,
  "presence": true,
  "join_leave": true,
  "force_push_join_leave": true,
  "allow_subscribe_for_client": true,
  "allow_presence_for_subscriber": true
}
export const centTransports: TransportEndpoint[] = [
  {
    transport: "websocket",
    endpoint: `wss://${process.env.NEXT_PUBLIC_CENT_URL}/connection/websocket`,
  },
  {
    transport: "http_stream",
    endpoint: `https://${process.env.NEXT_PUBLIC_CENT_URL}/connection/http_stream`,
  },
  {
    transport: "sse",
    endpoint: `https://${process.env.NEXT_PUBLIC_CENT_URL}/connection/sse`,
  },
];
So the server is not responding to the connect? I was wondering if this could happen when websockets were blocked, but I figured one of the fallbacks would take over.
I've just found these errors in the logs as well:
[
  {
    level: "error",
    error: "unexpected EOF",
    time: "2023-10-17T13:44:56Z",
    message: "error reading body",
  },
  {
    level: "error",
    error: "unexpected EOF",
    time: "2023-10-17T13:45:13Z",
    message: "error reading body",
  },
  {
    level: "error",
    error: "unexpected EOF",
    time: "2023-10-18T13:03:23Z",
    message: "error reading body",
  },
  {
    level: "error",
    error: "unexpected EOF",
    time: "2023-10-20T09:29:17Z",
    message: "error reading body",
  },
  {
    level: "info",
    error: "unexpected end of JSON input",
    req: {},
    time: "2023-10-20T13:23:18Z",
    message: "can't unmarshal emulation request",
  },
];
but I've watched at least 2 sessions lately where it just won't connect
I've just found these errors in the logs as well
Do they correspond to the connect issues, or are these just all the errors you found in the logs? "unexpected EOF" may happen when the client goes away before the request is read - I suppose Centrifugo should suppress those.
The last error, "can't unmarshal emulation request", is interesting - need to add more logging for it to understand what was sent. Does it correspond to connection issues, or is it only one log entry?
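To make the distinction above concrete, here is a rough sketch of how the pasted log entries could be split into "client probably went away" noise and entries worth a closer look. The `LogEntry` shape mirrors the entries posted earlier; `isLikelyClientGone` and `triage` are hypothetical helper names for illustration, not part of Centrifugo.

```typescript
// Shape of the log entries pasted earlier in the thread; `req` is optional.
interface LogEntry {
  level: string;
  error: string;
  time: string;
  message: string;
  req?: unknown;
}

// Hypothetical helper (not a Centrifugo API): treats "unexpected EOF" while
// reading the body as a client that went away before the request was read.
function isLikelyClientGone(e: LogEntry): boolean {
  return e.error === "unexpected EOF" && e.message === "error reading body";
}

// Split entries into probable noise and entries that deserve investigation.
function triage(entries: LogEntry[]): { benign: LogEntry[]; investigate: LogEntry[] } {
  const benign: LogEntry[] = [];
  const investigate: LogEntry[] = [];
  for (const e of entries) {
    if (isLikelyClientGone(e)) {
      benign.push(e);
    } else {
      investigate.push(e);
    }
  }
  return { benign, investigate };
}

// Two of the entries from the thread, copied as posted.
const sample: LogEntry[] = [
  { level: "error", error: "unexpected EOF", time: "2023-10-17T13:44:56Z", message: "error reading body" },
  { level: "info", error: "unexpected end of JSON input", req: {}, time: "2023-10-20T13:23:18Z", message: "can't unmarshal emulation request" },
];

const triaged = triage(sample);
console.log(triaged.investigate.map((e) => e.message));
```

With this split, only the emulation entry is left to correlate by timestamp with the sessions that failed to connect.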
Hey @FZambia, I appreciate the help with this.
I see, these were just the only errors that the server had logged, so I thought they might be related. I had a hunch it might be a corporate firewall or something, as it was likely a team using the site. I've just installed a firewall on my machine with squid, and the switch to ssr is working flawlessly
The clients can never reconnect
A bit misleading... Can't connect at all or can't reconnect? Reconnect is a process after connection loss.
Is there something specifically worth looking for with debugging on?
All debug logs would be helpful, as there is no reproducer - they will probably give insight into where to dig further.
I've just installed a firewall on my machine with squid, and the switch to ssr is working flawlessly
Does it prevent WebSocket Upgrade requests? Or does it allow Upgrade requests but block WebSocket frames after the connection Upgrade?
Sorry typo. They cannot connect at all. I've tried both, and both work fine on my machine. Blocking websockets, and blocking the initial websocket upgrade request.
I've done some more digging into this one scenario, and it looks like none of the users from this channel were able to use websockets. 5/8 of them were able to connect with http_stream, and the other 3 never connected successfully. I added tracking on the successful connection types a while ago, to justify switching from pure websockets to centrifugal.
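For anyone wanting similar tracking, a minimal sketch of the tally could look like the following. It assumes each session records which transport (if any) ended up connected; `SessionOutcome` and `tallyTransports` are made-up names for illustration, and the sample data just restates the 5/8 and 3/8 numbers from this channel.

```typescript
type Transport = "websocket" | "http_stream" | "sse";
type TallyKey = Transport | "never_connected";

// One record per client session; `connectedVia` is null if no transport ever connected.
interface SessionOutcome {
  clientId: string;
  connectedVia: Transport | null;
}

// Count sessions per successful transport, plus those that never connected.
function tallyTransports(sessions: SessionOutcome[]): Record<TallyKey, number> {
  const counts: Record<TallyKey, number> = {
    websocket: 0,
    http_stream: 0,
    sse: 0,
    never_connected: 0,
  };
  for (const s of sessions) {
    const key: TallyKey = s.connectedVia === null ? "never_connected" : s.connectedVia;
    counts[key] += 1;
  }
  return counts;
}

// The channel described above: 8 users, 5 on http_stream, 3 never connected.
// Client ids are placeholders.
const channelSessions: SessionOutcome[] = [];
for (let i = 1; i <= 5; i++) {
  channelSessions.push({ clientId: `user-${i}`, connectedVia: "http_stream" });
}
for (let i = 6; i <= 8; i++) {
  channelSessions.push({ clientId: `user-${i}`, connectedVia: null });
}

const channelTally = tallyTransports(channelSessions);
console.log(channelTally);
```

A zero in the `websocket` bucket for a whole channel, as here, points at something shared by those users (network, proxy) rather than at occasional per-client failures.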
Got it, well - I suppose this in general simplifies the task, because it would be much harder to find the root cause if it was occasional issues for those clients.
One idea I have at this point is that those users are behind a proxy which allows Upgrade, but blocks WebSocket frames sent after it. In this case we will get transport open event on the frontend - and centrifuge-js won't try to use fallbacks after it because it thinks connection to WebSocket was successful. Though in this case I believe client side timeout (5 sec by default) should fire first and cause disconnect from the client side before server decides to close connection with DisconnectStale reason. 2 questions:
timeout option or using default?
A bit of a shot in the dark now, debug logs can tell that I am thinking in the wrong direction.
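The hypothesis above can be written down as a small decision model. This is only a sketch of the reasoning in this comment, not centrifuge-js internals: `ProxyBehavior`, `Outcome` and `expectedOutcome` are hypothetical names, and the 5 s and 10 s defaults are the values mentioned in this thread.

```typescript
// Hypothetical model of a proxy sitting between the client and Centrifugo.
interface ProxyBehavior {
  allowsUpgrade: boolean;      // does the WebSocket Upgrade request succeed?
  allowsClientFrames: boolean; // do frames sent by the client after Upgrade reach the server?
}

type Outcome =
  | "connected_over_websocket"
  | "fallback_to_next_transport"
  | "client_timeout_disconnect"
  | "server_stale_disconnect_3502";

// Defaults follow the thread: client timeout 5 s, server waits 10 s for the first frame.
function expectedOutcome(
  proxy: ProxyBehavior,
  clientTimeoutMs: number = 5000,
  serverStaleMs: number = 10000,
): Outcome {
  // Upgrade blocked: the transport never opens, so the client can try the next transport.
  if (!proxy.allowsUpgrade) return "fallback_to_next_transport";
  // Upgrade and frames both pass: normal connect.
  if (proxy.allowsClientFrames) return "connected_over_websocket";
  // Upgrade passes but the connect frame is dropped: whichever timer is shorter fires first.
  return clientTimeoutMs < serverStaleMs
    ? "client_timeout_disconnect"
    : "server_stale_disconnect_3502";
}

const framesBlocked: ProxyBehavior = { allowsUpgrade: true, allowsClientFrames: false };
console.log(expectedOutcome(framesBlocked));
console.log(expectedOutcome(framesBlocked, 15000));
```

Under this model, a 3502 on the client is only expected when the client-side timeout is longer than the server's stale interval, which is why the `timeout` question matters.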
@planninpoker hello, still waiting for answers above and some logs from those clients, did you have a chance to address this?
Hey @FZambia
I haven't experienced any more issues with other clients since raising this, so I stopped investigating. I'm happy to mark this as closed, and assume it had something to do with the users.
Got it, thx. Let's close for now then, need more information to understand what's going on and whether it was some issue with SDK or not. Since you see 3502 on client-side it means WS frames from server to client are working fine, hard to imagine situation when only server to client WS frames are allowed in proxy (but who knows!).
Describe the bug
I'm seeing some users failing to connect, and the error code is always 3502. I've read the meaning in the documentation:
DisconnectStale issued to close connection that did not become authenticated in configured interval after dialing.
but I don't understand what could be causing this. Any help would be amazing.
Versions
Additional context
I have a React application which sometimes fails to connect. Maybe 5/1000 users fail.