fastify / fastify-websocket

basic websocket support for fastify
MIT License
393 stars 72 forks source link

Node server crashes with ERR_INTERNAL_ASSERTION error #249

Closed jsmircic closed 1 year ago

jsmircic commented 1 year ago

Prerequisites

Fastify version

4.x

Plugin version

7.1.3

Node.js version

14.x

Operating system

Linux

Operating system version (i.e. 20.04, 11.3, 10)

Azure AppServices for linux (Node running inside docker container)

Description

We are getting some intermittent container crashes and looking at the logs looks like source of the exception is @fastify/websockets (logs below). We are using @fastify in combination with TRPC subscriptions for real-time communication.

Container exited unexpectedly: last 10 seconds logs [2023-04-17T06:26:19.633055587Z Error [ERR_INTERNAL_ASSERTION]: This is caused by either a bug in Node.js or incorrect usage of Node.js internals.
2023-04-17T06:26:19.633108688Z Please open an issue with this stack trace at https://github.com/nodejs/node/issues
2023-04-17T06:26:19.633116188Z 
2023-04-17T06:26:19.633120988Z     at new NodeError (internal/errors.js:322:7)
2023-04-17T06:26:19.633126088Z     at assert (internal/assert.js:14:11)
2023-04-17T06:26:19.633131288Z     at ServerResponse.assignSocket (_http_server.js:225:3)
2023-04-17T06:26:19.633136088Z     at Server. (/usr/src/app/server/dist/node_modules/@fastify/websocket/index.js:52:19)
2023-04-17T06:26:19.633141288Z     at Server.emit (events.js:400:28)
2023-04-17T06:26:19.633145788Z     at Server.emit (domain.js:475:12)
2023-04-17T06:26:19.633149988Z     at onParserExecuteCommon (_http_server.js:703:14)
2023-04-17T06:26:19.633154388Z     at onParserExecute (_http_server.js:616:3) {
2023-04-17T06:26:19.633159188Z   code: 'ERR_INTERNAL_ASSERTION'
2023-04-17T06:26:19.633164488Z }
]

I believe server should not completely crash and restart on such websocket errors. Any help or guidelines on we can resolve this problem. Not sure if this is the right place to raise the issue as the problem could be in the fastify plugin for TRPC? But from the error logs we can only pinpoint to @fastify/websocket.

Thanks

Steps to Reproduce

We are unable to reproduce this issue in our development or staging environments, it is happening only in production.

Expected Behavior

The server should not crash with an ERR_INTERNAL_ASSERTION error.

climba03003 commented 1 year ago

It is hard to helps if there is no reproducible code or the problem is cloud platform specific.

jsmircic commented 1 year ago

I understand. Hard to provide reproduction as it's only production environment and we are having lots of clients connected. In all of our testing this never happened.

Looking at stack trace, this line of Node httpserver is failing

https://github.com/nodejs/node/blob/edd64fe5ba20c6c2f53720c672c591f09d9d4ac0/lib/_http_server.js#L225

Don't really have enough knowledge to make sense of what is all happening here. If you have any idea on how we could further investigate please let me know. Thanks

climba03003 commented 1 year ago

Will it be the environment itself replaced the http.IncommingMessage to others? For example, it is easily to happen in server-less environment.

jsmircic commented 1 year ago

I doubt it, as the Node is running inside custom built docker container. And also everything is working ok 99% of the time, sockets are connected and messages are delivered. Just sometimes this error occurs and server restarts. Had 3 crashes logged yesterday. Not a huge problem as server restarts fast, but some users do get some failed requests at that point. For that reason would be great to find a solution.

climba03003 commented 1 year ago

I am think if ._httpMessage is null, it basically means that the request is detached from the socket. By that meaning, some client aborted the request while we are assigning a empty socket to response.

jsmircic commented 1 year ago

As I understand socket._httpMessage should be undefined or null in a call to rawResponse.assingSocket(socket). For some reason in our case only sometimes it's not. Maybe the problem is not here but in TRPC fastify integration. But still as this can happen, i'm thinking such case should be handled also here and not end up in crashing the process? Would like to be of more help but my knowledge on websockets internals is limited :(

mcollina commented 1 year ago

~I think the problem here is that the socket is destroyed before completing the handshake, so there is that error.~

I've seen this error myself a few times.

(Note that this is mostly a bug in Node.js core that we will work around here).

mcollina commented 1 year ago

So, the problem is not related to socket being destroyed, but to sockets calling assignSocket twice.

mcollina commented 1 year ago

@jsmircic could you try and verify if https://github.com/fastify/fastify-websocket/pull/253 solves it? It fixes my case, but I'm unsure about yours.

I would need a repro.

jsmircic commented 1 year ago

Tnx, sure, I'll report back with findings. But it's hard for us to create a repro. Happens only when larger number of clients is connected. We couldn't detect any pattern, occurs on random requests. We did some changes on the TRPc side that might have helped. Didn't have any crashes in last 2 days.

jsmircic commented 1 year ago

On our side looks like the issue is resolved. Thank you.