RocketChat / Rocket.Chat

The communications platform that puts data protection first.
https://rocket.chat/

6.1.6: Assertion `task && "When an ares socket is closed we should have a handle for it"' failed. #29226

Open m4z opened 1 year ago

m4z commented 1 year ago

Description:

Our server has ~450 users, but usually fewer than 5% are active at any given time. During a period of very low usage (1-2 online users), Rocket.Chat crashed for no apparent reason.

Steps to reproduce:

  1. (Unclear)
  2. Server dies with the log entries below (in the "Relevant logs" section of this issue)
  3. We had to manually restart R.C via systemctl restart rocketchat.service

Expected behavior:

Don't crash. :wink:

Actual behavior:

systemd[1]: rocketchat.service: Main process exited, code=killed, status=6/ABRT
systemd[1]: rocketchat.service: Failed with result 'signal'.

Server Setup Information:

Client Setup Information:

Additional context

Similar to #22841

Relevant logs:

Server logs:

May 12 07:02:00 myserver.example rocketchat[1737]: {"level":30,"time":"2023-05-12T07:02:00.449Z","pid":1737,"hostname":"myserver","name":"SyncedCron","msg":"Starting \"Generate download files for user data\"."}
May 12 07:02:00 myserver.example rocketchat[1737]: {"level":30,"time":"2023-05-12T07:02:00.450Z","pid":1737,"hostname":"myserver","name":"SyncedCron","msg":"Finished \"Generate download files for user data\"."}
May 12 07:02:31 myserver.example rocketchat[1737]: /usr/local/n/versions/node/14.19.3/bin/node[1737]: ../src/cares_wrap.cc:148:void node::cares_wrap::{anonymous}::ares_sockstate_cb(void*, ares_socket_t, int, int): Assertion `task && "When an ares socket is closed we should have a handle for it"' failed.
May 12 07:02:31 myserver.example rocketchat[1737]:  1: 0xa3aaf0 node::Abort() [/usr/local/n/versions/node/14.19.3/bin/node]
May 12 07:02:31 myserver.example rocketchat[1737]:  2: 0xa3ab6e  [/usr/local/n/versions/node/14.19.3/bin/node]
May 12 07:02:31 myserver.example rocketchat[1737]:  3: 0x9ac35a  [/usr/local/n/versions/node/14.19.3/bin/node]
May 12 07:02:31 myserver.example rocketchat[1737]:  4: 0x18bd668 ares__close_sockets [/usr/local/n/versions/node/14.19.3/bin/node]
May 12 07:02:31 myserver.example rocketchat[1737]:  5: 0x18c42ab  [/usr/local/n/versions/node/14.19.3/bin/node]
May 12 07:02:31 myserver.example rocketchat[1737]:  6: 0x18c5e28  [/usr/local/n/versions/node/14.19.3/bin/node]
May 12 07:02:31 myserver.example rocketchat[1737]:  7: 0x13bcc65  [/usr/local/n/versions/node/14.19.3/bin/node]
May 12 07:02:31 myserver.example rocketchat[1737]:  8: 0x13c1315 uv_run [/usr/local/n/versions/node/14.19.3/bin/node]
May 12 07:02:31 myserver.example rocketchat[1737]:  9: 0xa7b642 node::NodeMainInstance::Run() [/usr/local/n/versions/node/14.19.3/bin/node]
May 12 07:02:31 myserver.example rocketchat[1737]: 10: 0xa03805 node::Start(int, char**) [/usr/local/n/versions/node/14.19.3/bin/node]
May 12 07:02:31 myserver.example rocketchat[1737]: 11: 0x7f93e73b7d0a __libc_start_main [/lib/x86_64-linux-gnu/libc.so.6]
May 12 07:02:31 myserver.example rocketchat[1737]: 12: 0x98c58c  [/usr/local/n/versions/node/14.19.3/bin/node]
May 12 07:02:31 myserver.example traefik[619]: time="2023-05-12T07:02:31Z" level=error msg="vulcand/oxy/forward/websocket: Error when copying from backend to client: websocket: close 1006 (abnormal closure): unexpected EOF"
--
May 12 07:02:31 myserver.example traefik[619]: time="2023-05-12T07:02:31Z" level=error msg="vulcand/oxy/forward/websocket: Error when copying from backend to client: websocket: close 1006 (abnormal closure): unexpected EOF"
May 12 07:02:31 myserver.example systemd[1]: rocketchat.service: Main process exited, code=killed, status=6/ABRT
May 12 07:02:31 myserver.example traefik[619]: time="2023-05-12T07:02:31Z" level=error msg="vulcand/oxy/forward/websocket: Error when copying from backend to client: websocket: close 1006 (abnormal closure): unexpected EOF"
May 12 07:02:31 myserver.example systemd[1]: rocketchat.service: Failed with result 'signal'.
May 12 07:02:31 myserver.example systemd[1]: rocketchat.service: Consumed 8min 47.597s CPU time.

No browser logs.

m4z commented 1 year ago

I'm not sure why the logs mention node 14.19.3 (the OS version) instead of the packaged version. Given that I had to manually run systemctl daemon-reload after every other R.C update, I suspect rocketchatctl doesn't do that, and that we still ran with the old unit file.
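One way to test that suspicion is sketched below; it has not been verified on the affected server. The first helper pulls the node binary path out of the abort line in the journal. The second asks systemd which `ExecStart` it currently has loaded and whether the unit file on disk has changed since it was loaded. The function names are made up for illustration, and the unit name `rocketchat.service` is taken from the logs above. The `sed` pattern assumes the abort line looks like the one in this report.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of rocketchatctl.

# Read journal lines on stdin and print the path of the node binary that
# emitted a native assertion failure. Assumes the line has the shape
# "<prefix>: /path/to/node[<pid>]: ../src/<file>.cc:<line>:... Assertion ..."
node_binary_from_abort_line() {
  sed -n 's|.*: \(/[^ ]*\)\[[0-9]*\]: \.\./src/.*Assertion.*|\1|p'
}

# Print the ExecStart systemd has loaded for the unit, plus NeedDaemonReload,
# which reads "yes" when the unit file on disk is newer than the loaded copy.
show_unit_state() {
  unit="${1:-rocketchat.service}"
  systemctl show -p ExecStart -p NeedDaemonReload "$unit"
}
```

Possible usage: `journalctl -u rocketchat.service | node_binary_from_abort_line` followed by `show_unit_state`. If `NeedDaemonReload=yes` shows up right after an update, or the loaded `ExecStart` still points at the OS-wide node, that would support the stale-unit-file theory.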