RocketChat / Rocket.Chat

The communications platform that puts data protection first.
https://rocket.chat/

RC 4.0.0 node upgrade fails with segfault error #23346

Closed Gummikavalier closed 2 years ago

Gummikavalier commented 3 years ago

Description:

Starting RC node fails with segfault error after upgrading from RC 3.18.1 to RC 4.0.0.

Steps to reproduce:

  1. Stop all your running nodes
  2. Download RC 4.0.0 tar package with curl -L https://releases.rocket.chat/latest/download -o /tmp/rocket.chat.tgz
  3. Check that all dependencies are correct on your server
  4. Install the RC 4.0.0 node binaries with cd /tmp/bundle/programs/server && npm install
  5. Replace your first node with the new RC node
  6. Start the first updated node
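
The download and install steps above can be sketched as a small shell script. This is a minimal sketch, not the reporter's exact procedure: the tar extraction step, the `run` helper, and the `DRY_RUN` variable are assumptions added for illustration, and the archive is assumed to unpack into /tmp/bundle, as the path in step 4 implies.

```shell
#!/bin/sh
# Sketch of steps 2-4. By default (DRY_RUN=1) the commands are only printed;
# set DRY_RUN=0 to actually execute them.

# Hypothetical helper: print a command in dry-run mode, otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

fetch_and_install() {
    archive=${1:-/tmp/rocket.chat.tgz}
    # Step 2: download the release tarball (URL taken from the report).
    run curl -L https://releases.rocket.chat/latest/download -o "$archive" || return 1
    # Assumption: the tarball unpacks to /tmp/bundle.
    run tar -xzf "$archive" -C /tmp || return 1
    # Step 4: install the server dependencies inside the bundle.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "cd /tmp/bundle/programs/server && npm install"
    else
        (cd /tmp/bundle/programs/server && npm install)
    fi
}
```

Calling `fetch_and_install` with the defaults only prints the three commands, which makes it easy to review them before running anything on a production node.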

Expected behavior:

Node starts and schema upgrade process begins

Actual behavior:

The new node fails to start with a segfault error.

Oct 03 11:00:22 server.example.com kernel: node[18774]: segfault at 23d0 ip 00000000000023d0 sp 00007f8303d8d3c8 error 14 in node[400000+26b5000]
Oct 03 11:00:22 server.example.com abrt-hook-ccpp[18818]: Process 18774 (node) of user 5601 killed by SIGSEGV - dumping core
Oct 03 11:00:25 server.example.com abrt-server[18827]: Executable '/usr/local/bin/node' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Oct 03 11:00:25 server.example.com abrt-server[18827]: 'post-create' on '/var/spool/abrt/ccpp-2021-10-03-11:00:22-18774' exited with 1
Oct 03 11:00:25 server.example.com abrt-server[18827]: Deleting problem directory '/var/spool/abrt/ccpp-2021-10-03-11:00:22-18774'
Oct 03 11:01:01 server.example.com systemd[1]: Created slice User Slice of root.
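
The "error 14" field in the kernel line above can be decoded by hand. The following is a minimal sketch, assuming an x86 machine, where the kernel prints the page-fault error code in hexadecimal and each bit has a fixed meaning. The function name `decode_pf_error` is made up for illustration.

```shell
#!/bin/sh
# Decode the hexadecimal "error" field of an x86 kernel segfault line.
# Bit 0: 0 = page not present, 1 = protection violation
# Bit 1: 0 = read access,      1 = write access
# Bit 2: 0 = kernel mode,      1 = user mode
# Bit 4: 1 = the fault happened during an instruction fetch
decode_pf_error() {
    code=$((0x$1))
    if [ $((code & 1)) -ne 0 ]; then present="protection-violation"; else present="page-not-present"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 4)) -ne 0 ]; then mode="user-mode"; else mode="kernel-mode"; fi
    out="$present $access $mode"
    if [ $((code & 16)) -ne 0 ]; then out="$out instruction-fetch"; fi
    printf '%s\n' "$out"
}

decode_pf_error 14   # prints: page-not-present read user-mode instruction-fetch
```

Read this way, the log says that node tried to execute code at an unmapped address in user mode. That matches the fault address and the instruction pointer both being 23d0, which would be consistent with a jump through a bad function pointer, for example inside a native module.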

Server Setup Information:

exequos commented 2 years ago

You can also use node snap if you don't want to install and configure docker on all those machines - sudo snap install node --channel=12/stable

Or, if you're both against snap and docker usage (I've come across such situations), you can download a minimal ubuntu rootfs archive (since we know node + rc is working fine in ubuntu) and run node through chroot - very similar to using docker tbh.

Again - the above are just temporary workarounds.

Thanks for the suggestion.

It seems to be a problem with the node binary on CentOS? Did anyone manage to trace the segfault, or build the node binary to check? In my case, after many attempts with different node versions, the segfault 23d0 error occurs even on the 3.8.2 version, so I moved to...

The temporary solution with snap works fine except for one thing: the snap version is lower than the current releases. What is the delay between snap versions and the general releases?

alireza-salehi commented 2 years ago

Was anyone brave enough to test 4.1.2 already?

4.1.1 had the same issue.

ali-alhaidary commented 2 years ago

upgrading from 4.1.1 to 4.1.2:

npm WARN deprecated request@2.88.2: request has been deprecated, see https://github.com/request/request/issues/3142
npm WARN deprecated har-validator@5.1.5: this library is no longer supported
npm WARN deprecated uuid@3.4.0: Please upgrade to version 7 or higher. Older versions may use Math.random() in certain circumstances, which is known to be problematic. See https://v8.dev/blog/math-random for details.
npm WARN deprecated node-pre-gyp@0.14.0: Please upgrade to @mapbox/node-pre-gyp: the non-scoped node-pre-gyp package is deprecated and only the @mapbox scoped package will recieve updates in the future

fibers@4.0.3 install /tmp/bundle/programs/server/node_modules/fibers
node build.js || nodejs build.js

linux-x64-72-glibc exists; testing
Binary is fine; exiting
npm WARN lifecycle meteor-dev-bundle@~install: cannot run in wd meteor-dev-bundle@ node npm-rebuild.js (wd=/tmp/bundle/programs/server)
added 147 packages from 122 contributors and audited 148 packages in 5.109s

3 packages are looking for funding
  run npm fund for details

found 1 high severity vulnerability
  run npm audit fix to fix them, or npm audit for details

New major version of npm available! 6.14.12 → 8.1.3
Changelog: https://github.com/npm/cli/releases/tag/v8.1.3
Run npm install -g npm to update!

Also, localization was not pulled from lingohub...

ali-alhaidary commented 2 years ago

Also, https://github.com/RocketChat/Rocket.Chat/issues/23599 is still unresolved...

RC685 commented 2 years ago

Still no updates on this issue? :(

Might be worth throwing something in the update instructions on the website about the CentOS7 issue if it'll be a while.

TheWrongGuy commented 2 years ago

Are there any updates on this?

iamfasal commented 2 years ago

@debdutdeb Any update on this case?

tassoevan commented 2 years ago

@debdutdeb Any update on this case?

~It's under investigation.~

Edit: I've found the root cause. Working on a fix.

mojitaleghani commented 2 years ago

Hi everyone, I have just tested the latest version (https://github.com/RocketChat/Rocket.Chat/releases/tag/4.2.0), which was released yesterday, and still hit the same error:

Dec 01 08:00:19 $$MY_DOMAIN kernel: node[21953]: segfault at 23d0 ip 00000000000023d0 sp 00007f1f9b2a33b8 error 14 in node[400000+26b5000]

Does anyone have a solution?

sampaiodiego commented 2 years ago

Can some of you please help test this PR: https://github.com/RocketChat/Rocket.Chat/pull/23796 ?

iamfasal commented 2 years ago

@sampaiodiego I've tested this PR and got a different error on startup:

Dec 02 10:22:35 3-110-30-139.cprapid.com systemd[1]: Started The Rocket.Chat server.
Dec 02 10:22:35 3-110-30-139.cprapid.com rocketchat[4302]: internal/modules/cjs/loader.js:818
Dec 02 10:22:35 3-110-30-139.cprapid.com rocketchat[4302]: throw err;
Dec 02 10:22:35 3-110-30-139.cprapid.com rocketchat[4302]: ^
Dec 02 10:22:35 3-110-30-139.cprapid.com rocketchat[4302]: Error: Cannot find module 'reify/lib/runtime'
Dec 02 10:22:35 3-110-30-139.cprapid.com rocketchat[4302]: Require stack:
Dec 02 10:22:35 3-110-30-139.cprapid.com rocketchat[4302]: - /opt/Rocket.Chat/programs/server/runtime.js
Dec 02 10:22:35 3-110-30-139.cprapid.com systemd[1]: rocketchat.service: main process exited, code=exited, status=1/FAILURE
Dec 02 10:22:35 3-110-30-139.cprapid.com systemd[1]: Unit rocketchat.service entered failed state.

Steps followed:

336 packages are looking for funding
  run npm fund for details

found 78 vulnerabilities (5 low, 27 moderate, 40 high, 6 critical)
  run npm audit fix to fix them, or npm audit for details

- Started both mongod and rocketchat service back

- It went to failed status immediately after start and got above attached errors in journalctl output.

Not sure if I did the steps wrong, if so please advise or review this case further.

Thanks

debdutdeb commented 2 years ago

@fasalsh are you using the root user (assuming from the no-sudo /opt handling)? If so please run npm install --unsafe-perm instead of just npm install.
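
As a rough illustration of the advice above, the choice between the two commands can be made from the current user ID. This is only a sketch: `npm_install_cmd` is a hypothetical helper, and it prints the command to run rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the npm command suggested above, depending on
# whether the install is performed as root (UID 0) or as a regular user.
# The UID can be passed as the first argument; by default it is read from id -u.
npm_install_cmd() {
    uid=${1:-$(id -u)}
    if [ "$uid" -eq 0 ]; then
        printf '%s\n' "npm install --unsafe-perm"
    else
        printf '%s\n' "npm install"
    fi
}

# Usage, from inside programs/server of the unpacked bundle:
#   $(npm_install_cmd)
```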

poandlsl commented 2 years ago

@debdutdeb I have tested the PR using the same steps as @fasalsh and get the same error upon starting the service. I am running npm install as a non-root user.

iamfasal commented 2 years ago

@fasalsh are you using the root user (assuming from the no-sudo /opt handling)? If so please run npm install --unsafe-perm instead of just npm install.

@debdutdeb I actually ran it with sudo (forgot to mention it). It was throwing the error mentioned in my previous message on journalctl upon service start (even @poandlsl got the same error) so something is still not correct I'd say.

tassoevan commented 2 years ago

There are multiple node_modules directories in the final build. If I remember correctly, the one containing sharp was at programs/server/npm. I don't know how to test it in isolation without making a fresh Meteor build.

iamfasal commented 2 years ago

Any update on this case? @sampaiodiego @debdutdeb

lestercoyoyjr commented 2 years ago

Hello Team,

I struggled a lot to install Rocket.Chat, but I discovered that what fails is the Node.js version, so I changed from this:

sudo apt-get -y update && sudo apt-get install -y curl && curl -sL https://deb.nodesource.com/setup_12.x | sudo bash -

to this:

sudo apt-get -y update && sudo apt-get install -y curl && curl -sL https://deb.nodesource.com/setup_14.x | sudo bash -

The error was in the Node.js version. I also checked that I could install version 14.18.1 with:

sudo npm install -g inherits n && sudo n 14.18.1

That's how it was solved
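
Before starting the server, the installed Node.js major version can be compared against the one that worked here. A minimal sketch, assuming the major version is what matters: `node_major` and `check_node_major` are hypothetical helpers, and the expected value (14 in the usage line) comes from this comment, not from official requirements.

```shell
#!/bin/sh
# Hypothetical helper: extract the major version from a string like "v14.18.1".
node_major() {
    version=${1#v}
    printf '%s\n' "${version%%.*}"
}

# Hypothetical helper: compare the expected major version ($1) with the one
# reported by node --version, or with an explicit version string passed as $2.
check_node_major() {
    expected=$1
    actual=$(node_major "${2:-$(node --version)}")
    if [ "$actual" = "$expected" ]; then
        printf 'ok: Node.js major version %s\n' "$actual"
    else
        printf 'mismatch: expected %s, found %s\n' "$expected" "$actual" >&2
        return 1
    fi
}

# Usage: check_node_major 14
```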

whitetiger264 commented 2 years ago

@Gummikavalier This problem seems to be back. I tried upgrading from 4.7.4 to 4.8.2 today and I am getting a similar segmentation fault on CentOS 7 again:

Jul 22 16:43:14 core systemd: rocketchat.service: main process exited, code=killed, status=11/SEGV
Jul 22 16:43:14 core systemd: Unit rocketchat.service entered failed state.
Jul 22 16:43:14 core systemd: rocketchat.service failed.
[root@core ~]# 
[root@core server]# node /opt/Rocket.Chat/main.js
/opt/Rocket.Chat/programs/server/node_modules/fibers/fibers.js:92
                    return fn.apply(this, arguments);
                              ^

Error: Must pass options.rootUrl or set ROOT_URL in the server environment
    at Object.Meteor.absoluteUrl (packages/meteor.js:1412:11)
    at runWebAppServer (packages/webapp/webapp_server.js:996:48)
    at packages/webapp/webapp_server.js:1494:1
    at module (packages/webapp/webapp_server.js:1494:16)
    at fileEvaluate (packages/modules-runtime.js:336:7)
    at Module.require (packages/modules-runtime.js:238:14)
    at require (packages/modules-runtime.js:258:21)
    at /opt/Rocket.Chat/programs/server/packages/webapp.js:1963:15
    at /opt/Rocket.Chat/programs/server/packages/webapp.js:1972:3
    at /opt/Rocket.Chat/programs/server/boot.js:401:38
    at Array.forEach (<anonymous>)
    at /opt/Rocket.Chat/programs/server/boot.js:226:21
    at /opt/Rocket.Chat/programs/server/boot.js:464:7
    at Function.run (/opt/Rocket.Chat/programs/server/profile.js:280:14)
    at /opt/Rocket.Chat/programs/server/boot.js:463:13
[root@core server]#
[root@core ~]# /usr/local/bin/node /opt/Rocket.Chat/main.js
Segmentation fault
[root@core ~]#
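
The first manual run above fails because ROOT_URL is not set in the shell, which the systemd unit would normally provide. A minimal sketch for checking the environment before a manual run follows. `missing_env` is a hypothetical helper, and including MONGO_URL in the check is an assumption based on how Meteor bundles are usually started; only ROOT_URL is named in the output above.

```shell
#!/bin/sh
# Hypothetical helper: print the names of the given environment variables
# that are unset or empty, separated by spaces.
missing_env() {
    missing=""
    for name in "$@"; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            missing="$missing $name"
        fi
    done
    printf '%s\n' "${missing# }"
}

# Usage: missing_env ROOT_URL MONGO_URL
# The values below are placeholders, not taken from the thread:
#   ROOT_URL=http://localhost:3000 \
#   MONGO_URL=mongodb://localhost:27017/rocketchat \
#   /usr/local/bin/node /opt/Rocket.Chat/main.js
```

With the variables set, a manual run with each node binary on the machine should show whether the segfault is tied to one particular binary.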