RocketChat / Rocket.Chat

The communications platform that puts data protection first.
https://rocket.chat/

RC 4.0.0 node upgrade fails with segfault error #23346

Closed Gummikavalier closed 2 years ago

Gummikavalier commented 3 years ago

Description:

Starting RC node fails with segfault error after upgrading from RC 3.18.1 to RC 4.0.0.

Steps to reproduce:

  1. Stop all your running nodes
  2. Download RC 4.0.0 tar package with curl -L https://releases.rocket.chat/latest/download -o /tmp/rocket.chat.tgz
  3. Check that all dependencies are correct on your server
  4. Install the RC 4.0.0 node binaries with cd /tmp/bundle/programs/server && npm install
  5. Replace your first node with the new RC node
  6. Start the first updated node

Expected behavior:

Node starts and schema upgrade process begins

Actual behavior:

The new node fails to start with a segfault error.

Oct 03 11:00:22 server.example.com kernel: node[18774]: segfault at 23d0 ip 00000000000023d0 sp 00007f8303d8d3c8 error 14 in node[400000+26b5000]
Oct 03 11:00:22 server.example.com abrt-hook-ccpp[18818]: Process 18774 (node) of user 5601 killed by SIGSEGV - dumping core
Oct 03 11:00:25 server.example.com abrt-server[18827]: Executable '/usr/local/bin/node' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Oct 03 11:00:25 server.example.com abrt-server[18827]: 'post-create' on '/var/spool/abrt/ccpp-2021-10-03-11:00:22-18774' exited with 1
Oct 03 11:00:25 server.example.com abrt-server[18827]: Deleting problem directory '/var/spool/abrt/ccpp-2021-10-03-11:00:22-18774'
Oct 03 11:01:01 server.example.com systemd[1]: Created slice User Slice of root.
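
A note for reading the kernel line above: the fault address and the instruction pointer are both 0x23d0, and on x86 the kernel prints the `error` field in hexadecimal. The sketch below decodes that field, assuming the standard x86 page-fault error-code bits. `decode_pf_error` is a hypothetical helper name, not part of any existing tool.

```shell
# Decode the hexadecimal "error" field of an x86 kernel segfault line.
# Assumes the usual bit layout: bit 0 = protection violation (else page not
# present), bit 1 = write, bit 2 = user mode, bit 4 = instruction fetch.
decode_pf_error() {
  code=$((0x$1))
  if [ $((code & 1)) -ne 0 ]; then
    out="protection-violation"
  else
    out="page-not-present"
  fi
  if [ $((code & 16)) -ne 0 ]; then
    out="$out instruction-fetch"
  elif [ $((code & 2)) -ne 0 ]; then
    out="$out write"
  else
    out="$out read"
  fi
  if [ $((code & 4)) -ne 0 ]; then
    out="$out user-mode"
  else
    out="$out kernel-mode"
  fi
  echo "$out"
}

decode_pf_error 14
```

For `error 14` this reports a user-mode instruction fetch from an unmapped page, which fits a jump to a bogus address rather than a bad data access. It does not say which library made the jump.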

Server Setup Information:

mr-dandy commented 3 years ago

We've got the same problem and similar configuration.

tete2soja commented 3 years ago

Same error for me too

ymch commented 3 years ago

I get the same error and cannot start 4.0.0 (the final release, not a 4.0.0 release candidate).

markus07 commented 3 years ago

The same problem with 4.0.0, upgraded from the latest available 3.18.2. Error:

Oct 4 14:47:14 test kernel: node[60646]: segfault at 23d0 ip 00000000000023d0 sp 00007f1cd9abc3c8 error 14 in node[400000+26b5000]

TheWrongGuy commented 3 years ago

We also have a segfault error after upgrading from 3.18.2 to 4.0.0.

RC685 commented 3 years ago

I got the same error upgrading from 3.18.2 to 4.0.0. This was on our self-hosted CentOS 7 server.

gennaris commented 3 years ago

Same error here on a self-hosted CentOS 7; I needed to downgrade...

Oct 04 16:17:20 chat kernel: node[28685]: segfault at 23d0 ip 00000000000023d0 sp 00007f927df693c8 error 14 in node[400000+267f000]

NiklasHamburgnet commented 3 years ago

Same error on self hosted CentOS 7.

tete2soja commented 3 years ago

Going back to the previous version (https://releases.rocket.chat/3.18.2/download) works fine.

whitetiger264 commented 3 years ago

> Going back to the previous version (https://releases.rocket.chat/3.18.2/download) works fine.

Downgrade worked for me too. Seems upgrading on CentOS 7 self hosted is an issue.

serviceman commented 3 years ago

I am having the same issue with a fresh manual install of Rocket.Chat 4.0.0 on self-hosted CentOS.

Gummikavalier commented 3 years ago

Thanks to all for the confirmations! 👍

Anyone with CentOS Stream or RockyLinux having the same issue? The segfault could be an issue with supported ciphers and dependencies on older OSes.

Gummikavalier commented 3 years ago

Tested RC 4.0.1. Still segfaults.

TheWrongGuy commented 3 years ago

> Tested RC 4.0.1. Still segfaults.

Thanks for testing, so I don't have to try it myself. Was just going to ask. :D

demidovich commented 3 years ago

Same error on self hosted CentOS 7.

josephcrowell commented 3 years ago

Yeah no 4.x releases are working properly yet.

exequos commented 3 years ago

> Thanks to all for the confirmations! 👍
>
> Anyone with CentOS Stream or RockyLinux having the same issue? The segfault could be an issue with supported ciphers and dependencies on older OSes.

Rocket.Chat 4.0.1 (upgrade from 3.18.2) fails too with the same issue (segfault at 23d0) on CentOS 7.8.2003.

Is there any solution for it?

Gummikavalier commented 3 years ago

> Rocket.Chat 4.0.1 (upgrade from 3.18.2) fails too with the same issue (segfault at 23d0) on CentOS 7.8.2003.
>
> Is there any solution for it?

Thanks for confirming. No solutions yet.

amsnek commented 3 years ago

Same issue here: upgrade from 3.18.1 -> 4.0.1 on RHEL 7 with node v12.22.5.

sgocken commented 3 years ago

Can we get a note on the 4.0.0 and 4.0.1 releases warning of this issue for CentOS 7?

Gummikavalier commented 3 years ago

RC 4.0.2 still segfaults.

These may or may not be related to the issue:

npm WARN deprecated har-validator@5.1.5: this library is no longer supported
npm WARN deprecated uuid@3.4.0: Please upgrade  to version 7 or higher.  Older versions may use Math.random() in certain circumstances, which is known to be problematic.  See https://v8.dev/blog/math-random for details.
npm WARN deprecated request@2.88.2: request has been deprecated, see https://github.com/request/request/issues/3142
npm WARN deprecated node-pre-gyp@0.14.0: Please upgrade to @mapbox/node-pre-gyp: the non-scoped node-pre-gyp package is deprecated and only the @mapbox scoped package will recieve updates in the future

I think it would be a good idea to peek at the container version of RC 4.0 and compare it with what we have on CentOS 7 and its repositories. I'm too busy to do that for a week or two, so I have no choice but to wait for the devs to say something about the issue.

iamfasal commented 3 years ago

Having the same segfault error on 4.0.2 update.

OS: CentOS Linux release 7.9.2009 (Core)
Previous RC version: 3.8.1
Updated to: 4.0.2

Versions below 4.0 seem to work fine, so this could be a bug in the latest release.

node -v

v12.22.1

npm -v

6.14.1

The node and npm versions match those listed at https://github.com/RocketChat/Rocket.Chat/releases
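
A check like this can be scripted before starting the server. The following is a minimal sketch: `check_version` is a hypothetical helper, and the expected versions are simply the ones reported in this comment, not independently verified requirements.

```shell
# Compare an installed tool version with an expected one.
# A leading "v" (as printed by `node -v`) is ignored on both sides.
check_version() {
  name=$1
  expected=$2
  actual=$3
  if [ -n "$actual" ] && [ "${actual#v}" = "${expected#v}" ]; then
    echo "$name ok ($actual)"
  else
    echo "$name mismatch: expected $expected, found ${actual:-nothing}"
    return 1
  fi
}

# Expected values are taken from this thread; adjust them to the release notes.
check_version node 12.22.1 "$(node -v 2>/dev/null)" || true
check_version npm 6.14.1 "$(npm -v 2>/dev/null)" || true
```

A matching version only rules out one variable. As this thread shows, the segfault also occurs with the listed versions installed.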

The service gets killed automatically on startup. The error in the kernel log:

Oct 15 11:29:51 server.demo.com kernel: node[6071]: segfault at 23d0 ip 00000000000023d0 sp 00007ff0dd01a3c8 error 14 in node[400000+26b5000]

Any solution for this problem in CentOS 7?

iamfasal commented 3 years ago

@sampaiodiego Any thoughts on this? Many users are already affected.

debdutdeb commented 3 years ago

Hi.

Please try this

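(
  set -e                                  # abort the subshell if any step fails
  cd /opt/Rocket.Chat/programs/server/    # adjust to your bundle path
  rm -rf npm/node_modules/sharp/vendor    # remove sharp's vendored prebuilt libvips
  NODE_ENV=production npm i               # reinstall dependencies so sharp is set up again
  cd ../.. && node main.js                # start the server from the bundle root
)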

I'll update the documentation in a bit.

Gummikavalier commented 3 years ago

@debdutdeb Hi, I tried that now (with RC 4.0.2) and now get:

$  cd /opt/rocket.chat && node main.js 

Something went wrong installing the "sharp" module

libvips-cpp.so.42: cannot open shared object file: No such file or directory

- Remove the "node_modules/sharp" directory, run "npm install" and look for errors
- Consult the installation documentation at https://sharp.pixelplumbing.com/en/stable/install/
- Search for this error at https://github.com/lovell/sharp/issues

I'm probably doing something wrong, but I tried a couple of different combinations too. I'll wait for the updated docs, unless I figure it out earlier... :)
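
One way to see which shared libraries a native addon fails to resolve is to filter `ldd` output for unresolved entries. This is a minimal sketch: `missing_libs` is a hypothetical helper, and the install path is the one used in this thread, so adjust it to your setup.

```shell
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin.
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Assumed bundle location (from this thread); skipped if it does not exist.
server_dir=/opt/rocket.chat/programs/server
if [ -d "$server_dir" ]; then
  # Inspect every native addon (*.node) shipped under the sharp module.
  find "$server_dir/npm/node_modules/sharp" -name '*.node' 2>/dev/null |
  while IFS= read -r addon; do
    echo "== $addon"
    ldd "$addon" | missing_libs
  done
fi
```

An empty list for an addon means `ldd` resolved all of its libraries. It does not prove the addon loads cleanly at runtime.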

debdutdeb commented 3 years ago

You need to rebuild the module, i.e. cd programs/server && npm i

debdutdeb commented 3 years ago

I see what I did wrong there, updated the previous comment.

EDIT: made an error again, check now @gummikavalier

Gummikavalier commented 3 years ago

Thanks, I'll test

Gummikavalier commented 3 years ago

@debdutdeb Here is the whole process I just tried:

Ensure the correct nodejs version:
# node -v
v12.22.1

Ensure the correct NPM version:
# npm install -g npm@6.14.1
# npm cache clean -f

As root remove the old RC path and move untarred RC 4.0.2 bundle to /opt/rocket.chat:
# rm -rf /opt/rocket.chat
# mv /root/bundle /opt/rocket.chat && chown -R rocket:rocket /opt/rocket.chat && chmod -R 770 /opt/rocket.chat && chown -R rocket:rocket /tmp/ufs

Switch to the rocket account:
# su - rocket

Then first export all the usual working ENV settings (values redacted below; they are the ones that worked with RC 3.18.2):
export PORT=
export ROOT_URL=
export MONGO_URL=
export MONGO_OPLOG_URL=
export MAIL_URL=

Then run:
cd /opt/rocket.chat/programs/server/
rm -rf npm/node_modules/sharp/vendor
NODE_ENV=production npm i
cd ../.. && node main.js

Result:
Segmentation fault (core dumped)

debdutdeb commented 3 years ago

Alright, installing centos7 vm. Will write back soon. Thanks

AndersonOuverney commented 3 years ago

Hello everybody. I'm new, I've never used Rocket.Chat before, and I'm trying to install it on CentOS 7.

I went through the installation process following the guide. I did everything exactly as described in the documentation about 10 times, always starting from a fresh installation, to make sure I wasn't missing anything.

On the last try I got new errors, and I ended up here, where people are encountering the exact same problem as me.

I would like to know if there is already something we can do to get the installation working on CentOS 7.

Thanks

debdutdeb commented 3 years ago

Jfyi - I'm still investigating the issue. In the meantime, those who are new and trying to install for the first time, you do have other options like docker & snap.

I personally would recommend docker. But snap is a decent option as well. Just install snapd on centos7 (https://snapcraft.io/docs/installing-snap-on-centos) and run

sudo snap install rocketchat-server

RC685 commented 3 years ago

Any updates on this? Didn't see anything about it in the 4.0.4 release notes so I assume it's still an issue.

Gummikavalier commented 3 years ago

Tested RC 4.0.4 so others don't have to. Still segfaults.

MysiaginAV commented 3 years ago

I have the same error when upgrading to version 4.0. Waiting for a fix.

CH40S734D3R commented 3 years ago

Same error here, a fix is really needed for those who use CentOS 7. Using snap all works 100%, but I need access to the files for some small changes, so I will wait for a fix until then.

Gummikavalier commented 3 years ago

We compared the Docker image and the tar package, and the differences are so small that the issue may very well be at the node binary level. Typically such failures are caused by out-of-date dependencies for crypto or libc.

I've started building a new RHEL8-based server to move our test service config and database to a more up-to-date platform. I'll report here in the comments how RC 4.0 works on RHEL8.
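
To look into the libc angle on a given host, one option is to compare the newest glibc symbol version a binary asks for with the glibc the host provides. This is a minimal sketch: `max_glibc_required` is a hypothetical helper, `/usr/local/bin/node` is the path from the log in this thread, and it relies on GNU `sort -V` and `objdump` being available. It covers glibc only, and nothing in this thread confirms that glibc is the cause.

```shell
# Print the newest GLIBC_x.y symbol version found in `objdump -T` output.
# Reads the objdump output on stdin.
max_glibc_required() {
  grep -o 'GLIBC_[0-9][0-9.]*' | sort -u -V | tail -n 1
}

# Run against the real binary only if it and objdump are present.
if [ -x /usr/local/bin/node ] && command -v objdump >/dev/null 2>&1; then
  objdump -T /usr/local/bin/node | max_glibc_required
  ldd --version | head -n 1    # glibc version the host provides
fi
```

If the version required by the binary is newer than the host's glibc, the dynamic loader would normally refuse to start it with a clear error, so a silent segfault points at least partly elsewhere.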

CH40S734D3R commented 3 years ago

We really need a fix. For example, I'm running services on CentOS 7 and can't migrate/upgrade to CentOS 8 because of some incompatibilities. @debdutdeb any news about this?

spacegrenade commented 2 years ago

Just tried this on 4.1.0: same issue.

iniOr commented 2 years ago

Tested version 4.0.5 in the same environment as version 3.x, and the service crashes with a segfault error:

Oct 28 09:35:18 <redacted> kernel: node[3369]: segfault at 23d0 ip 00000000000023d0 sp 00007ffb838d5308 error 14 in node[400000+267f000]

Oct 28 09:35:18 <redacted> abrt-hook-ccpp[3411]: Process 3369 (node) of user 1000 killed by SIGSEGV - dumping core
Oct 28 09:35:24 <redacted> systemd[1]: rocketchat.service: main process exited, code=dumped, status=11/SEGV
Oct 28 09:35:24 <redacted> systemd[1]: Unit rocketchat.service entered failed state.
Oct 28 09:35:24 <redacted> systemd[1]: rocketchat.service failed.

Gummikavalier commented 2 years ago

I got my test done, and for those who have the luxury of upgrading their OS, RC 4.0.5 does not segfault in RHEL8.

RC685 commented 2 years ago

@Gummikavalier Thanks!

Any ideas on what the issue is in 7? Myself (and probably a lot of other admins of CentOS servers) aren't too keen on upgrading to CentOS 8 since it's going EOL at the end of this year (whereas CentOS 7 will still be supported until 2024).

CH40S734D3R commented 2 years ago

Has anyone tested the recent 4.1.0 update? I checked the changelog but didn't see any mention of a fix for this issue.

debdutdeb commented 2 years ago

Hey this looks like an issue at the node binary level :( I haven't been able to figure out the exact source.

A possible (but not ideal) workaround would be to use the node docker image and use an alias since moving fully to docker isn't possible for many of you -

alias node='docker run --rm -e ROOT_URL -e MONGO_URL -e MONGO_OPLOG_URL -e PORT -v $(pwd):/mnt -w /mnt --network host node:12'
alias npm='docker run --rm -v $(pwd):/mnt -w /mnt node:12 npm'
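
Bash expands aliases only in interactive shells by default, so a script or service unit would not pick up the aliases above. A shell function (or a small wrapper script) is one way around that. The sketch below mirrors the `node` alias; `node_in_docker` and the `DRY_RUN` switch are hypothetical names added for illustration.

```shell
# Run node from the node:12 Docker image with the current directory mounted,
# passing through the same environment variables as the alias above.
node_in_docker() {
  set -- docker run --rm \
    -e ROOT_URL -e MONGO_URL -e MONGO_OPLOG_URL -e PORT \
    -v "$(pwd):/mnt" -w /mnt --network host node:12 "$@"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"    # only show the command that would be run
  else
    "$@"
  fi
}

# Show the command without starting a container.
DRY_RUN=1 node_in_docker main.js
```

Calling `node_in_docker main.js` without `DRY_RUN=1` starts the container, which requires Docker to be installed and the variables above to be exported.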

Gummikavalier commented 2 years ago

> @Gummikavalier Thanks!
>
> Any ideas on what the issue is in 7? Myself (and probably a lot of other admins of CentOS servers) aren't too keen on upgrading to CentOS 8 since it's going EOL at the end of this year (whereas CentOS 7 will still be supported until 2024).

@RC685 I too think that the issue is the node binary. Building node from source might help, or it might not, as the final root cause would likely be in the libraries it depends on. So your best bet is what debdutdeb suggests above.

But personally I'm moving to RHEL 8 now. I'd move to CentOS Stream or RockyLinux if RHEL8 wasn't available for me.

iamfasal commented 2 years ago

A similar segfault happened before with Rocket.Chat; it is discussed at https://github.com/nodejs/node/issues/19274

I'm still surprised that no Rocket.Chat officials have addressed this case and that no official fix or solution has been published yet. The solution @debdutdeb advised, using a Docker container for node and npm, is not a permanent one, I'd say. Since CentOS 7 is supported until 2024, there is no way we could move the system to CentOS 8, which hits EOL this December, or to CentOS Stream, which is upstream of RHEL going forward. AlmaLinux/RockyLinux is good compared to CentOS 8 or CentOS Stream, but we have no way to move to any of these at this stage.

If this is a node binary related error, is there any chance that a later v12.22 release would resolve it? I don't see anything related in the node release notes, so I'm not sure.

Any permanent solution would be helpful which works with CentOS 7. @sampaiodiego

debdutdeb commented 2 years ago

@fasalsh you're right - the docker solution is definitely not a permanent one. Diego and the engineering team have been busy with the 4.x releases and all the changes that come with them.

As for me, I'm also working on multiple verticals at the same time for the latest release.

These are the sorts of issues that docker / related technologies are there to solve. Which is why we recommend everyone to implement those solutions to begin with.

That being said, nobody's forgotten about this issue :) I assure you. @sampaiodiego noticed and notified me about this himself, fwiw. My day is already quite full, so I'll dig deeper next week. If this is an issue at the node binary level, 1) there's little for us at Rocket.Chat to do anyway, and 2) it'll take quite a while to get this fixed (which is again not under our control). Hence the workaround of using the node docker image if you can't move to docker/k8s for the time being.

You can also use node snap if you don't want to install and configure docker on all those machines - sudo snap install node --channel=12/stable

Or, if you're both against snap and docker usage (I've come across such situations), you can download a minimal ubuntu rootfs archive (since we know node + rc is working fine in ubuntu) and run node through chroot - very similar to using docker tbh.

Again - the above are just temporary workarounds.

iamfasal commented 2 years ago

@debdutdeb Thanks for the details, and looking forward to a permanent solution for this issue from RocketChat side

alireza-salehi commented 2 years ago

version 4.1.1
CentOS Linux 7.9.2009 64bit / Linux 3.10.0-1160.45.1.el7.x86_64
npm 6.14.12
node v12.22.1
mongod v4.0.27

segmentation fault when running node main.js

TheWrongGuy commented 2 years ago

Was anyone brave enough to test 4.1.2 already?