Unitech / pm2

Node.js Production Process Manager with a built-in Load Balancer.
https://pm2.keymetrics.io/docs/usage/quick-start/

Too Many Unstable Restarts #659

Closed 58bits closed 9 years ago

58bits commented 10 years ago

Hi all. I have a Node server that runs fine under forever and will 'start' under PM2. However, as soon as the first request hits the server, PM2 errors out with the message:

Script index.js had too many unstable restarts (15). Stopped. "errored"

I'm starting the server in fork mode (under Node v0.10.31) with the following:

NODE_ENV=production pm2 start index.js -xn 'myapp'

The error log file contains...

node: ../deps/uv/src/unix/core.c:701: uv__io_stop: Assertion `loop->watchers[w->fd] == w' failed.

The only thing that's a little different about this server is that it listens on two ports: there are two Node createServer calls, each on a different port, one for API access and the other for GUI access.
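A minimal sketch of that layout: two HTTP servers in a single process, each bound to its own port. The handler names, responses, and ports are hypothetical; the thread does not show the real routes.

```javascript
'use strict';
const http = require('http');

// Hypothetical API handler: returns a small JSON body.
function apiHandler(req, res) {
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ ok: true }));
}

// Hypothetical GUI handler: returns a small HTML page.
function guiHandler(req, res) {
  res.writeHead(200, { 'Content-Type': 'text/html' });
  res.end('<h1>GUI</h1>');
}

// Two independent servers in one process, each on its own port.
function startServers(apiPort, guiPort) {
  const api = http.createServer(apiHandler).listen(apiPort);
  const gui = http.createServer(guiHandler).listen(guiPort);
  return { api, gui };
}

// Usage (not executed here): startServers(8000, 8080);
```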

Any ideas?

soyuka commented 10 years ago
node: ../deps/uv/src/unix/core.c:701: uv__io_stop: Assertion `loop->watchers[w->fd] == w' failed.

Are you using nvm or n to install pm2? I think it's caused by modules that were built against a different Node version. Please try:

pm2 kill
rm -rf node_modules
npm i
pm2 start index.js -xn 'myapp'

58bits commented 10 years ago

Hi @soyuka, we do a fresh install and npm install for each deployment, but just to be sure I've followed your suggestion above, and I'm still getting the exact same error :-(

soyuka commented 10 years ago

Weird, I just tried with 0.10.31 and I have no issues (https://gist.github.com/soyuka/5a4ab8adf3db528f9bba). Please take a look at this related issue: https://github.com/joyent/libuv/issues/838.

Could you try 0.11.13 to see if the error still occurs? Thx.

58bits commented 10 years ago

Okay, thanks for the reply @soyuka. I've run out of time for today, but will try again over the next few days ;-)

58bits commented 10 years ago

I've also just tried it with your gist, and it works. So there's something up with our app. I can close this if you like and reopen it once I've worked out what it is. Any suggestions on how I might trace the error?

soyuka commented 10 years ago

These can be really hard to trace, but I'd suggest setting up a clean environment with a fresh Node install.

Does the app start with node my-app.js?

58bits commented 10 years ago

Yup - it's running fine with node index.js as well as with forever...

NODE_ENV=production forever start --uid 'myapp' index.js

soyuka commented 10 years ago

If you can give me a way to reproduce the issue I'd be glad to help, but right now I don't have enough information :/.

58bits commented 10 years ago

I'll try a few more things over the weekend. The architecture I'm using is based mainly on this app, https://github.com/hueniverse/postmile, which also works fine, so I'll start tearing things down over the weekend and see if I can isolate what we're doing differently.

58bits commented 10 years ago

Found it: for the production and staging environments I turned off console logging in the application, and everything is fine. stdout would have been coming from two sources (each of the two Node servers). I'm not sure what that meant internally for PM2, but turning off the console logger solved it :-) (and regular file logging still works fine).
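A hypothetical sketch of that configuration: keep the file logger in every environment and drop console output in production and staging. The sink names and the QUIET_ENVS set are illustrative assumptions, not good's actual reporter API.

```javascript
'use strict';

// Environments where console output is switched off (assumption based on
// the workaround above: production and staging).
const QUIET_ENVS = new Set(['production', 'staging']);

// Return the list of log sinks to enable for a given NODE_ENV value.
function logSinks(nodeEnv) {
  const sinks = ['file']; // file logging stays on everywhere
  if (!QUIET_ENVS.has(nodeEnv)) {
    sinks.push('console'); // console only outside production/staging
  }
  return sinks;
}

// Usage: logSinks(process.env.NODE_ENV)
```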

ggoodman commented 10 years ago

I also have this issue, also with hapi (likely the good plugin). I speculate that the way pm2 is watching stdout and stderr is exposing some sort of libuv bug. Thoughts?

RobertWHurst commented 9 years ago

I had this problem as well. My workaround was to remove a call to process.stdin.resume(). Using process.stdout doesn't seem to break anything, as I can use console.log(...) without a problem.
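If stdin reading can't simply be removed, one option is to guard it. This is a sketch with hypothetical helper names. It relies on documented Node behavior: process.send is only defined when the process was spawned with an IPC channel. Because PM2's fork mode at the time put that channel on fd 0, the guard conservatively leaves stdin alone whenever any IPC channel exists, even in setups where IPC sits on another descriptor.

```javascript
'use strict';

// True when no IPC channel exists, so fd 0 cannot be the IPC channel.
function canSafelyReadStdin(proc) {
  return typeof proc.send !== 'function';
}

// Resume stdin only when it is safe; report whether it was resumed.
function maybeResumeStdin(proc) {
  if (!canSafelyReadStdin(proc)) {
    return false; // fd 0 may be the IPC channel: do not touch it
  }
  proc.stdin.resume();
  return true;
}

// Usage in the app: maybeResumeStdin(process);
```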

Unitech commented 9 years ago

Cluster log stream: https://github.com/Unitech/PM2/blob/master/lib/ProcessContainer.js#L114
Fork log stream: https://github.com/Unitech/PM2/blob/master/lib/God/ForkMode.js#L96

danecando commented 9 years ago

I ran into the same issue deploying a Hapi based app.

Disabling the Good plugin solved the error for me.

soyuka commented 9 years ago

Which plugin was it?

On 9 Oct. 2014 at 15:20, Dane Grant notifications@github.com wrote:

I ran into the same issue deploying a Hapi based app.

Disabling the Good plugin solved the error for me.


58bits commented 9 years ago

https://github.com/hapijs/good - same for me - although I solved the problem by disabling console output (while keeping the file logger).

58bits commented 9 years ago

Not sure if tagging the Good maintainer here works - but just in case... @lloydbenson

chriswiggins commented 9 years ago

I'm having the same issue, also using the hapijs good plugin. It's a really good logging utility, so it would be great if we can figure out how to fix this (besides disabling the console output!)

:+1:

tagrudev commented 9 years ago

Yes, I'm having the same issue with a default Express app.

Reproducing the issue:

express test1
cd test1
npm install
pm2 start app.js

This comes from the log (pm2 logs app):

[PM2] Script /home/user/test1/app.js had too many unstable restarts (15). Stopped. "errored"

PM2 version

user@user$ pm2 -v
Starting PM2 daemon...
0.11.0

Node version

user@user$ node -v
v0.10.24

arb commented 9 years ago

Just a thought here, but is it possible that it has something to do with the colorized console output? I know that Express colorizes output for different requests, and good added colorized console logging here.

wraithgar commented 9 years ago

I have several apps that use colorized output that do not cause this error, only the one using the Good.GoodConsole did.

chriswiggins commented 9 years ago

@58bits, I just tried disabling the Good.GoodConsole reporter (leaving just the file reporter) but this still happens for me. What version of PM2 are you running, and what version of Good? If we can narrow it down potentially we'll find the difference and then in turn find the issue.

This is a real hold up for me as we want to use the Good Reporting utility, but also want to use PM2!

jingchan commented 9 years ago

To add to this discussion, I'm running into this issue with good@3.1.1, but it appears to work fine with good@2.1.2.

chriswiggins commented 9 years ago

@jingchan, this is good to know. Unfortunately, they completely re-architected the internals between those versions (as you'd expect with a major version change), so it's hard to compare on GitHub.

chriswiggins commented 9 years ago

OK guys, I have "fixed" the problem, but I'm not passing all the tests. This error goes away if we change the FD0 pipe in ForkMode from "ipc" to "ignore". See commit d44ff7d0b982d8c3330c679a1d442459d8ed745e on chriswiggins/pm2#development

Why is this? Any ideas?

chriswiggins commented 9 years ago

In addition to the above: after more testing (I've never used IPC before), disabling it causes all sorts of errors, because a process running under PM2 is then unable to call process.send(). Maybe file descriptors are being assigned in unexpected ways. This is way over my head :-)

soyuka commented 9 years ago

I think it needs to be 'ipc' to enable signals; see the child_process stdio options. You could give it a file descriptor, but that would complicate things.
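The process.send() failures described above follow from documented Node behavior: a process not spawned with an IPC channel has no process.send at all. A child-side sketch (the helper name is hypothetical) that only messages the parent when a live channel exists:

```javascript
'use strict';

// Send a message to the parent only if this process has a connected IPC
// channel. Without 'ipc' anywhere in the parent's stdio array (for example
// fd 0 switched to 'ignore'), process.send is undefined.
function notifyParent(proc, message) {
  if (typeof proc.send !== 'function' || !proc.connected) {
    return false; // no IPC channel, or it has been disconnected
  }
  proc.send(message);
  return true;
}

// Usage in the managed app: notifyParent(process, { type: 'ready' });
```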

58bits commented 9 years ago

Hi @chriswiggins - thanks a bunch for this - and sorry I didn't get back to you sooner. An avalanche of work at the moment. :-(

chriswiggins commented 9 years ago

I've opened an issue with a test case over at hapijs/good#236. Hopefully we can get to the bottom of this.

chriswiggins commented 9 years ago

Ok team. Two outcomes:

Hapijs/good have tidied up a part of the plugin implementation, in the process causing this error to go away. What causes the error to occur is trying to access process.stdin when the process is a child and the stdin file descriptor is linked to an IPC channel. This leads to the second problem:

PM2 shouldn't really attach IPC to FD0 in fork mode. I've tried making it the fourth descriptor (as IPC is obviously needed), but some of the tests just hang. The other alternative is to not use spawn but the actual fork method, which does the correct IPC setup by default. I'll keep investigating.

Is there a PM2 IRC channel or similar? Would love to discuss some of this stuff in real time! Maybe even using Gitter would be good?

soyuka commented 9 years ago

The other alternative is to not use spawn, but the actual fork method which does the correct IPC stuff by default.

This looks interesting, any thoughts @jshkurti ?

With child.kill(signal) we could easily send basic signals; process.send() is used to communicate with the processes in more advanced cases. I agree that fd0 should be kept as a basic stdin, which is easier to use with other interpreters.
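A parent-side sketch of the two control paths just described, with hypothetical helper names: a plain signal via child.kill(), which needs no IPC, and a structured message via child.send(), which only exists when the child was spawned with 'ipc' in its stdio array. The worker is spawned with IPC on fd 3, the fork()-style layout discussed in this thread.

```javascript
'use strict';
const { spawn } = require('child_process');

// Spawn a Node script with stdin/stdout/stderr as pipes and IPC on fd 3.
function startWorker(scriptPath) {
  return spawn(process.execPath, [scriptPath], {
    stdio: ['pipe', 'pipe', 'pipe', 'ipc'],
  });
}

// Basic control: a signal, no IPC channel required.
function stopWorker(child) {
  return child.kill('SIGTERM');
}

// Richer control: a structured message over the IPC channel.
// The worker would handle it with process.on('message', ...).
function askWorker(child, command) {
  if (typeof child.send !== 'function' || !child.connected) {
    return false; // spawned without 'ipc', or already disconnected
  }
  return child.send({ command });
}
```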

Basically fork uses spawn with these settings:

  // Leave stdin open for the IPC channel. stdout and stderr should be the
  // same as the parent's if silent isn't set.
  options.stdio = options.silent ? ['pipe', 'pipe', 'pipe', 'ipc'] :
      [0, 1, 2, 'ipc'];

https://github.com/joyent/node/blob/master/lib/child_process.js#L572

Using the fourth descriptor (fd 3) should work just fine ;)
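To make the descriptor numbering concrete: the position of 'ipc' in the stdio array is the fd number the child sees. A small sketch comparing the fork-mode layout described in this thread (IPC on fd 0) with the layout fork() uses when silent is set (IPC on fd 3). The constant and function names are hypothetical, and treating fds 1 and 2 of the old layout as plain pipes is an assumption.

```javascript
'use strict';

// Fork-mode layout described in this thread: IPC on fd 0 (stdin).
// (Assumption: fds 1 and 2 were ordinary pipes.)
const IPC_ON_STDIN = ['ipc', 'pipe', 'pipe'];

// Layout child_process.fork() uses with `silent` (see the excerpt above):
// stdin/stdout/stderr on fds 0-2, IPC on fd 3, the fourth descriptor.
const IPC_ON_FD3 = ['pipe', 'pipe', 'pipe', 'ipc'];

// Which fd carries the IPC channel, and does it collide with stdin?
function describeStdio(stdio) {
  const ipcFd = stdio.indexOf('ipc');
  return { ipcFd, stdinIsIpc: ipcFd === 0 };
}

// Usage (not executed here):
//   spawn(process.execPath, ['index.js'], { stdio: IPC_ON_FD3 });
```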

Gitter: https://gitter.im/Unitech/PM2

58bits commented 9 years ago

I can confirm that in the latest version of the good plugin, (and latest pm2) this problem has been resolved. As much as I'd like to participate in the discussion, this is way above me. :-). Should I go ahead and close this issue?

chriswiggins commented 9 years ago

Yes please @58bits. Glad we could solve it as now I've got what I want working too. Power of the community!

chriswiggins commented 9 years ago

Actually, leave it open. We'll close when we pull in the FD4 fix.

58bits commented 9 years ago

w00t!

rafahoro commented 9 years ago

You are right. I had the same issue in my code and solved it by commenting out the line that pipes process.stdin into the child's stdin:

var spawn = require('child_process').spawn;
// SPAWN_BIN, SPAWN_ARGS and env are defined elsewhere in the app.
var cp = spawn(SPAWN_BIN, SPAWN_ARGS, {"env": env});
cp.stdout.pipe(process.stdout);
cp.stderr.pipe(process.stderr);
// https://github.com/Unitech/PM2/issues/659
// process.stdin.pipe is causing the following error when using PM2:
// node: ../deps/uv/src/unix/core.c:701: uv__io_stop: Assertion `loop->watchers[w->fd] == w' failed.
//process.stdin.pipe(cp.stdin);

Thank you!!!

soyuka commented 9 years ago

It should be working now that IPC has been moved to the 4th file descriptor. Could you give it a shot?

wraithgar commented 9 years ago

This is now working for me using: good@4.0.0, pm2@0.12.1, node@v0.10.33

rafahoro commented 9 years ago

Sorry, it's a big project and I can't change software/library versions yet. Hopefully in the near future

soyuka commented 9 years ago

ok

@wraithgar thanks!

levicoradine commented 3 years ago

Sorry to resurrect this, but I'm experiencing a seemingly similar situation:

events.js:292
      throw er; // Unhandled 'error' event
      ^

Error: write EPIPE
    at process.target._send (internal/child_process.js:832:20)
    at process.target.send (internal/child_process.js:703:19)
    at sendHelper (internal/cluster/utils.js:27:15)
    at send (internal/cluster/child.js:199:10)
    at EventEmitter.cluster._setupWorker (internal/cluster/child.js:49:3)
    at initializeClusterIPC (internal/bootstrap/pre_execution.js:345:13)
    at prepareMainThreadExecution (internal/bootstrap/pre_execution.js:67:3)
    at internal/main/run_main_module.js:7:1
Emitted 'error' event on Worker instance at:
    at process.<anonymous> (internal/cluster/worker.js:30:12)
    at process.emit (events.js:315:20)
    at internal/child_process.js:836:39
    at processTicksAndRejections (internal/process/task_queues.js:75:11) {
  errno: -32,
  code: 'EPIPE',
  syscall: 'write'
}
2021-02-26T15:01:26: PM2 log: App [svcs-server:1] exited with code [1] via signal [SIGINT]
2021-02-26T18:25:03: PM2 log: App [foo-server:2] starting in -cluster mode-
node: ../deps/uv/src/unix/core.c:930: uv__io_stop: Assertion `loop->watchers[w->fd] == w' failed.
./cmd.sh: line 39:    16 Aborted                 (core dumped) pm2-runtime "${PM2_ECOSYSTEM_FILE}"

This is for an app built on a node:14-buster Docker image (FROM node:14-buster), where our install basically looks like this:

  apt-get update -y && \
  apt-get -y install mdbtools ncat && \
  apt-get clean && \
  npm install -g pm2 npm@7 && \
  npm install --production && \
  pm2 install pm2-logrotate && \
  pm2 set pm2-logrotate:max_size 1M && \
  pm2 set pm2-logrotate:compress true && \
  pm2 set pm2-logrotate:retain 7

Any insights would be welcome. Thanks.

olawalejuwonm commented 1 month ago

Also having this issue.

Any solution please?
