eggjs / egg

🥚 Born to build better enterprise frameworks and apps with Node.js & Koa
https://eggjs.org
MIT License
18.88k stars 1.81k forks source link

每次修改文件重启报这么一堆错误 #432

Closed zlab closed 7 years ago

zlab commented 7 years ago

每次修改文件重启报这么一堆错误

017-02-22 15:39:37,488 WARN 184300 [master] App Worker#2:183904 started at 7001, remain 0 (15314ms)
2017-02-22 15:39:37,489 INFO 184300 [master] App Worker#1:170060 disconnect, suicide: true, state: disconnected, current workers: ["2"]
[Wed Feb 22 2017 15:39:37 GMT+0800 (中国标准时间)] [cfork:master:184300] worker:170060 disconnect (suicide: true, state: disconnected, isDead: false)
[Wed Feb 22 2017 15:39:37 GMT+0800 (中国标准时间)] [cfork:master:184300] don't fork new work (refork: false)
2017-02-22 15:39:37,493 ERROR 183332 nodejs.unhandledExceptionError: read ECONNRESET
    at exports._errnoException (util.js:1022:11)
    at TCP.onread (net.js:569:26)
code: 'ECONNRESET'
errno: 'ECONNRESET'
syscall: 'read'
name: 'unhandledExceptionError'
pid: 183332
hostname: zhanqi-pc

[Wed Feb 22 2017 15:39:37 GMT+0800 (中国标准时间)][pid: 183332][Leader] Error: read ECONNRESET 
Error Stack:
  Error: read ECONNRESET
    at exports._errnoException (util.js:1022:11)
    at TCP.onread (net.js:569:26)
Error Additions:
  code: "ECONNRESET"
  errno: "ECONNRESET"
  syscall: "read"

2017-02-22 15:39:37,494 ERROR 183332 nodejs.ECONNRESETError: read ECONNRESET
    at exports._errnoException (util.js:1022:11)
    at TCP.onread (net.js:569:26)
code: 'ECONNRESET'
errno: 'ECONNRESET'
syscall: 'read'
name: 'ECONNRESETError'
pid: 183332
hostname: zhanqi-pc

[Wed Feb 22 2017 15:39:37 GMT+0800 (中国标准时间)] [cfork:master:184300] worker:170060 exit (code: null, suicide: true, state: dead, isDead: true, isExpected: true)
dead-horse commented 7 years ago

可以更详细的描述一下? 代码是否写错了?这里报了一个 ECONNRESET 的错误,是不是在请求其他服务?

dead-horse commented 7 years ago

这是因为你在开发环境写出了未捕获异常,导致进程退出。

zlab commented 7 years ago

App Worker#4:300960 disconnect, suicide: true, state: disconnected, current workers: ["5"] 每次重启,都会产生一个新的worker,之前的worker is dead worker:300960 disconnect (suicide: true, state: disconnected, isDead: false)

atian25 commented 7 years ago

@zlab 你最后这一段是正常的,开发期重启 worker。

但上面的 read ECONNRESET 就要查查,你能不能用 egg-init --type simple 初始化一个项目,然后最小化代码复现下这个问题。

AnzerWall commented 7 years ago

报错我这边也是会出现的。但是我没去管他。。 环境win10,webstorm,egg-init生成的项目

AnzerWall commented 7 years ago
"C:\Program Files (x86)\JetBrains\WebStorm 2016.3.3\bin\runnerw.exe" "C:\Program Files\nodejs\node.exe" "C:\Program Files\nodejs\node_modules\npm\bin\npm-cli.js" run dev

> config-api@1.0.0 dev F:\test
> egg-bin dev

2017-02-28 13:44:25,924 INFO 25088 [master] =================== egg start =====================
2017-02-28 13:44:25,928 INFO 25088 [master] egg version 1.0.0-rc.1
2017-02-28 13:44:25,929 INFO 25088 [master] start with options: {"customEgg":"F:\\test\\node_modules\\egg","baseDir":"F:\\test","port":7001,"workers":1,"plugins":null,"https":false,"key":"","cert":""}
2017-02-28 13:44:25,929 INFO 25088 [master] start with env: isProduction: false, isDebug: false, EGG_SERVER_ENV: undefined, NODE_ENV: development
2017-02-28 13:44:25,974 INFO 25088 [master] Agent Worker:10188 start with ["{\"customEgg\":\"F:\\\\test\\\\node_modules\\\\egg\",\"baseDir\":\"F:\\\\test\",\"port\":7001,\"workers\":1,\"plugins\":null,\"https\":false,\"key\":\"\",\"cert\":\"\",\"clusterPort\":52137}"]
2017-02-28 13:44:26,847 INFO 25088 [master] Agent Worker started (910ms)
2017-02-28 13:44:26,860 INFO 25088 [master] App Worker#1:9832 start, state: none, current workers: ["1"]
2017-02-28 13:44:27,843 WARN 25088 [master] App Worker#1:9832 started at 7001, remain 0 (995ms)
2017-02-28 13:44:27,843 INFO 25088 [master] egg started on http://127.0.0.1:7001 (1914ms)
2017-02-28 13:48:34,400 WARN 10188 [agent:development] reload worker because F:\test\app\controller\home.js change
2017-02-28 13:48:34,401 INFO 25088 [master] reload workers...
2017-02-28 13:48:34,426 INFO 25088 [master] App Worker#2:8372 start, state: none, current workers: ["1","2"]
2017-02-28 13:48:35,182 WARN 25088 [master] App Worker#2:8372 started at 7001, remain 0 (248334ms)
2017-02-28 13:48:35,186 INFO 25088 [master] App Worker#1:9832 disconnect, suicide: true, state: disconnected, current workers: ["2"]
[Tue Feb 28 2017 13:48:35 GMT+0800 (中国标准时间)] [cfork:master:25088] worker:9832 disconnect (suicide: true, state: disconnected, isDead: false)
[Tue Feb 28 2017 13:48:35 GMT+0800 (中国标准时间)] [cfork:master:25088] don't fork new work (refork: false)
(node:25088) DeprecationWarning: worker.suicide is deprecated. Please use worker.exitedAfterDisconnect.
2017-02-28 13:48:35,191 ERROR 10188 nodejs.unhandledExceptionError: read ECONNRESET
    at exports._errnoException (util.js:1028:11)
    at TCP.onread (net.js:572:26)
code: 'ECONNRESET'
errno: 'ECONNRESET'
syscall: 'read'
name: 'unhandledExceptionError'
pid: 10188
hostname: USHIO

[Tue Feb 28 2017 13:48:35 GMT+0800 (中国标准时间)][pid: 10188][Leader] Error: read ECONNRESET 
Error Stack:
  Error: read ECONNRESET
    at exports._errnoException (util.js:1028:11)
    at TCP.onread (net.js:572:26)
Error Additions:
  code: "ECONNRESET"
  errno: "ECONNRESET"
  syscall: "read"

2017-02-28 13:48:35,193 ERROR 10188 nodejs.ECONNRESETError: read ECONNRESET
    at exports._errnoException (util.js:1028:11)
    at TCP.onread (net.js:572:26)
code: 'ECONNRESET'
errno: 'ECONNRESET'
syscall: 'read'
name: 'ECONNRESETError'
pid: 10188
hostname: USHIO

[Tue Feb 28 2017 13:48:35 GMT+0800 (中国标准时间)] [cfork:master:25088] worker:9832 exit (code: null, suicide: true, state: dead, isDead: true, isExpected: true)

image

系统:window 10 node: v7.6.0 代码:egg-init simple 还需要哪些信息

应该与webstorm无关,在命令行下通过sublime之类的修改依然会触发。

atian25 commented 7 years ago

等 @shaoshuai0102 从巴厘岛回来后再看看 ^_^

shaoshuai0102 commented 7 years ago

不好意思 休假几天。

有使用什么插件么?我这里没有成功复现。

shaoshuai0102 commented 7 years ago

只有 windows 下有?

AnzerWall commented 7 years ago

@shaoshuai0102 在linux上,我还没投入使用。现在只有在windows上遇到,没有其他插件,用的egg-init生成的项目,没有做改动。

shaoshuai0102 commented 7 years ago

win 下没成功复现 谁那里有 windows 机器也帮忙试着复现下吧

spiroo commented 7 years ago

我也出现这个问题了,就用egg-init初始化了一个项目,只添加了nunjucks模板,还没做任何修改 image

dead-horse commented 7 years ago

是否会与 cluster-client 有关系?

atian25 commented 7 years ago

Leader 这个应该就是 cluster-client 的吧

popomore commented 7 years ago

主要是 watcher 用了 cluster-client

shaoshuai0102 commented 7 years ago

我跟 找找windows机器去

shaoshuai0102 commented 7 years ago

我找了台 win 实体机稳定重现。

development 插件在文件改动时,先 fork 一个新 worker 进程然后 kill 原来的进程,kill 的方式是发送 sigterm 信号。按这个方式来看,应该必定会出现 leader read 报错的。我现在的疑问反而是为啥 mac 下没有。。。

现在想的思路是在 worker 里捕获 process.on('SIGTERM', handler) 在 handler 里主动 close。有没有其他想法 @gxcsoccer

atian25 commented 7 years ago

SIGTERM is not supported on Windows, it can be listened on.

https://nodejs.org/api/process.html#process_signal_events

shaoshuai0102 commented 7 years ago

虽然说windows不支持,但确实没有做额外处理的情况下,进程还是会被成功杀的。。

atian25 commented 7 years ago

这句话没看懂,不支持,然后又 it can be listened on

shaoshuai0102 commented 7 years ago

https://github.com/eggjs/egg-cluster/blob/master/lib/app_worker.js#L65 app worker 做了处理 捕获 sigterm 后主动退出了。这样就解决了刚才的疑问,为什么 win 不支持还可以退出,因为是主动退出的。

那么我在这里试着主动 close follwer 连接试试

shaoshuai0102 commented 7 years ago

这篇值得一读 http://joseoncode.com/2014/07/21/graceful-shutdown-in-node-dot-js/

jtyjty99999 commented 7 years ago

赞,我来翻译下 @shaoshuai0102

jtyjty99999 commented 7 years ago

Nodejs中的优雅退出

标签(空格分隔): 翻译


UNIX信号 所说

信号是一种用于Unix,类Unix和其他posix兼容的操作系统上的一种有限的进程间通信方式。一个信号是一个异步的通知,他会发送到一个进程,或进程内的一个特定的线程来通知某个事件发生了。

信号有很多种,本文会集中讨论两个信号:

不管你在什么环境部署你的 nodejs 应用,你都需要让你的应用响应这两个信号:

Upstart: 当停止一个服务时,默认情况会发送 SIGTERM 信号,等待5秒后如果进程还在运行,会发送 SIGKILL 信号. supervisord: 当停止一个服务时,默认情况会发送 SIGTERM 信号,等待10秒后如果进程还在运行,会发送 SIGKILL 信号. runit: 当停止一个服务时,默认情况会发送 SIGTERM 信号,等待10秒后如果进程还在运行,会发送 SIGKILL 信号. Heroku dynos shutdown: 在参考链接中也描述了,当停止一个服务时,默认情况会发送 SIGTERM 信号,等待10秒后如果进程还在运行,会发送 SIGKILL 信号. Docker: 当你在docker容器中运行node应用时,如果你运行 docker stop 命令,容器中的主进程会收到 SIGTERM 信号, 一段时间之后(默认10秒), 收到SIGKILL 信号.

下面我们尝试一个简单的nodejs应用

var http = require('http');

var server = http.createServer(function (req, res) {
  setTimeout(function () { //simulate a long request
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
  }, 4000);
}).listen(9090, function (err) {
  console.log('listening http://localhost:9090/');
  console.log('pid is ' + process.pid)
});

你可以看到响应会延迟4秒。 nodejs文档是这么说的:

SIGTERM 和 SIGINT 在 非-Windows 平台上有默认的响应方式,会首先以 128 + 信号的退出代码来退出,之后重置命令行. 如果某个信号在程序中被监听,则默认的处理行为会被覆盖 (node 不会退出).

这里我也不清楚默认行为是什么, 我在处理请求的过程中发送一个 SIGTERM 信号,则这个请求会挂掉,如下图所示:

» curl http://localhost:9090 &
» kill 23703
[2] 23832
curl: (52) Empty reply from server

幸运的是,http 模块带有一个 close 方法 ,可以在处理完所有请求后停止响应新的连接,并触发回调函数。这个方法来自 NET 模块, 因此对于所有类型的tcp连接你都可以很方便的使用这个方法。

现在我修改一下例子

var http = require('http');

var server = http.createServer(function (req, res) {
  setTimeout(function () { //simulate a long request
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World\n');
  }, 4000);
}).listen(9090, function (err) {
  console.log('listening http://localhost:9090/');
  console.log('pid is ' + process.pid);
});

process.on('SIGTERM', function () {
  server.close(function () {
    process.exit(0);
  });
});

之后键入下面的命令:

» curl http://localhost:9090 &
» kill 23703
Hello World
[1]  + 24730 done       curl http://localhost:9090

你会注意到,进程在处理完最后一个请求后才推出。更有意思的是,收到 SIGTERM 信号之后,进程不会在继续处理请求了:

» curl http://localhost:9090 &
[1] 25072

» kill 25070

» curl http://localhost:9090 &
[2] 25097

curl: (7) Failed connect to localhost:9090; Connection refused
[2]  + 25097 exit 7     curl http://localhost:9090

» Hello World
[1]  + 25072 done       curl http://localhost:9090

某些博客和stackoverflow上的例子中,当server.close比预期花费更长时间时,会使用一个timeout来响应 SIGTERM 信号。上面提到的这些表明这个做法是没有必要的,每个进程管理器都会在SIGTERM 信号响应太长时间后发送SIGKILL信号。

ghost commented 7 years ago

我在win7环境下,vscode编辑器,没有用egg-init 自己手动搭建的环境。 请问最后解决方案是什么?

atian25 commented 7 years ago

@shaoshuai0102

shaoshuai0102 commented 7 years ago

还在解,优先级不是很高,不会影响实际线上环境运行。再等等哈

shaoshuai0102 commented 7 years ago

尝试通过主动关闭的方式解决问题。nodejs 的官方文档中的描述是有点问题的,win 下 sigterm 的实际执行效果是直接杀死进程也不能做监听,所以使用 sigterm 无法进行优雅退出。

解决方案是在 windows 下使用 IPC(process.send) 机制来做这个处理,线下已经测试过没有问题。

使用上面的方案已经能够有机会在 win 下进行优雅退出,但在 win 同时发现了另外一个诡异问题,只要 client 端主动关闭 socket,在 server 端就会有一个 ECONNREST 错误抛出。

测试代码:https://github.com/shaoshuai0102/nodejs-signal-test/tree/init

不知道这个是不是我写的有问题

cc @dead-horse @fengmk2 @gxcsoccer

gxcsoccer commented 7 years ago

https://github.com/shaoshuai0102/nodejs-signal-test/blob/master/worker.js#L20

只 end,不 destroy 试试?

shaoshuai0102 commented 7 years ago

我把 mac 下的代码传到 windows 下执行了,搞错了 我的错。我知道怎么处理了 thanks @gxcsoccer

joyeecheung commented 7 years ago

Node.js 的文档里写了:

Note: Windows does not support sending signals, but Node.js offers some emulation with process.kill(), and ChildProcess.kill(). Sending signal 0 can be used to test for the existence of a process. Sending SIGINT, SIGTERM, and SIGKILL cause the unconditional termination of the target process.

Windows 暂时不能通过信号优雅退出,参见:https://github.com/libuv/libuv/issues/1133

joyeecheung commented 7 years ago

另外我也看不懂

SIGTERM is not supported on Windows, it can be listened on.

是什么个意思……手头有 windows 机器的同学监听 process.on('SIGTERM', ...) 有反应咩?大概文档需要改下措辞……

gxcsoccer commented 7 years ago

@shaoshuai0102 分享下?

shaoshuai0102 commented 7 years ago

https://github.com/eggjs/egg/issues/654

okoala commented 7 years ago

所以更新计划是?

shaoshuai0102 commented 7 years ago

有点忙 这几天搞定

a526672351 commented 7 years ago

这个问题还没解决吗?同样的错误 2017-04-01 12:27:35,130 ERROR 10604 nodejs.ECONNRESETError: read ECONNRESET at exports._errnoException (util.js:1022:11) at TCP.onread (net.js:569:26) code: 'ECONNRESET' errno: 'ECONNRESET' syscall: 'read' name: 'ECONNRESETError' pid: 10604

sasbsd commented 7 years ago

关注中

jeffincn commented 7 years ago

@shaoshuai0102 我的问题也是如此 windows 7环境, egg版本是1.1.0,启动了1个worker,app目录下没有任何资源或前端文件,但依旧有这个问题,关注中...

2017-04-17 11:20:15,951 INFO 64116 [master] Agent Worker started (1596ms) 2017-04-17 11:20:15,963 INFO 64116 [master] App Worker#1:58844 start, state: none, current workers: ["1"] 2017-04-17 11:20:17,968 WARN 64116 [master] App Worker#1:58844 started at 7011, remain 0 (2015ms) 2017-04-17 11:20:17,969 INFO 64116 [master] egg started on http://127.0.0.1:7011 (3618ms) 2017-04-17 11:20:48,754 WARN 57900 [agent:development] reload worker because D:\Workspace\playground\egg-packer\showcase\config\config.default.js change [agent:development] reload worker because D:\Workspace\playground\egg-packer\showcase\config\config.default.js change 2017-04-17 11:20:48,796 INFO 64116 [master] reload workers... 2017-04-17 11:20:49,003 INFO 64116 [master] App Worker#2:64160 start, state: none, current workers: ["1","2"] 2017-04-17 11:20:53,139 WARN 64116 [master] App Worker#2:64160 started at 7011, remain 0 (37186ms) [Mon Apr 17 2017 11:20:53 GMT+0800 (中国标准时间)] [cfork:master:64116] worker:58844 disconnect (suicide: true, state: disco nnected, isDead: false) [Mon Apr 17 2017 11:20:53 GMT+0800 (中国标准时间)] [cfork:master:64116] don't fork new work (refork: false) 2017-04-17 11:20:53,144 INFO 64116 [master] App Worker#1:58844 disconnect, suicide: true, state: disconnected, current workers: ["2"] 2017-04-17 11:20:53,151 ERROR 57900 nodejs.unhandledExceptionError: read ECONNRESET at exports._errnoException (util.js:953:11) at TCP.onread (net.js:563:26) code: 'ECONNRESET' errno: 'ECONNRESET' [Mon Apr 17 2017 11:20:53 GMT+0800 (中国标准时间)] [cfork:master:64116] worker:58844 exit (code: null, suicide: true, state: dead,syscall: 'read' isDead: true, isExpected: true) name: 'unhandledExceptionError' pid: 57900 hostname: USER-PL8SMQJLQL

2017-04-17 11:20:53,155 ERROR 57900 nodejs.ECONNRESETError: read ECONNRESET at exports._errnoException (util.js:953:11) at TCP.onread (net.js:563:26) code: 'ECONNRESET' errno: 'ECONNRESET' syscall: 'read' name: 'ECONNRESETError' pid: 57900 hostname: USER-PL8SMQJLQL

shaoshuai0102 commented 7 years ago
fengsizi commented 7 years ago

同错: 2017-04-24 10:32:38,764 ERROR 5488 nodejs.ECONNRESETError: read ECONNRESET

at exports._errnoException (util.js:1018:11)
at TCP.onread (net.js:572:26)

code: 'ECONNRESET' errno: 'ECONNRESET' syscall: 'read' name: 'ECONNRESETError' pid: 5488 hostname: PC-20170326ZBVG

wyqhywjj601 commented 7 years ago

2017-04-24 11:12:24,764 ERROR 5488 nodejs.ECONNRESETError: read ECONNRESET

at exports._errnoException (util.js:1018:11)
at TCP.onread (net.js:572:26)

code: 'ECONNRESET' errno: 'ECONNRESET' syscall: 'read' name: 'ECONNRESETError' pid: 5488 hostname: PC-20170326ZBVG

Abby7826 commented 7 years ago

我也遇到了: 2017-04-24 11:02:30,764 ERROR 5488 nodejs.ECONNRESETError: read ECONNRESET

at exports._errnoException (util.js:1018:11)
at TCP.onread (net.js:572:26)

code: 'ECONNRESET' errno: 'ECONNRESET' syscall: 'read' name: 'ECONNRESETError' pid: 5490 hostname: PC-201703288ZBVG

QuinHan commented 7 years ago

我也遇到了: 2017-04-24 11:10:20,764 ERROR 5488 nodejs.ECONNRESETError: read ECONNRESET

at exports._errnoException (util.js:1018:11)
at TCP.onread (net.js:572:26)

code: 'ECONNRESET' errno: 'ECONNRESET' syscall: 'read' name: 'ECONNRESETError' pid: 5471 hostname: PC-20170324ZBVG

fengsizi commented 7 years ago
fengmk2 commented 7 years ago

已经在看了,请大家再耐心等待。

1166544 commented 7 years ago

26 days ago@shaoshuai0102说这几天搞定。。。

shaoshuai0102 commented 7 years ago

抱歉这些天忙其他事情,上周刚有时间捡起来但没搞定,我们其他同学会投入进来一起看一下。

fengmk2 commented 7 years ago

已经在 https://github.com/node-modules/cluster-client/pull/25 修复,等待合并发布,然后大家重新安装 node_modules 即可。

fengmk2 commented 7 years ago

image

@shaoshuai0102 可以改一下 windows send message 来退出了。