Closed: XRJ1230663 closed this issue 6 months ago.
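I took a look. This morning it was extremely sluggish, with fvm using 25% of memory on a 2 GB server. I asked around and was told a fix had been released over the weekend. After upgrading this morning things were fine, with memory usage around 3%, but it has already reached about 8% after roughly 5 hours. I upgraded from 4.4.1 to 4.4.2.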
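I upgraded to 4.4.2 last night, but it is still very resource-hungry: a dedicated 2-core 4 GB server is almost completely maxed out.
@XRJ1230663 Could you check which service is using the resources?
@QYG2297248353 Is fvm still climbing?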
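It stopped rising after that and has held between 8% and 9%. It has been a day now and that is still somewhat high, higher than our Spring Boot applications, so I'd suggest optimizing it. Still, it is better than before: at least we no longer get midnight alerts about slow access.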
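My server has 2 cores and 2 GB and runs SafeLine plus a frontend, which is about all it can handle. SafeLine is by far the biggest consumer: whenever its memory gets high, access slows down, so I had no choice but to move other, larger services to another server.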
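I'm no longer using 4.4.2. I checked yesterday and luigi was using more than 50% of memory.
Yes, this problem has existed from 4.3.3 all the way to the latest 4.4.2. It was also reported in the WeChat group, but nothing came of it. After a restart, luigi starts using over 100% CPU within about 3-5 minutes and stays there, and at that point the QPS display starts behaving abnormally. After 5-8 hours the detector container reports unhealthy and the system enters a bypass-like mode: it stops recording logs and inspecting traffic, and tengine only does plain forwarding.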
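Never mind 2 GB: I left it running for a few days on a 64 GB machine and it filled all of it, and the service wasn't working properly either.
It currently looks like abnormal CPU/memory usage in the luigi service. We will prioritize locating the problem and fix it in a later release.
Similar situation here: SafeLine WAF deployed alone on a 2c2g cloud server. After upgrading to 4.4.1 the load became too high, with load alerts every day that went away after a restart. Since upgrading to 4.4.2 it has been normal so far.
Version 5.0.0 has been released. Please update to the latest version and keep an eye on it.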
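Updated at 11:00 on the 14th and it has been very stable for the 12 hours since, with no climbing memory usage. The usage optimization is excellent; the 2-core 2 GB machine is no longer under pressure.
Problem solved; closing the issue.
After upgrading to 5.3.3, this problem is back again...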
On Fri, March 22, 2024 at 17:41, yrluke wrote: "Problem solved; closing the issue."
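After upgrading to 5.3.3, once QPS reached around 100 the QPS display stopped updating and stayed at 0, and the protection logs recorded nothing at all. After rolling back to 5.3.2, the problems above disappeared.
Wish you'd said so earlier, sigh. I even waited several days before upgrading. I upgraded around 12:00 on 2024/04/18, and today fvm is already at 8.1% usage.
@yrluke Please take a look. This time it didn't bring the server down, but the usage is still rather high, and snserver is also a bit higher than usual.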
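@yrluke It has already reached, or even slightly exceeded, the level reported in this issue.
@QYG2297248353 How do luigi's CPU and memory look?
luigi CPU: 0
In just half a day it's already at 10%; it really is climbing steadily. @yrluke
Can you check the fvm logs?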
docker logs safeline-fvm
2024/04/18 18:51:00 [Fx] PROVIDE *runner.Runner <= git.in.chaitin.net/dev/go/module.v2/runner.NewRunner()
2024/04/18 18:51:00 [Fx] SUPPLY *config.ManagerConfig
2024/04/18 18:51:00 [Fx] SUPPLY *gorm.DB
2024/04/18 18:51:00 [Fx] PROVIDE *fvm.FVM <= git.in.chaitin.net/patronus/fvm/manager/module/fvm.New()
2024/04/18 18:51:00 [Fx] PROVIDE []*node.Client <= git.in.chaitin.net/patronus/fvm/manager/module/node.NewClient()
2024/04/18 18:51:00 [Fx] SUPPLY *grpc.Server
2024/04/18 18:51:00 [Fx] PROVIDE *manager.FVMServer <= git.in.chaitin.net/patronus/fvm/manager/module/rpc/fvm.NewServer()
2024/04/18 18:51:00 [Fx] PROVIDE *node.PullServer <= git.in.chaitin.net/patronus/fvm/manager/module/rpc/node.NewServer()
2024/04/18 18:51:00 [Fx] SUPPLY *log.Logger
2024/04/18 18:51:00 [Fx] PROVIDE fx.Lifecycle <= go.uber.org/fx.New.func1()
2024/04/18 18:51:00 [Fx] PROVIDE fx.Shutdowner <= go.uber.org/fx.(*App).shutdowner-fm()
2024/04/18 18:51:00 [Fx] PROVIDE fx.DotGraph <= go.uber.org/fx.(*App).dotGraph-fm()
2024/04/18 18:51:00 [Fx] INVOKE git.in.chaitin.net/dev/go/module.v2/runner.glob..func1()
2024/04/18 18:51:00 [Fx] INVOKE git.in.chaitin.net/patronus/fvm/manager/module/manager.Run()
2024/04/18 18:51:00 /work/module/manager/manager.go:19 SLOW SQL >= 200ms
[714.552ms] [rows:0] CREATE TABLE `fvm_version` (`latest` integer,`oldest` integer)
2024/04/18 18:51:01 /work/module/manager/manager.go:23 SLOW SQL >= 200ms
[246.533ms] [rows:0] CREATE TABLE `fvm_update` (`version` integer,`content` blob)
2024/04/18 18:51:02 /work/module/manager/manager.go:24 SLOW SQL >= 200ms
[1069.178ms] [rows:0] CREATE TABLE `fvm_re` (`id` integer,`table` text,`content` blob,PRIMARY KEY (`id`))
2024/04/18 18:51:02 /work/module/db/db.go:78 record not found
[0.214ms] [rows:0] SELECT * FROM `fvm_update` WHERE version = 0 ORDER BY `fvm_update`.`version` LIMIT 1
2024/04/18 18:51:02 [Fx] INVOKE git.in.chaitin.net/patronus/fvm/manager/module/rpc/fvm.Register()
2024/04/18 18:51:02 [Fx] INVOKE git.in.chaitin.net/patronus/fvm/manager/module/rpc/node.Register()
2024/04/18 18:51:02 [Fx] INVOKE git.in.chaitin.net/patronus/fvm/manager/module/rpc.Run()
2024/04/18 18:51:02 [Fx] START git.in.chaitin.net/dev/go/module.v2/runner.NewRunner()
2024/04/18 18:51:02 [Module] START git.in.chaitin.net/patronus/fvm/manager/module/rpc.Run()
2024/04/18 18:51:02 [Fx] RUNNING
2024/04/18 18:51:07 INFO refresh fsl because detector policy version is 0
2024/04/18 18:51:07 ERROR build fsl error err="database is locked\nload version from db\ngit.in.chaitin.net/patronus/fvm/manager/module/rpc/fvm.(*FVMServer).pushFsl.func1\n\t/work/module/rpc/fvm/fvm.go:118\ngorm.io/gorm.(*DB).Transaction\n\t/go/pkg/mod/gorm.io/gorm@v1.25.2-0.20230530020048-26663ab9bf55/finisher_api.go:647\ngit.in.chaitin.net/patronus/fvm/manager/module/rpc/fvm.(*FVMServer).pushFsl\n\t/work/module/rpc/fvm/fvm.go:111\ngit.in.chaitin.net/patronus/fvm/manager/module/rpc/fvm.(*FVMServer).check\n\t/work/module/rpc/fvm/fvm.go:63\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"
2024/04/18 18:51:07 /work/module/db/db.go:59 database is locked
[0.038ms] [rows:0] INSERT INTO `fvm_version` (`latest`,`oldest`) VALUES (0,0)
2024/04/18 18:51:08 INFO Push FSL success
2024/04/18 18:51:08 /work/module/db/db.go:129 database is locked
[0.052ms] [rows:0] DELETE FROM `fvm_update` WHERE version != 0
2024/04/18 18:51:08 [ERROR] fvm/fvm_grpc.pb.go:316 error:%v send to stream: failed to remove all diff: database is locked
2024/04/18 18:51:10 INFO Push FSL success
2024/04/19 01:00:03 INFO Push FSL success
2024/04/19 01:04:41 INFO Push FSL success
2024/04/19 12:18:59 INFO Push FSL success
2024/04/19 12:19:59 INFO Push FSL success
2024/04/19 12:27:32 INFO Push FSL success
2024/04/19 12:36:52 INFO Push FSL success
2024/04/19 12:37:15 INFO Push FSL success
2024/04/19 12:39:09 INFO Push FSL success
2024/04/19 12:45:00 INFO Push FSL success
2024/04/19 12:46:14 INFO Push FSL success
2024/04/19 15:29:07 ERROR get stat error err="Get response failed:\n git.in.chaitin.net/patronus/fvm/manager/module/fvm.(*FVM).GetStat\n /work/module/fvm/fvm.go:351\n - Get \"http://safeline-detector:8001/stat\": dial tcp: lookup safeline-detector on 127.0.0.11:53: read udp 127.0.0.1:50439->127.0.0.11:53: i/o timeout"
2024/04/19 15:29:46 ERROR get stat error err="Get response failed:\n git.in.chaitin.net/patronus/fvm/manager/module/fvm.(*FVM).GetStat\n /work/module/fvm/fvm.go:351\n - Get \"http://safeline-detector:8001/stat\": dial tcp: lookup safeline-detector on 127.0.0.11:53: read udp 127.0.0.1:44893->127.0.0.11:53: i/o timeout"
2024/04/19 15:30:15 ERROR get stat error err="Get response failed:\n git.in.chaitin.net/patronus/fvm/manager/module/fvm.(*FVM).GetStat\n /work/module/fvm/fvm.go:351\n - Get \"http://safeline-detector:8001/stat\": dial tcp: lookup safeline-detector on 127.0.0.11:53: read udp 127.0.0.1:48864->127.0.0.11:53: i/o timeout"
2024/04/19 15:30:42 ERROR get stat error err="Get response failed:\n git.in.chaitin.net/patronus/fvm/manager/module/fvm.(*FVM).GetStat\n /work/module/fvm/fvm.go:351\n - Get \"http://safeline-detector:8001/stat\": dial tcp: lookup safeline-detector on 127.0.0.11:53: read udp 127.0.0.1:45189->127.0.0.11:53: i/o timeout"
2024/04/19 15:31:12 ERROR get stat error err="Get response failed:\n git.in.chaitin.net/patronus/fvm/manager/module/fvm.(*FVM).GetStat\n /work/module/fvm/fvm.go:351\n - Get \"http://safeline-detector:8001/stat\": dial tcp: lookup safeline-detector on 127.0.0.11:53: read udp 127.0.0.1:57307->127.0.0.11:53: i/o timeout"
2024/04/19 15:31:53 ERROR get stat error err="Get response failed:\n git.in.chaitin.net/patronus/fvm/manager/module/fvm.(*FVM).GetStat\n /work/module/fvm/fvm.go:351\n - Get \"http://safeline-detector:8001/stat\": dial tcp: lookup safeline-detector on 127.0.0.11:53: read udp 127.0.0.1:45976->127.0.0.11:53: i/o timeout"
2024/04/19 15:32:18 ERROR get stat error err="Get response failed:\n git.in.chaitin.net/patronus/fvm/manager/module/fvm.(*FVM).GetStat\n /work/module/fvm/fvm.go:351\n - Get \"http://safeline-detector:8001/stat\": dial tcp: lookup safeline-detector on 127.0.0.11:53: read udp 127.0.0.1:49508->127.0.0.11:53: i/o timeout"
It looks like there's a networking problem inside the containers. Can you check why this happens: Get "http://safeline-detector:8001/stat": dial tcp: lookup safeline-detector on 127.0.0.11:53: read udp 127.0.0.1:50439->127.0.0.11:53: i/o timeout
Now that's a tough one for me; I have no idea where to start. The safeline-detector logs are all like this:
[2024-04-19 15:32:17.329] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:32:17.329] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:32:17.329] [1] [ERROR] send weblog error: error trying to connect: dns error: failed to lookup address information: Temporary failure in name resolution
[2024-04-19 15:33:50.581] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:50.580] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:50.670] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:50.670] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:51.990] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:51.990] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.140] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.210] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.210] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.210] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.210] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.211] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.211] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.211] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.211] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.211] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.591] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:58.851] [1] [ERROR] send weblog error: connection error: Connection reset by peer (os error 104)
[2024-04-19 15:34:18.364] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:34:24.893] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:09.970] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:11.540] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.551] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.720] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.720] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.831] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.831] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.831] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.831] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.831] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.831] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.910] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.911] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.121] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.121] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.121] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.121] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.121] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.122] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.122] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.122] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.122] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.122] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.122] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.122] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
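Is fvm still rising? What is it at now?
I set up a 2c4g environment and ran a test at 1000 QPS for 30 minutes without seeing any sustained increase. Next week's release will add pprof to fvm; could you help collect data then so we can take another look?
The thing is, the previous version didn't have this problem, and it had already been fixed once before. Sigh.
We'll keep monitoring this.
Same here. I'm on 5.3.3 too, and the memory problem is more severe than in the previous few versions. Even during off-peak hours luigi can exhaust the memory, and the server froze.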
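Seriously. After the freeze I couldn't even connect over SSH and had to restart the server from the Alibaba Cloud console.
@xbingW And there's no one-click downgrade command either. This is really painful.
When luigi's memory usage is high, could you provide luigi's pprof data so we can analyze it?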
curl http://127.0.0.1:1086/debug/pprof/profile -o profile.pb.gz
curl http://127.0.0.1:1086/debug/pprof/heap -o heap.pb.gz
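The curl commands above use the `/debug/pprof/` paths of Go's standard `net/http/pprof` handlers, which suggests (an assumption, not confirmed in this thread) that luigi exposes those handlers on 127.0.0.1:1086. Two details matter for a slow leak: `/debug/pprof/profile` samples CPU for 30 seconds by default (`?seconds=N` changes that), and a single heap snapshot shows what is allocated right now but not what is growing. The sketch below, under those assumptions, collects the CPU profile plus two heap snapshots some minutes apart; `profileURL`, `fetch`, the file names `heap-1.pb.gz`/`heap-2.pb.gz`, and the 10-minute gap are hypothetical choices.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"strconv"
	"time"
)

// profileURL builds a net/http/pprof URL for a named profile. A positive
// seconds value is sent as ?seconds=N (the CPU profile's sampling
// window); 0 leaves the server default in place.
func profileURL(base, name string, seconds int) string {
	u := base + "/debug/pprof/" + name
	if seconds > 0 {
		u += "?" + url.Values{"seconds": {strconv.Itoa(seconds)}}.Encode()
	}
	return u
}

// fetch downloads rawURL into the file at path.
func fetch(client *http.Client, rawURL, path string) error {
	resp, err := client.Get(rawURL)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("GET %s: %s", rawURL, resp.Status)
	}
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	if _, err := io.Copy(f, resp.Body); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

func main() {
	// Assumed address: the luigi pprof listener used by the curl commands.
	const base = "http://127.0.0.1:1086"
	// Hypothetical gap between the two heap snapshots.
	const gap = 10 * time.Minute
	// The CPU profile request blocks for its whole sampling window,
	// so the client timeout must be longer than that window.
	client := &http.Client{Timeout: 2 * time.Minute}

	steps := []struct {
		endpoint, file string
		wait           time.Duration // how long to sleep before this step
	}{
		{profileURL(base, "profile", 30), "profile.pb.gz", 0},
		{profileURL(base, "heap", 0), "heap-1.pb.gz", 0},
		{profileURL(base, "heap", 0), "heap-2.pb.gz", gap},
	}
	for _, s := range steps {
		time.Sleep(s.wait)
		if err := fetch(client, s.endpoint, s.file); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Println("saved", s.file)
	}
	fmt.Println("compare: go tool pprof -top -base heap-1.pb.gz heap-2.pb.gz")
}
```

`go tool pprof -top -base heap-1.pb.gz heap-2.pb.gz` subtracts the first snapshot from the second, so the top entries are the allocations that grew during the gap, which matches the "steadily climbing" symptom better than one snapshot does.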
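I just restarted it; it's at 4.2% now on 2c2g.
You could also consider upgrading to today's release, 5.4.0, which includes some minor optimizations.
@xbingW What on earth is this?
Try restarting Docker.
The containers are all gone.
Has the default path changed? When I reinstall, it can't find the SafeLine environment and says /root is the default path.
That's because your startup failed. Restart Docker, then run docker compose up -d again in the installation directory.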
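The path in the prompt is just the current directory. Run the upgrade from the /data/safeline directory.
OK, it's up. Couldn't the script try to find the path on its own first, in case someone really does forget it one day?
We'll improve that later.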
Is this fixed in 5.4.0?
Results after running overnight: still on the high side, slightly better.
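Does luigi still show that behavior where CPU, memory, and disk I/O suddenly spike to 100% within a few seconds? I don't dare upgrade; I've already disabled luigi for now.
No service has reported anything abnormal so far. The only error logs are from luigi, after it started up following the upgrade.
On the surface, 5.4.0 has indeed brought CPU usage down, and luigi no longer uses CPU, but the problem isn't solved. In an even shorter time (5 minutes after a restart) it goes straight into the state where QPS is not displayed (it stays at 0) and protection logs are not recorded. I still have to roll back to 5.3.2.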
@xbingW
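I upgraded to 5.4.0 and memory usage hasn't improved noticeably. After the containers start and run stably for a few minutes, memory suddenly starts being consumed rapidly. In the end I didn't dare keep testing and disabled luigi again.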
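Problem description
4.4.2 is very resource-intensive: CPU and memory usage rise linearly over time. 4.3.2 does not have this problem.
Version
4.4.2
Steps to reproduce
Run 4.4.2 and watch resource usage: CPU and memory rise linearly over time, which does not happen on 4.3.2.
Expected result
Fix the high-load problem, or support installing a specific version.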