SkyAPM / SkyAPM-nodejs

The NodeJS server side agent for Apache SkyWalking
Apache License 2.0

Memory surges under high QPS #107

Closed iliuyt closed 4 years ago

iliuyt commented 4 years ago

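Memory surge

I run SkyWalking 7 locally in Docker, and a Node.js service reports to it through skyapm-nodejs. The higher the QPS and the longer the load lasts, the faster memory grows: at roughly 3000 QPS on average, memory increases by about 500 MB every ten seconds.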


Demo code used for the load test

require("skyapm-nodejs").start({
  serviceName: "local-demo",
  directServers: "127.0.0.1:11800",
});

const http = require("http");
const server = http.createServer((req, res) => {
  res.end("success");
});
server.listen(8000, function() {
  console.log("http://127.0.0.1:8000/");
});
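
To put numbers on the growth described above, a periodic memory logger can be added to the demo. This is a minimal sketch: `formatMiB` and `startMemoryLogger` are hypothetical helpers, not part of skyapm-nodejs, and the 10-second interval is only chosen to match the observation window mentioned in the report. It relies only on Node's built-in `process.memoryUsage()`.

```javascript
// Hypothetical helpers for observing memory growth; not part of skyapm-nodejs.

// Convert a byte count to a readable MiB string.
function formatMiB(bytes) {
  return (bytes / 1024 / 1024).toFixed(1) + " MiB";
}

// Log resident set size, JS heap usage, and external (native) memory
// every intervalMs milliseconds. Returns the timer so it can be cleared.
function startMemoryLogger(intervalMs) {
  const timer = setInterval(() => {
    const { rss, heapUsed, external } = process.memoryUsage();
    console.log(
      `rss=${formatMiB(rss)} heapUsed=${formatMiB(heapUsed)} external=${formatMiB(external)}`
    );
  }, intervalMs);
  // Do not let the logger alone keep the process alive.
  timer.unref();
  return timer;
}
```

Calling `startMemoryLogger(10000)` next to `server.listen(...)` prints one line every ten seconds. Comparing `rss` with `heapUsed` may help tell whether the growth is inside the JavaScript heap or outside it.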
kezhenxu94 commented 4 years ago

@iliuyt I cannot reproduce this with your demo code, even after increasing the QPS.


iliuyt commented 4 years ago

@kezhenxu94

Here is my environment.

Elasticsearch cluster
version: "2.0"
services:
   elasticsearch-central:
      image: elasticsearch:6.5.0
      container_name: es1
      volumes:
         - ./node/es1/data:/usr/share/elasticsearch/data
         - ./node/es1/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
      environment:
         - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
         - ES_CLUSTERNAME=elasticsearch
      command: elasticsearch
      ports:
         - "9200:9200"
         - "9300:9300"
      networks:
         default:
            ipv4_address: 172.21.0.100
   elasticsearch-data:
      image: elasticsearch:6.5.0
      container_name: es2
      volumes:
         - ./node/es2/data:/usr/share/elasticsearch/data
         - ./node/es2/elasticsearch.yml:/usr/share/elasticsearch/config/elasticsearch.yml
      environment:
         - bootstrap.memory_lock=true
         - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
         - ES_CLUSTERNAME=elasticsearch
      command: elasticsearch
      ports:
         - "9201:9200"
         - "9301:9300"
      links:
         - elasticsearch-central:elasticsearch
      networks:
         default:
            ipv4_address: 172.21.0.101
   elasticsearch-head:
      image: mobz/elasticsearch-head:5
      container_name: head
      volumes:
         - ./head/Gruntfile.js:/usr/src/app/Gruntfile.js
         - ./head/_site/app.js:/usr/src/app/_site/app.js
      ports:
         - "9100:9100"
      links:
         - elasticsearch-central:elasticsearch
      networks:
         default:
            ipv4_address: 172.21.0.102

networks:
   default:
      external:
         name: common

SkyWalking

version: '3.3'
services:
  oap:
    image: apache/skywalking-oap-server:7.0.0-es6
    container_name: oap
    restart: always
    ports:
      - 11800:11800
      - 12800:12800
    environment:
      - SW_STORAGE=elasticsearch # use Elasticsearch as the storage
      - SW_STORAGE_ES_CLUSTER_NODES=172.21.0.100:9200
      - TZ=Asia/Shanghai
    networks:
      default:
        ipv4_address: 172.21.0.201
  ui:
    image: apache/skywalking-ui:7.0.0
    container_name: ui
    depends_on:
      - oap
    links:
      - oap
    restart: always
    ports:
      - 18080:8080 # default port is 8080; mapped to 18080 on the host here, can be changed
    environment:
      collector.ribbon.listOfServers: 172.21.0.201:12800
    networks:
      default:
        ipv4_address: 172.21.0.202

networks:
  default:
    external:
      name: common
Other information
iliuyt commented 4 years ago

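Also, while reading the source I ran into a few questions; could you help answer them? In lib/cache/index.js:

  1. What does the check in the put method mean?
  2. Why does scheduleConsumeData call this._timeout.unref()?

See the comments in my code below for details.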

TraceSegmentCachePool.prototype.put = function(traceSegment) {
    this._bucket.push(traceSegment);
    // this._bucketSize is a boolean, so comparing it with -1 is always true
    // I also don't understand the second check: why compare a length with a boolean?
    if (this._bucketSize !== -1 && this._bucket.length >= this._bucketSize) {
        this.consumeData();
    } else if (!this._timeout) {
        this.scheduleConsumeData();
    }
};
TraceSegmentCachePool.prototype.scheduleConsumeData = function() {
    let self = this;
    this._timeout = setTimeout(function() {
        self.consumeData();
    }, this._flushInterval);
    // Calling unref here cancels the send request; why call unref?
    this._timeout.unref();
};
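
A note on the first question: when JavaScript compares a number with a boolean, the boolean is coerced to 0 or 1, so `this._bucket.length >= this._bucketSize` passes on effectively every `put` and the size limit has no effect. The sketch below shows what a size-or-interval flush looks like when the bucket size is a number. It is an illustration only, not the project's code or its fix; `SegmentBuffer`, `flush`, and `onFlush` are hypothetical names.

```javascript
// Illustrative buffer, NOT the code from skyapm-nodejs.
class SegmentBuffer {
  constructor({ bucketSize = -1, flushInterval = 1000, onFlush }) {
    this._bucket = [];
    this._bucketSize = bucketSize; // a number; -1 disables size-based flushing
    this._flushInterval = flushInterval; // milliseconds
    this._onFlush = onFlush; // receives each flushed batch as an array
    this._timeout = null;
  }

  put(segment) {
    this._bucket.push(segment);
    if (this._bucketSize !== -1 && this._bucket.length >= this._bucketSize) {
      // Bucket is full: flush right away.
      this.flush();
    } else if (!this._timeout) {
      // Otherwise make sure a timed flush is pending.
      this._timeout = setTimeout(() => this.flush(), this._flushInterval);
      this._timeout.unref();
    }
  }

  flush() {
    if (this._timeout) {
      clearTimeout(this._timeout);
      this._timeout = null;
    }
    const batch = this._bucket;
    this._bucket = [];
    if (batch.length > 0) {
      this._onFlush(batch);
    }
  }
}
```

With `bucketSize: 2`, the first `put` only arms the timer and the second `put` flushes both items in one batch.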
kezhenxu94 commented 4 years ago

@iliuyt Thanks for the environment details; I'll try it out tonight.

  1. That check is a bug and has already been fixed:

https://github.com/SkyAPM/SkyAPM-nodejs/blob/3129f7fd529b5ee8deb8b2b481ea1d90c2e6636c/modules/nodejs-agent/lib/cache/index.js#L26

https://github.com/SkyAPM/SkyAPM-nodejs/blob/3129f7fd529b5ee8deb8b2b481ea1d90c2e6636c/modules/nodejs-agent/lib/cache/index.js#L34

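  2. My understanding is that unref does not cancel the send. It only keeps this timer from preventing the Node process from exiting, similar to marking a thread as a daemon in Java.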
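
The behaviour of `unref` described above can be checked with a small standalone script. This is a sketch using only Node's built-in timers; `scheduleBackground` and `runUnrefDemo` are hypothetical names introduced for the demonstration.

```javascript
// Schedule fn after ms without letting the timer keep the process alive.
function scheduleBackground(fn, ms) {
  const timer = setTimeout(fn, ms);
  timer.unref();
  return timer;
}

// Show that an unref'd timer still fires while something else
// (here an ordinary timer) keeps the event loop running.
function runUnrefDemo(onDone) {
  const events = [];
  scheduleBackground(() => events.push("background"), 20);
  // An ordinary (ref'd) timer keeps the process alive for 50 ms.
  setTimeout(() => {
    events.push("foreground");
    onDone(events);
  }, 50);
}

runUnrefDemo((events) => console.log(events.join(", ")));
```

Both callbacks run, the unref'd one first. If the ordinary timer is removed and nothing else is pending, the process exits before the unref'd callback gets a chance to run. In a server process, which stays alive anyway, the flush timer keeps firing as usual.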
iliuyt commented 4 years ago

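@kezhenxu94 Thanks for the answers. I replaced the grpc dependency in skyapm-nodejs with @grpc/grpc-js, and under load testing memory no longer surged. However, after deploying to the server and running for a week, memory still climbs to around 1 GB. Without the skyapm-nodejs package, memory stays at roughly 100 MB.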

wu-sheng commented 4 years ago

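I suggest checking whether the backend has enough processing capacity. In addition, you need to check

This is the typical protective design of the Java agent when facing the scenario described above.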
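
The comment does not spell out the design it refers to. One common protective pattern for an agent whose backend cannot keep up is a bounded buffer that rejects new data once full, instead of growing without limit. The sketch below illustrates that general idea only. `BoundedBuffer`, `offer`, and `drain` are hypothetical names, and this is not code from the Java agent or from skyapm-nodejs.

```javascript
// Illustrative bounded buffer; an assumption about what "protective design"
// could look like, not code from any SkyWalking agent.
class BoundedBuffer {
  constructor(capacity) {
    this._capacity = capacity;
    this._items = [];
    this.dropped = 0; // number of items rejected because the buffer was full
  }

  // Try to add an item. Returns false (and counts a drop) when full.
  offer(item) {
    if (this._items.length >= this._capacity) {
      this.dropped += 1;
      return false;
    }
    this._items.push(item);
    return true;
  }

  // Remove and return everything currently buffered.
  drain() {
    const batch = this._items;
    this._items = [];
    return batch;
  }
}
```

The trade-off is deliberate: under overload some trace data is lost, but the memory held by the buffer stays bounded by `capacity`.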

iliuyt commented 4 years ago

I see the bug has recently been fixed: https://github.com/SkyAPM/SkyAPM-nodejs/pull/111