cjuexuan / mynote

237 stars 34 forks source link

es的gc监控带来的启发 #40

Open cjuexuan opened 7 years ago

cjuexuan commented 7 years ago

es的gc监控带来的启发

最近看到es中一段gc log


[2017-09-12 00:01:18,346][INFO ][monitor.jvm              ] [sh-bs-3-a11-es-17-101-master] [gc][young][460019][1433] duration [841ms], collections [1]/[1.8s], total [841ms]/[1.4m], memory [18.1gb]-
>[18.6gb]/[23.6gb], all_pools {[young] [235.9mb]->[56.8mb]/[3.2gb]}{[survivor] [409.5mb]->[409.5mb]/[409.5mb]}{[old] [17.5gb]->[18.1gb]/[20gb]}

[2017-09-06 15:27:16,462][WARN ][monitor.jvm              ] [sh-bs-3-b14-es-17-109-master] [gc][old][1122377][6645] duration [44.3s], collections [1]/[56.8s], total [44.3s]/[1.5h], memory [16.9gb]-
>[14.5gb]/[23.2gb], all_pools {[young] [2.2gb]->[60.4mb]/[6.4gb]}{[survivor] [60.5mb]->[0b]/[819.1mb]}{[old] [14.6gb]->[14.4gb]/[16gb]}

感觉es对gc的监控做的还是不错的,我就很好奇是怎么实现的,因为前段时间做gc监控的时候,我记得从MXBeans并没有这么详细的信息,我们当时的做法是

// java.lang.management.ManagementFactory
private val mxBeans = ManagementFactory.getGarbageCollectorMXBeans
private val minorBean = mxBeans.get(0)
private val majorBean = mxBeans.get(1)
val majorGcCount =   majorBean.getCollectionCount
val majorGcTime = majorBean.getCollectionTime
val minorGcCount = minorBean.getCollectionCount
val minorGcTime = minorBean.getCollectionTime

所以就决定跟踪下es的实现细节,其中主要实现在org.elasticsearch.monitor.jvm.JvmMonitorService.JvmMonitormonitorLongGc这个方法中

初始化的这个service的时候会设置interval,默认为1s一次,也就是一秒一次的调用MXBeans中的gcMonitor的方法,并且在这个类中保存lastJvmStats,每次检测是判断本次的gcCount和上一个状态的gcCount是否相等,如果不等,则意味着发生了gc,则获得gc时长,然后按照设置的阈值,选择是否打印日志

这个实现思路可以用在我们以后对gc作为事件的采集思路,因为gc这种如果作为周期性report的话,其实效果一般,理想状态下是其他指标作为线段,而gc事件作为线段的纵切线 类似下图,将gc做成虚线的纵切线,这样参照比较有价值,假设线段显示的指标是响应时间,那么在响应时间明显增加的时候我们可以看那个时间点是否发生gc之类的

grafana demo

相关代码:

 //一秒一次检测从MxBeans中获得的gc次数是否发生变化,如果发生变化就判断阈值
        private synchronized void monitorLongGc() {
            seq++;
            JvmStats currentJvmStats = jvmStats();

            for (int i = 0; i < currentJvmStats.getGc().getCollectors().length; i++) {
                GarbageCollector gc = currentJvmStats.getGc().getCollectors()[i];
                GarbageCollector prevGc = lastJvmStats.gc.collectors[i];

                // no collection has happened
                long collections = gc.collectionCount - prevGc.collectionCount;
                if (collections == 0) {
                    continue;
                }
                long collectionTime = gc.collectionTime - prevGc.collectionTime;
                if (collectionTime == 0) {
                    continue;
                }

                GcThreshold gcThreshold = gcThresholds.get(gc.getName());
                if (gcThreshold == null) {
                    gcThreshold = gcThresholds.get("default");
                }

                long avgCollectionTime = collectionTime / collections;

                if (avgCollectionTime > gcThreshold.warnThreshold) {
                    logger.warn("[gc][{}][{}][{}] duration [{}], collections [{}]/[{}], total [{}]/[{}], memory [{}]->[{}]/[{}], all_pools {}",
                            gc.getName(), seq, gc.getCollectionCount(), TimeValue.timeValueMillis(collectionTime), collections, TimeValue.timeValueMillis(currentJvmStats.getTimestamp() - lastJvmStats.getTimestamp()), TimeValue.timeValueMillis(collectionTime), gc.getCollectionTime(), lastJvmStats.getMem().getHeapUsed(), currentJvmStats.getMem().getHeapUsed(), JvmInfo.jvmInfo().getMem().getHeapMax(), buildPools(lastJvmStats, currentJvmStats));
                } else if (avgCollectionTime > gcThreshold.infoThreshold) {
                    logger.info("[gc][{}][{}][{}] duration [{}], collections [{}]/[{}], total [{}]/[{}], memory [{}]->[{}]/[{}], all_pools {}",
                            gc.getName(), seq, gc.getCollectionCount(), TimeValue.timeValueMillis(collectionTime), collections, TimeValue.timeValueMillis(currentJvmStats.getTimestamp() - lastJvmStats.getTimestamp()), TimeValue.timeValueMillis(collectionTime), gc.getCollectionTime(), lastJvmStats.getMem().getHeapUsed(), currentJvmStats.getMem().getHeapUsed(), JvmInfo.jvmInfo().getMem().getHeapMax(), buildPools(lastJvmStats, currentJvmStats));
                } else if (avgCollectionTime > gcThreshold.debugThreshold && logger.isDebugEnabled()) {
                    logger.debug("[gc][{}][{}][{}] duration [{}], collections [{}]/[{}], total [{}]/[{}], memory [{}]->[{}]/[{}], all_pools {}",
                            gc.getName(), seq, gc.getCollectionCount(), TimeValue.timeValueMillis(collectionTime), collections, TimeValue.timeValueMillis(currentJvmStats.getTimestamp() - lastJvmStats.getTimestamp()), TimeValue.timeValueMillis(collectionTime), gc.getCollectionTime(), lastJvmStats.getMem().getHeapUsed(), currentJvmStats.getMem().getHeapUsed(), JvmInfo.jvmInfo().getMem().getHeapMax(), buildPools(lastJvmStats, currentJvmStats));
                }
            }
            lastJvmStats = currentJvmStats;
        }

完整代码:

目前我看的版本是es-2.3

es github

另外在最新版中,这部分代码被放到了

JvmGcMonitorService

mumutu66 commented 7 years ago

💯