alibaba / nacos

an easy-to-use dynamic service discovery, configuration and service management platform for building cloud native applications.
https://nacos.io
Apache License 2.0

Nacos CPU usage is too high #4268

Closed dangle253280 closed 3 years ago

dangle253280 commented 3 years ago

Type: bug report or feature request

Describe what happened (or what feature you want)

Nacos is consuming too much CPU. The results of my investigation so far are as follows: [image]

`[admin@iz2ze1z94qja2p8xnhwd5kz ~]$ jstack 1848 | grep 73a -C 300

"I/O dispatcher 1" #123 prio=5 os_prio=0 tid=0x00007fe6ac792800 nid=0x7ed runnable [0x00007fe690baf000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

"nacos.publisher-com.alibaba.nacos.core.cluster.MembersChangeEvent" #122 daemon prio=5 os_prio=0 tid=0x00007fe6e4590800 nid=0x7ec waiting on condition [0x00007fe69150d000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"pool-2-thread-1" #120 prio=5 os_prio=0 tid=0x00007fe6e4c34000 nid=0x7eb runnable [0x00007fe69160e000] java.lang.Thread.State: RUNNABLE at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79) at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)

"HikariPool-1 housekeeper" #39 daemon prio=5 os_prio=0 tid=0x00007fe6e512a000 nid=0x78f waiting on condition [0x00007fe691d11000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"derby.rawStoreDaemon" #38 daemon prio=5 os_prio=0 tid=0x00007fe6a8059000 nid=0x78d in Object.wait() [0x00007fe691c10000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at org.apache.derby.impl.services.daemon.BasicDaemon.rest(Unknown Source)

"Timer-0" #36 daemon prio=5 os_prio=0 tid=0x00007fe6e50d3000 nid=0x78b in Object.wait() [0x00007fe691e12000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.util.TimerThread.mainLoop(Timer.java:552)

"SimplePauseDetectorThread_0" #35 daemon prio=9 os_prio=0 tid=0x00007fe6a4023000 nid=0x78a waiting on condition [0x00007fe691f13000] java.lang.Thread.State: TIMED_WAITING (sleeping) at java.lang.Thread.sleep(Native Method) at java.lang.Thread.sleep(Thread.java:340) at java.util.concurrent.TimeUnit.sleep(TimeUnit.java:386) at org.LatencyUtils.TimeServices.sleepNanos(TimeServices.java:62) at org.LatencyUtils.SimplePauseDetector$SimplePauseDetectorThread.run(SimplePauseDetector.java:116)

"Thread-14" #34 daemon prio=9 os_prio=0 tid=0x00007fe6a401d000 nid=0x789 waiting on condition [0x00007fe6930e8000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"mysql-cj-abandoned-connection-cleanup" #33 daemon prio=5 os_prio=0 tid=0x00007fe6e491c800 nid=0x787 in Object.wait() [0x00007fe6b94c9000] java.lang.Thread.State: TIMED_WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)

"logback-8" #27 daemon prio=5 os_prio=0 tid=0x00007fe6e450b000 nid=0x771 waiting on condition [0x00007fe6939fa000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"logback-7" #26 daemon prio=5 os_prio=0 tid=0x00007fe6e4509800 nid=0x770 waiting on condition [0x00007fe693afb000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"AsyncAppender-Worker-async-naming-event" #25 daemon prio=5 os_prio=0 tid=0x00007fe6e4253000 nid=0x76f waiting on condition [0x00007fe693bfc000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"logback-6" #24 daemon prio=5 os_prio=0 tid=0x00007fe6e4251000 nid=0x76e waiting on condition [0x00007fe693cfd000] java.lang.Thread.State: TIMED_WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"AsyncAppender-Worker-async-naming-distro" #23 daemon prio=5 os_prio=0 tid=0x00007fe6e424f000 nid=0x76d waiting on condition [0x00007fe693dfe000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"logback-5" #22 daemon prio=5 os_prio=0 tid=0x00007fe6e449f000 nid=0x76c waiting on condition [0x00007fe6b81a8000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"AsyncAppender-Worker-async-naming-raft" #21 daemon prio=5 os_prio=0 tid=0x00007fe6e449d000 nid=0x76b waiting on condition [0x00007fe6b82a9000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"logback-4" #20 daemon prio=5 os_prio=0 tid=0x00007fe6e449b000 nid=0x76a waiting on condition [0x00007fe6b83aa000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"AsyncAppender-Worker-async-naming-server" #19 daemon prio=5 os_prio=0 tid=0x00007fe6e4499800 nid=0x769 waiting on condition [0x00007fe6b84ab000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"logback-3" #18 daemon prio=5 os_prio=0 tid=0x00007fe6e4693000 nid=0x768 waiting on condition [0x00007fe6b85ac000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"logback-2" #17 daemon prio=5 os_prio=0 tid=0x00007fe6e46cb800 nid=0x767 waiting on condition [0x00007fe6b86ad000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"logback-1" #16 daemon prio=5 os_prio=0 tid=0x00007fe6e4486800 nid=0x766 waiting on condition [0x00007fe6b87ae000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"com.alibaba.nacos.core.common.-1" #15 daemon prio=5 os_prio=0 tid=0x00007fe6e469e800 nid=0x765 waiting on condition [0x00007fe6b8caf000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"com.alibaba.nacos.core.common.0" #14 daemon prio=5 os_prio=0 tid=0x00007fe6e469d800 nid=0x764 waiting on condition [0x00007fe6b8db0000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"nacos.publisher-com.alibaba.nacos.common.notify.SlowEvent" #10 daemon prio=5 os_prio=0 tid=0x00007fe6e4671000 nid=0x763 waiting on condition [0x00007fe6b93c8000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method)

"Service Thread" #7 daemon prio=9 os_prio=0 tid=0x00007fe6e4143000 nid=0x742 runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE

"C1 CompilerThread1" #6 daemon prio=9 os_prio=0 tid=0x00007fe6e413e000 nid=0x741 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE

"C2 CompilerThread0" #5 daemon prio=9 os_prio=0 tid=0x00007fe6e413b000 nid=0x740 waiting on condition [0x0000000000000000] java.lang.Thread.State: RUNNABLE

"Signal Dispatcher" #4 daemon prio=9 os_prio=0 tid=0x00007fe6e4139800 nid=0x73f runnable [0x0000000000000000] java.lang.Thread.State: RUNNABLE

"Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007fe6e4101800 nid=0x73e in Object.wait() [0x00007fe6d44f3000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)

"Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007fe6e40ff800 nid=0x73d in Object.wait() [0x00007fe6d45f4000] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) at java.lang.Object.wait(Object.java:502) at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:157)

"VM Thread" os_prio=0 tid=0x00007fe6e40fa000 nid=0x73c runnable

"GC task thread#0 (ParallelGC)" os_prio=0 tid=0x00007fe6e401e800 nid=0x73a runnable

"GC task thread#1 (ParallelGC)" os_prio=0 tid=0x00007fe6e4020000 nid=0x73b runnable

"VM Periodic Task Thread" os_prio=0 tid=0x00007fe6e4146000 nid=0x743 waiting on condition

JNI global references: 2323`

Describe what you expected to happen

Tell us your environment

Aliyun [image] Code: `

        <dependency>
            <groupId>org.apache.dubbo</groupId>
            <artifactId>dubbo</artifactId>
            <version>2.7.7</version>
            <exclusions>
                <exclusion>
                    <!-- Exclude the transitive Spring dependency -->
                    <artifactId>spring</artifactId>
                    <groupId>org.springframework</groupId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>com.alibaba.nacos</groupId>
            <artifactId>nacos-client</artifactId>
            <version>1.3.2</version>
        </dependency>`

Anything else we need to know?

dangle253280 commented 3 years ago

The problem may be here, but I can't find a solution:

`void openEventHandler() {
    try {

        // This variable is defined to resolve the problem which message overstock in the queue.
        int waitTimes = 60;
        // To ensure that messages are not lost, enable EventHandler when
        // waiting for the first Subscriber to register
        for (; ; ) {
            if (shutdown || hasSubscriber() || waitTimes <= 0) {
                break;
            }
            ThreadUtils.sleep(1000L);
            waitTimes--;
        }

        for (; ; ) {
            if (shutdown) {
                break;
            }
            final Event event = queue.take();
            receiveEvent(event);
            updater.compareAndSet(this, lastEventSequence, Math.max(lastEventSequence, event.sequence()));
        }
    } catch (Throwable ex) {
        LOGGER.error("Event listener exception : {}", ex);
    }
}

`
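One way to check whether a loop like the one above burns CPU while idle is to look at what a thread does when it is blocked in `take()`. The sketch below is a standalone illustration, not Nacos code: the class `BlockingTakeDemo`, the method `stateWhileWaiting`, and the choice of `LinkedBlockingQueue` are all assumptions made for the example. It starts a consumer that calls `take()` on an empty queue and reports the thread state it settles into, which can be compared with the `WAITING (parking)` state of the `nacos.publisher-...` threads in the dump.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BlockingTakeDemo {

    /** Starts a consumer blocked in take() and returns the state it settles into. */
    static Thread.State stateWhileWaiting() {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Thread consumer = new Thread(() -> {
            try {
                queue.take(); // parks until an element arrives
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "demo-consumer");
        consumer.setDaemon(true);
        consumer.start();

        try {
            // Poll until the consumer has parked, giving up after about 5 seconds.
            for (int i = 0; i < 500 && consumer.getState() != Thread.State.WAITING; i++) {
                Thread.sleep(10);
            }
            Thread.State observed = consumer.getState();
            queue.put("wake up"); // unblock the consumer so it can exit
            consumer.join(1000);
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while observing the consumer", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Consumer state while the queue is empty: " + stateWhileWaiting());
    }
}
```

A thread parked like this is descheduled by the OS, so an idle event loop of this shape is an unlikely source of sustained CPU load.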

horizonzy commented 3 years ago

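Hi, I looked at the thread dump. 73a and 73b are GC task threads, and 73c is the VM Thread, which also carries out GC operations, so GC is probably what is loading the CPU. Can you run `jstat -gcutil <pid> 1000` to check?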
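The `nid` field in a jstack dump is the native thread ID in hexadecimal, so matching a busy thread to its name comes down to a lookup by `nid`. A minimal sketch of that lookup, assuming the dump is available as a list of lines; the class `NidLookup` and the method `namesByNid` are hypothetical names for illustration, and the three sample lines are copied from the dump above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NidLookup {

    // Captures the quoted thread name and the hexadecimal nid of a jstack header line.
    private static final Pattern HEADER =
            Pattern.compile("^\"([^\"]+)\".*?\\bnid=0x([0-9a-f]+)");

    /** Returns nid (hex, without the 0x prefix) mapped to thread name. */
    static Map<String, String> namesByNid(List<String> dumpLines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : dumpLines) {
            Matcher m = HEADER.matcher(line);
            if (m.find()) {
                result.put(m.group(2), m.group(1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "\"VM Thread\" os_prio=0 tid=0x00007fe6e40fa000 nid=0x73c runnable",
                "\"GC task thread#0 (ParallelGC)\" os_prio=0 tid=0x00007fe6e401e800 nid=0x73a runnable",
                "\"GC task thread#1 (ParallelGC)\" os_prio=0 tid=0x00007fe6e4020000 nid=0x73b runnable");

        Map<String, String> names = namesByNid(lines);
        for (String nid : Arrays.asList("73a", "73b", "73c")) {
            // Also print the decimal form, which is what OS tools usually display.
            System.out.println(nid + " (" + Integer.parseInt(nid, 16) + ") -> " + names.get(nid));
        }
    }
}
```

For the three lines used here, 73a and 73b resolve to the two ParallelGC task threads and 73c to the VM Thread.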

dangle253280 commented 3 years ago

Thanks. Here is the GC information: [image]

How to solve the problem?

horizonzy commented 3 years ago

This is clearly a GC problem. Allocate more memory to the JVM heap.

dangle253280 commented 3 years ago

thx!

dangle253280 commented 3 years ago

Can I use UseG1GC instead of Parallel GC?

horizonzy commented 3 years ago

Parallel GC is fine. Just increase your heap memory.

KomachiSion commented 3 years ago

You can adjust both the garbage collector and the JVM parameters yourself.
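After changing the heap size (for example with `-Xmx`) or the collector (for example with `-XX:+UseParallelGC` or `-XX:+UseG1GC`), it can help to confirm what the JVM actually picked up. The following is a minimal sketch using the standard `java.lang.management` API; the class `JvmGcInfo` and its method names are hypothetical, and it reports on the JVM it runs in, so it would have to run inside the process being checked.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class JvmGcInfo {

    /** Maximum heap the JVM will try to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Names of the garbage collectors active in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("Collectors: " + collectorNames());
    }
}
```

The collector names are reported by the JVM itself and differ between collectors, so comparing the output before and after a flag change shows whether the change took effect.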

kimmking commented 2 years ago

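The two ParallelGC threads showing up means that a GC is in progress. (With ParallelGC, both young GC (YGC) and full GC (FGC) are stop-the-world: the other threads are either in native state or blocked, so they use hardly any CPU.) At that point the CPU usage has nothing to do with the business threads, including the Nacos threads.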

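Also, judging from your GC information, the old generation is always full. Full GC cannot reclaim it: there are 2-3 full GCs every second, each taking roughly 300 ms, so all of the CPU is being spent on GC. The process is close to an OOM. Find where the memory is leaking and fix it.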
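The arithmetic behind that estimate can be written out as a small sketch. It takes the figures quoted above (2 to 3 full GCs per second, about 300 ms each) and computes the share of wall-clock time the application spends paused; the class `GcOverhead` and the method `pausedFraction` are hypothetical names for illustration.

```java
public class GcOverhead {

    /**
     * Fraction of each second spent in stop-the-world pauses,
     * capped at 1.0 because a second cannot hold more than 1000 ms of pause.
     */
    static double pausedFraction(double fullGcsPerSecond, double pauseMillis) {
        return Math.min(1.0, fullGcsPerSecond * pauseMillis / 1000.0);
    }

    public static void main(String[] args) {
        // Figures taken from the comment above: 2-3 full GCs per second, ~300 ms each.
        System.out.printf("2 FGC/s: %.0f%% paused%n", pausedFraction(2, 300) * 100);
        System.out.printf("3 FGC/s: %.0f%% paused%n", pausedFraction(3, 300) * 100);
    }
}
```

With these inputs the application is paused for roughly 60% to 90% of the time, which leaves little room for useful work and matches the picture of a JVM spending nearly all of its CPU on collection.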