apache / rocketmq

Apache RocketMQ is a cloud native messaging and streaming platform, making it simple to build event-driven applications.
https://rocketmq.apache.org/
Apache License 2.0
21.28k stars 11.71k forks source link

[Bug] broker java heap space #8959

Open noone6931 opened 2 days ago

noone6931 commented 2 days ago

Before Creating the Bug Report

Runtime platform environment

centos8

RocketMQ version

rocketmq v4.3.2

JDK Version

oracle jdk1.8_121

Describe the Bug

Hello,

We are encountering an OutOfMemoryError (OOM) on one of our RocketMQ brokers. The issue occurred on November 15th at 17:47:18, and the error message indicates that the Java heap space was exhausted. However, based on the system metrics, we did not observe any anomalies leading up to the event, and the OOM error could be a symptom rather than the root cause.

The JVM heap memory was set to 6GB, and the machine had 16GB of total memory at the time of the issue. We are unsure what caused the broker to run out of memory, as the system appeared to be functioning normally before the OOM occurred.

Here is the broker.log at that moment: 2024-11-15 17:47:18 WARN PullMessageThread_8 - FULL_OFFSET:correction offset: TOOLS_CONSUMER, topic=zqyl-bank-ny1002, group=TOOLS_CONSUMER, requestOffset=0, newOffset=25598, suggestBrokerId=0 BrokerMinOffset=25559, clientIp=172.16.1.230 2024-11-15 17:47:18 ERROR BrokerControllerScheduledThread1 - persist consumerOffset error. java.lang.OutOfMemoryError: Java heap space 2024-11-15 17:47:18 INFO NettyEventExecutor - NETTY EVENT: remove channel [ClientChannelInfo [channel=[id: 0x194e828d, L:/172.16.1.230:10911 - R:/172.16.1.162:44598] 2024-11-15 17:47:19 ERROR NettyServerNIOWorker_3_2 - transfer many message by pagecache failed. java.util.concurrent.RejectedExecutionException: event executor terminated at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:805) at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:338) at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:328) at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:748) at io.netty.channel.AbstractChannelHandlerContext.safeExecute(AbstractChannelHandlerContext.java:1000) at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:829) at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790) at io.netty.channel.DefaultChannelPipeline.write(DefaultChannelPipeline.java:1013) at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702) at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:110) at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702) at org.apache.rocketmq.broker.processor.PullMessageProcessor.processRequest(PullMessageProcessor.java:839) at org.apache.rocketmq.broker.processor.PullMessageProcessor.access$2(PullMessageProcessor.java:539) at org.apache.rocketmq.broker.processor.PullMessageProcessor$2.run(PullMessageProcessor.java:884) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

The remoting.log:

2024-11-15 17:47:16 INFO NettyServerCodecThread_7 - NETTY SERVER PIPELINE: channelRegistered 172.16.2.196:16378 2024-11-15 17:47:16 INFO NettyServerCodecThread_3 - NETTY SERVER PIPELINE: channelActive, the channel[172.16.2.196:16378] 2024-11-15 17:47:16 INFO NettyServerCodecThread_3 - NETTY SERVER PIPELINE: channelInactive, the channel[172.16.2.196:16378] 2024-11-15 17:47:16 INFO NettyServerCodecThread_3 - NETTY SERVER PIPELINE: channelUnregistered, the channel[172.16.2.196:16378] 2024-11-15 17:47:18 WARN NettyServerCodecThread_5 - NETTY SERVER PIPELINE: exceptionCaught 172.16.1.231:56206 java.lang.OutOfMemoryError: Java heap space 2024-11-15 17:47:18 INFO NettyServerNIOSelector_3_2 - closeChannel: close the connection to remote address[172.16.1.231:56206] result: true 2024-11-15 17:47:18 WARN NettyServerCodecThread_5 - NETTY SERVER PIPELINE: exceptionCaught exception. java.lang.OutOfMemoryError: Java heap space at io.netty.buffer.PooledUnsafeDirectByteBuf$1.newUnsafeDirectByteBuffer(PooledUnsafeDirectByteBuf.java:34) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.buffer.PooledUnsafeDirectByteBuf$1.newInstance(PooledUnsafeDirectByteBuf.java:31) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.util.Recycler.get(Recycler.java:147) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.buffer.PooledUnsafeDirectByteBuf.newByteBuffer(PooledUnsafeDirectByteBuf.java:367) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.buffer.PoolArena$DirectArena.allocate(PoolArena.java:727) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.buffer.PoolArena.allocate(PoolArena.java:140) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.buffer.PooledUnsafeDirectByteBuf.(PooledUnsafeDirectByteBuf.java:49) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.buffer.PooledUnsafeDirectByteBuf.newByteBuffer(PooledUnsafeDirectByteBuf.java:367) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.buffer.PoolArena$DirectArena.allocate(PoolArena.java:727) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.buffer.PoolArena.allocate(PoolArena.java:140) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.buffer.PooledUnsafeDirectByteBuf.(PooledUnsafeDirectByteBuf.java:49) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.buffer.PooledUnsafeDirectByteBuf.newByteBuffer(PooledUnsafeDirectByteBuf.java:367) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:168) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.channel.nio.AbstractNioByteChannel.newDirectBuffer(AbstractNioByteChannel.java:129) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.channel.nio.AbstractNioByteChannel.filterOutboundMessage(AbstractNioByteChannel.java:104) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:799) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at org.apache.rocketmq.remoting.protocol.RemotingCommand.createResponseCommand(RemotingCommand.java:115) ~[rocketmq-remoting-4.3.2-zqyl.jar:4.3.2] at org.apache.rocketmq.remoting.protocol.RemotingCommand.createResponseCommand(RemotingCommand.java:100) ~[rocketmq-remoting-4.3.2-zqyl.jar:4.3.2] at org.apache.rocketmq.broker.processor.PullMessageProcessor.processRequest(PullMessageProcessor.java:657) ~[rocketmq-broker-4.3.2-zqyl.jar:na] at org.apache.rocketmq.remoting.netty.NettyRemotingAbstract$1.run(NettyRemotingAbstract.java:182) ~[rocketmq-remoting-4.3.2-zqyl.jar:4.3.2] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_121] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_121] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]

The broker_default.log: 2024-11-15 17:47:18 WARN NettyServerNIOSelector_3_2 - Unexpected exception in the selector loop. java.lang.OutOfMemoryError: Java heap space at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:805) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:338) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:328) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:748) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.channel.AbstractChannelHandlerContext.safeExecute(AbstractChannelHandlerContext.java:1000) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:829) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:790) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.channel.DefaultChannelPipeline.write(DefaultChannelPipeline.java:1013) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.handler.codec.MessageToMessageEncoder.write(MessageToMessageEncoder.java:110) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:702) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at org.apache.rocketmq.broker.processor.PullMessageProcessor.processRequest(PullMessageProcessor.java:839) ~[rocketmq-broker-4.3.2-zqyl.jar:na] at org.apache.rocketmq.broker.processor.PullMessageProcessor.access$2(PullMessageProcessor.java:539) ~[rocketmq-broker-4.3.2-zqyl.jar:na] at org.apache.rocketmq.broker.processor.PullMessageProcessor$2.run(PullMessageProcessor.java:884) ~[rocketmq-broker-4.3.2-zqyl.jar:na] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_121] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_121] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121] 2024-11-15 17:47:18 ERROR AdminBrokerThread_14 - Failed to submit a listener notification task. Event loop shut down? java.util.concurrent.RejectedExecutionException: event executor terminated at io.netty.util.concurrent.SingleThreadEventExecutor.reject(SingleThreadEventExecutor.java:805) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.offerTask(SingleThreadEventExecutor.java:338) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.addTask(SingleThreadEventExecutor.java:328) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.util.concurrent.SingleThreadEventExecutor.execute(SingleThreadEventExecutor.java:748) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:435) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.util.concurrent.DefaultPromise.setFailure(DefaultPromise.java:120) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.channel.DefaultChannelPromise.setFailure(DefaultChannelPromise.java:87) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.channel.AbstractChannelHandlerContext.safeExecute(AbstractChannelHandlerContext.java:1004) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:799) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1032) ~[netty-all-4.0.42.Final.jar:4.0.42.Final] at org.apache.rocketmq.remoting.netty.NettyRemotingAbstract$1.run(NettyRemotingAbstract.java:182) ~[rocketmq-remoting-4.3.2-zqyl.jar:4.3.2] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_121] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_121] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_121] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]

Steps to Reproduce

  1. RocketMQ broker was running with 6GB of JVM heap on a machine with 16GB total memory.
  2. The broker was processing normal message traffic, including consumers and producers sending and receiving messages.
  3. The issue occurred suddenly at 17:47:18 without any prior warning and resulted in an OutOfMemoryError (Java heap space) during message processing and offset persistence.

What Did You Expect to See?

We expected the broker to handle message processing and memory allocation without running into memory exhaustion. The system should have utilized the allocated 6GB of JVM heap memory efficiently without causing an OutOfMemoryError.

What Did You See Instead?

The broker encountered an OutOfMemoryError (Java heap space), which led to a failure in message processing. This was followed by Netty errors related to memory allocation and task rejection, preventing the broker from handling new requests.

Additional Context

No response

yx9o commented 2 days ago

Can you provide more information, such as jvm startup parameters, etc.

noone6931 commented 1 day ago

broker start up param:

java -server -Xms6g -Xmx6g -Xmn2g \ -XX:+UseG1GC -XX:G1HeapRegionSize=16m -XX:G1ReservePercent=25 -XX:InitiatingHeapOccupancyPercent=30 \ -XX:SoftRefLRUPolicyMSPerMB=0 -XX:SurvivorRatio=8 -verbose:gc -Xloggc:/dev/shm/mqgc%p.log \ -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy \ -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=5 -XX:GCLogFileSize=30m -XX:OmitStackTraceInFastThrow -XX:MaxDirectMemorySize=15g -XX:+UseLargePages -XX:+UseBiasedLocking -Djava.ext.dirs=/home/source/jdk1.8.0_121/jre/lib/ext:/home/yjl/rocketmq-01/rocketmq-4.3.2/bin/../lib \ -cp .:/home/source/jdk1.8.0_121/lib/dt.jar:/home/source/jdk1.8.0_121/lib/tools.jar:/home/source/jdk1.8.0_121/jre/lib \ org.apache.rocketmq.broker.BrokerStartup

broker.conf: deleteWhen = 04 fileReservedTime = 48 brokerRole = ASYNC_MASTER flushDiskType = ASYNC_FLUSH sendMessageThreadPoolNums=32 useReentrantLockWhenPutMessage=true waitTimeMillsInSendQueue=1000 transferMsgByHeap=false warmMapedFileEnable=true