apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

maxBatchSize preallocated memory may be thousands of times larger than actual message length #14943

Open keyboardbobo opened 2 years ago

keyboardbobo commented 2 years ago

Is your enhancement request related to a problem? Please describe. Guys, during a stress test we found that the client's memory usage was very high, to the point of triggering full GCs. After analyzing the dump file, we found that the memory occupied was far larger than the actual message size: serializing a 1KB message took up 1MB.

batchedMessageMetadataAndPayload = PulsarByteBufAllocator.DEFAULT
                        .buffer(Math.min(maxBatchSize, ClientCnx.getMaxMessageSize()));

maxBatchSize = Math.max(maxBatchSize, uncompressedSize);

Debugging showed that maxBatchSize, which controls the size of the pre-allocated ByteBufPair.b2 buffer, is stateful: per the assignment above, it is only ever raised to the size of the largest batch (or largest single message) seen so far. As that value grows, the pre-allocated ByteBufPair.b2 buffer grows with it and can end up thousands of times larger than the payload of the MessageImpl it carries.
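The effect of the two quoted lines can be reproduced in isolation. The following is a minimal Java sketch, not Pulsar code: the class HighWaterMarkDemo, its methods, the 5 MiB constant standing in for ClientCnx.getMaxMessageSize(), and the starting value of 1024 for maxBatchSize are all hypothetical assumptions. It assumes, as the quoted code suggests, that each batch's buffer is sized from the current maxBatchSize, which is then updated with that batch's uncompressed size.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal model of the buffer-sizing logic quoted in the issue.
 * Class and method names are hypothetical; only the two quoted
 * expressions are taken from the issue.
 */
public class HighWaterMarkDemo {
    // Assumption: stands in for ClientCnx.getMaxMessageSize() (5 MiB here).
    static final int MAX_MESSAGE_SIZE = 5 * 1024 * 1024;

    // Assumption: hypothetical starting value for maxBatchSize.
    private int maxBatchSize = 1024;

    /** Returns the capacity pre-allocated for a batch, then updates the high-water mark. */
    int allocateForBatch(int uncompressedSize) {
        // Mirrors: buffer(Math.min(maxBatchSize, ClientCnx.getMaxMessageSize()))
        int allocated = Math.min(maxBatchSize, MAX_MESSAGE_SIZE);
        // Mirrors: maxBatchSize = Math.max(maxBatchSize, uncompressedSize);
        // the value never decreases, so one large batch raises every later allocation.
        maxBatchSize = Math.max(maxBatchSize, uncompressedSize);
        return allocated;
    }

    /** Feeds a sequence of batch sizes through one container and collects each allocation. */
    public static List<Integer> simulate(int... batchSizes) {
        HighWaterMarkDemo container = new HighWaterMarkDemo();
        List<Integer> allocations = new ArrayList<>();
        for (int size : batchSizes) {
            allocations.add(container.allocateForBatch(size));
        }
        return allocations;
    }

    public static void main(String[] args) {
        // One ~2 MB batch surrounded by small 1 KiB batches.
        int[] sizes = {1024, 2_000_000, 1024, 1024};
        List<Integer> allocations = simulate(sizes);
        for (int i = 0; i < sizes.length; i++) {
            System.out.printf("batch of %,d bytes -> %,d bytes pre-allocated%n",
                    sizes[i], allocations.get(i));
        }
    }
}
```

In this model the first two batches get 1,024-byte buffers, and each later 1 KiB batch gets a 2,000,000-byte buffer, roughly 1,950 times its payload. The sketch does not model a buffer expanding when a batch outgrows its initial capacity, as the second batch would.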

Lowering batchingMaxMessages may reduce the risk, but a single oversized message can still cause the problem, since it raises maxBatchSize on its own.

Describe the solution you'd like: Ideally, iterate over the messages to be packed into the batch and sum their sizes, so that the allocated memory matches what is actually needed.
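A minimal Java sketch of the proposed precise allocation, assuming the full list of payloads is known before the buffer is allocated. The class ExactBatchSizer, the method exactCapacity, and the fixed perMessageOverhead (a stand-in for per-message batch metadata, whose real size varies) are hypothetical.

```java
import java.util.List;

/**
 * Sketch of "precise" allocation: sum the sizes of the messages that will be
 * packed into the batch before allocating. All names are hypothetical, and
 * perMessageOverhead is an assumed constant for per-message metadata.
 */
public class ExactBatchSizer {
    public static int exactCapacity(List<byte[]> payloads, int perMessageOverhead, int maxMessageSize) {
        long total = 0; // long avoids int overflow while summing
        for (byte[] payload : payloads) {
            total += payload.length + (long) perMessageOverhead;
        }
        // Never request more than the maximum size of a single message.
        return (int) Math.min(total, maxMessageSize);
    }

    public static void main(String[] args) {
        List<byte[]> batch = List.of(new byte[1024], new byte[1024], new byte[512]);
        System.out.println(exactCapacity(batch, 16, 5 * 1024 * 1024));
    }
}
```

For the three payloads in main (1024, 1024 and 512 bytes, with 16 bytes of assumed overhead each) this yields 2,608 bytes. If the real batch buffer is filled incrementally as messages arrive, the sum would only be known at flush time, so the allocation would have to be deferred until then.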

Describe alternatives you've considered: Let the user choose between precise allocation and pre-allocation.
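One hypothetical shape for this option, sketched in Java. BatchAllocationPolicy and initialCapacity are illustrative names, not an existing Pulsar setting; the sketch assumes both the exact batch size and the high-water mark (maxBatchSize) are available when the buffer is created.

```java
/**
 * Hypothetical user-facing switch between the two strategies discussed in
 * this issue. Not an existing Pulsar configuration option.
 */
public class AllocationPolicyDemo {
    public enum BatchAllocationPolicy { PRECISE, PREALLOCATE }

    /**
     * @param exactSize      summed size of the messages in the current batch
     * @param highWaterMark  largest batch size observed so far (maxBatchSize)
     * @param maxMessageSize maximum size of a single message
     */
    public static int initialCapacity(BatchAllocationPolicy policy, int exactSize,
                                      int highWaterMark, int maxMessageSize) {
        // PRECISE sizes the buffer to this batch; PREALLOCATE reuses the high-water mark.
        int requested = (policy == BatchAllocationPolicy.PRECISE) ? exactSize : highWaterMark;
        // Both strategies stay capped by the maximum message size, as in the quoted code.
        return Math.min(requested, maxMessageSize);
    }

    public static void main(String[] args) {
        int max = 5 * 1024 * 1024;
        System.out.println(initialCapacity(BatchAllocationPolicy.PRECISE, 1024, 2_000_000, max));
        System.out.println(initialCapacity(BatchAllocationPolicy.PREALLOCATE, 1024, 2_000_000, max));
    }
}
```

PRECISE keeps each buffer close to its payload size, while PREALLOCATE can avoid repeated buffer growth when batches are consistently large, at the cost of the over-allocation described in this issue.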

Additional context

maxPendingMessages=2000
maxPendingMessagesAcrossPartitions=40000
blockIfQueueFull=false
sendTimeoutMs=5000
batchingMaxPublishDelayMicros=50
batchingMaxMessages=2000
batchingMaxBytes=5242880
batchingEnabled=true
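As a rough illustration of how these settings can produce the ratio described above, assume every message is exactly 1 KiB and ignore per-message metadata (illustrative assumptions, not figures from the dump). Once a batch is filled to batchingMaxMessages, maxBatchSize is at least 2,048,000 bytes, and with batchingMaxPublishDelayMicros=50 later batches may flush after only one or a few messages, each still receiving the larger buffer. The allocation is capped by the maximum message size; if that equals the 5,242,880 bytes used for batchingMaxBytes here (an assumption about the broker setting), the buffer for a single 1 KiB message lies in this range:

```latex
\frac{2000 \times 1024\ \text{B}}{1024\ \text{B}} = 2000
\;\le\;
\frac{\min(\text{maxBatchSize},\ \text{maxMessageSize})}{1024\ \text{B}}
\;\le\;
\frac{5\,242\,880\ \text{B}}{1024\ \text{B}} = 5120
```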


gaozhangmin commented 2 years ago

I have a question: how would precise allocation be achieved? @keyboardbobo

keyboardbobo commented 2 years ago

It is best to loop through the messages to be packed to accurately calculate the memory size to be allocated

Could we iterate over the messages to be batched and sum their sizes to calculate exactly how much memory needs to be allocated?

tjiuming commented 2 years ago

How about using CompositeByteBuf: allocate a small buffer up front and let it grow as needed? @codelipenghui Could you please assign this to me?

codelipenghui commented 2 years ago

@tjiuming Yes, I think CompositeByteBuf will work, since it provides a ByteBuf that can be expanded dynamically.
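The suggestion can be illustrated without Netty. The following Java sketch is a stdlib-only stand-in for the idea behind CompositeByteBuf, not Netty's API: GrowingChunkBuffer, its methods and the 1 KiB chunk size are hypothetical. It reserves memory one small chunk at a time, so capacity tracks the bytes actually written instead of a pre-computed maximum.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/**
 * Stdlib-only sketch of a buffer that grows by appending fixed-size chunks,
 * similar in spirit to a composite buffer. NOT Netty's CompositeByteBuf API;
 * the class name and chunk size are hypothetical.
 */
public class GrowingChunkBuffer {
    private static final int CHUNK_SIZE = 1024;
    private final List<ByteBuffer> chunks = new ArrayList<>();

    /** Appends data, allocating a new chunk only when the last one is full. */
    public void write(byte[] data) {
        int offset = 0;
        while (offset < data.length) {
            ByteBuffer current = chunks.isEmpty() ? null : chunks.get(chunks.size() - 1);
            if (current == null || !current.hasRemaining()) {
                current = ByteBuffer.allocate(CHUNK_SIZE);
                chunks.add(current);
            }
            int n = Math.min(current.remaining(), data.length - offset);
            current.put(data, offset, n);
            offset += n;
        }
    }

    /** Total bytes written across all chunks. */
    public int readableBytes() {
        int total = 0;
        for (ByteBuffer chunk : chunks) {
            total += chunk.position();
        }
        return total;
    }

    /** Total bytes reserved by all chunks. */
    public int capacity() {
        return chunks.size() * CHUNK_SIZE;
    }

    public static void main(String[] args) {
        GrowingChunkBuffer buf = new GrowingChunkBuffer();
        buf.write(new byte[1000]);
        buf.write(new byte[100]);
        System.out.println(buf.readableBytes() + " bytes written, " + buf.capacity() + " bytes reserved");
    }
}
```

Writing 1,000 and then 100 bytes leaves 1,100 bytes written across two chunks (2,048 bytes reserved), rather than a buffer sized to the largest batch seen so far. The trade-off is that the data is no longer contiguous, so the send path has to handle multiple components; a real implementation would also choose a larger chunk size to limit per-chunk overhead.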

github-actions[bot] commented 2 years ago

The issue had no activity for 30 days; marking it with the Stale label.
