@andrekramer1, thanks for reporting this issue. Could you share the steps to reproduce it?
Were any configurations changed? How many topics and partitions are used by each producer? Do you have any information about the message size and the produce/consume rates?
"Increasing memory has not helped"
Can you share the details of that memory increase?
10000 consumers lead to the OOM quite quickly with the standard configuration and a single topic with no partitions. Messages were about 200 bytes, and the rate was "as fast as possible" (we did not use pulsar-perf), with each producer sending 1000 messages.
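For reference, here is a minimal sketch of the kind of harness we ran using the Java client. The topic name, service URL, and the one-subscription-per-consumer layout are illustrative assumptions, not our exact test code:

```java
import org.apache.pulsar.client.api.*;

public class ManyConsumersLoad {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // standalone broker
                .build();

        // Many consumers, each on its own subscription, so every message
        // fans out to all of them. They are left connected and the broker
        // dispatches into each consumer's receive queue.
        Consumer<?>[] consumers = new Consumer<?>[10_000];
        for (int i = 0; i < consumers.length; i++) {
            consumers[i] = client.newConsumer()
                    .topic("load-test")
                    .subscriptionName("sub-" + i)
                    .subscribe();
        }

        // 4 producers sending ~200-byte messages "as fast as possible",
        // 1000 messages each.
        byte[] payload = new byte[200];
        for (int p = 0; p < 4; p++) {
            Producer<byte[]> producer = client.newProducer()
                    .topic("load-test")
                    .create();
            for (int m = 0; m < 1000; m++) {
                producer.sendAsync(payload);
            }
            producer.flush();
        }

        client.close();
    }
}
```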
The OOM appears to be in reading from BookKeeper; possibly the broker is issuing too many read requests at a time. We've had some limited success with the dispatcher throttling introduced in 2.5. Using both:

./bin/pulsar-admin namespaces set-dispatch-rate public/default \
  --msg-dispatch-rate 500 \
  --dispatch-rate-period 1

./bin/pulsar-admin namespaces set-subscription-dispatch-rate public/default \
  --msg-dispatch-rate 50 \
  --dispatch-rate-period 1

lets most runs, but not all of them, avoid the OOM.
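For anyone applying the same limits from code, a rough equivalent using the Java admin client is sketched below. The admin URL is an assumption, and the exact DispatchRate constructor/builder differs between Pulsar versions:

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.DispatchRate;

public class ApplyDispatchLimits {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080")
                .build();

        // Arguments: msg rate, byte rate (-1 = unlimited), rate period in seconds.
        admin.namespaces().setDispatchRate("public/default",
                new DispatchRate(500, -1, 1));
        admin.namespaces().setSubscriptionDispatchRate("public/default",
                new DispatchRate(50, -1, 1));

        admin.close();
    }
}
```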
A rate limit is difficult to use, as the number of topics and subscriptions will not be known in advance, so we would have to plan for the worst-case scenario.
One or more new limits on parallel bookie reads seem to be required?
In case anyone else is interested in high memory consumption issues: I filed #8138, which describes an OOM with many consumers originating from Reader API usage. In that case the Readers are closed after use, but the related consumer instances don't get GCed. While looking for related issues I also found #7680, which might be directly related to this "OOM with many consumers" case.
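For clarity, here is a generic sketch of the usage pattern I mean (not the exact code from #8138): a short-lived Reader is created, read from, and closed, yet the associated consumer instances were observed to stay reachable. Topic name and message handling are illustrative:

```java
import org.apache.pulsar.client.api.*;

public class ShortLivedReader {
    // Create a Reader, drain the available messages, then close it.
    // In the reported case, broker/client consumer state lingered even
    // though the Reader itself was closed here.
    static void readOnce(PulsarClient client) throws Exception {
        try (Reader<byte[]> reader = client.newReader()
                .topic("my-topic")
                .startMessageId(MessageId.earliest)
                .create()) {
            while (reader.hasMessageAvailable()) {
                Message<byte[]> msg = reader.readNext();
                // process msg ...
            }
        }
    }
}
```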
Closed as stale. Please open a new issue if it's still relevant to the maintained versions.
While load testing a single standalone Pulsar with 4 producers and many consumers (thousands), we are getting out-of-memory errors from failing to allocate direct memory. Increasing memory has not helped: with enough producers and many consumers, Pulsar fails with exceptions like the following:
Looking into a memory dump showed many buffers allocated by NIO/Netty.
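As a cheap runtime cross-check, Netty's own counters can be polled from inside the process under test. A small sketch, assuming Netty 4.1 on the classpath (as shipped with the broker and client):

```java
import io.netty.util.internal.PlatformDependent;

public class DirectMemoryProbe {
    // Log Netty's tracked direct-memory usage, e.g. from a scheduled task
    // running alongside the load test. usedDirectMemory() may return -1
    // if Netty's reservation tracking is disabled.
    public static void logDirectMemory() {
        long used = PlatformDependent.usedDirectMemory();
        long max = PlatformDependent.maxDirectMemory();
        System.out.printf("netty direct memory: %d / %d bytes%n", used, max);
    }
}
```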
This issue could be related to #5751, #5720.
We were load testing on a single Linux host with plenty of physical memory.
Possibly related to issues #5513, #4196, and #4632. We've also seen one crash due to a direct memory error, but that one seemed related to BookKeeper processing: