Closed Shawyeok closed 9 months ago
Hello, I have also been learning Pulsar recently. Have you found a solution to this problem?
Sorry for the late reply.
The current publish rate limiting is based on setting the `autoRead` flag of the Netty channel to false. However, `autoRead` is just a hint in Netty, not a guarantee (if anyone is familiar with this, a detailed explanation would be great). As a result, topics that have already been rate-limited may call the `isTopicPublishRateExceeded` method multiple times before the next rate-limiting period arrives, which is unnecessary. So an optimization I'm thinking of is to set a flag after a topic triggers rate limiting, to avoid checking through the `isTopicPublishRateExceeded` method each time whether the topic has already been rate-limited.
Of course, we can also optimize the implementation of `org.apache.pulsar.common.util.RateLimiter`. Perhaps an implementation based on compare-and-swap (CAS) would perform better than `synchronized`. All of these ideas need to be validated through experimentation.
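The CAS idea could be sketched roughly as follows. This is illustrative only, not Pulsar's actual `RateLimiter`: the class and method names are made up, and refill is simplified to a periodic reset so the non-blocking acquire path stays clear.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical lock-free token bucket: tryAcquire uses a CAS retry loop
// instead of a synchronized block, so producer threads never block each
// other while consuming permits.
class CasTokenBucket {
    private final long capacity;
    private final AtomicLong tokens;

    CasTokenBucket(long capacity) {
        this.capacity = capacity;
        this.tokens = new AtomicLong(capacity);
    }

    /** Try to consume {@code permits} tokens without blocking or locking. */
    boolean tryAcquire(long permits) {
        while (true) {
            long current = tokens.get();
            if (current < permits) {
                return false;                 // rate limit exceeded
            }
            if (tokens.compareAndSet(current, current - permits)) {
                return true;                  // consumed atomically
            }
            // CAS lost a race with another thread; re-read and retry
        }
    }

    /** Would be called by a scheduled task once per rate-limit period. */
    void refill() {
        tokens.set(capacity);
    }
}
```

Whether this actually beats `synchronized` under the broker's thread counts is exactly the kind of thing that needs benchmarking.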
There is some related discussion on the mailing list:
Sorry, I am a beginner with Pulsar and there is something I don't understand; can you help me? The reporter said that the write efficiency of other topics was affected, yet `preciseTopicPublishRateLimiterEnable` applies to the current topic. Why can it affect other topics? I can't find the answer in your reply; could you explain?
Internally, Pulsar has multiple topics that depend on the same threads for message writing (such as Netty's EventLoop and the OrderedExecutor). If these threads hit a performance bottleneck for any reason, such as the lock contention observed in this case, then even topics that are not directly causing the contention will be affected, as long as they are bound to the same threads.
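The shared-thread effect can be shown with a toy example (plain Java, not Pulsar code): a single-threaded executor stands in for a Netty EventLoop, and a slow task (the contended topic) delays an unrelated fast task (another topic bound to the same thread).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Toy demonstration of head-of-line blocking on a shared write thread.
class SharedThreadDemo {
    /** Returns how long "topic B"'s trivial task waited, in milliseconds. */
    static long delayedByMillis() {
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        try {
            // "Topic A" holds the thread, e.g. stuck on a contended lock.
            eventLoop.submit(() -> {
                try { Thread.sleep(200); } catch (InterruptedException ignored) { }
            });
            long submitted = System.nanoTime();
            // "Topic B" just wants to write one message, but must wait.
            eventLoop.submit(() -> { }).get();
            return (System.nanoTime() - submitted) / 1_000_000;
        } catch (Exception e) {
            return -1;
        } finally {
            eventLoop.shutdownNow();
        }
    }
}
```

Topic B's latency here is roughly topic A's 200 ms stall, even though topic B did nothing wrong.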
@Shawyeok I'm working on a non-blocking implementation that uses CAS. Please check https://github.com/lhotari/async-tokenbucket for the PoC and performance test.
In addition, I'll be fixing the way the `autoRead` flag is toggled. (WIP at https://github.com/lhotari/pulsar/blob/lh-rate-limiter-improvement/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/ThrottleTracker.java )
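The throttle-tracking idea can be sketched as follows. This is a hypothetical reduction, not the actual `ThrottleTracker` code: it counts independent throttle reasons and flips `autoRead` only on 0 ↔ 1 transitions, so repeated rate-limit checks never re-toggle the channel. The `Consumer<Boolean>` stands in for Netty's `channel.config().setAutoRead(...)`.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical sketch: toggle autoRead only when the number of active
// throttle reasons crosses zero, instead of on every rate-limit check.
class ThrottleTrackerSketch {
    private final AtomicInteger throttleCount = new AtomicInteger();
    private final Consumer<Boolean> setAutoRead;

    ThrottleTrackerSketch(Consumer<Boolean> setAutoRead) {
        this.setAutoRead = setAutoRead;
    }

    void incrementThrottleCount() {
        if (throttleCount.incrementAndGet() == 1) {
            setAutoRead.accept(false);   // first throttler pauses reads
        }
    }

    void decrementThrottleCount() {
        if (throttleCount.decrementAndGet() == 0) {
            setAutoRead.accept(true);    // last throttler resumes reads
        }
    }
}
```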
@1625567290 Just curious to learn about your motivation to start learning Pulsar here. This might not be the easiest way to learn Pulsar. :)
@lhotari Cool! I will study and verify your solution as soon as possible after my vacation, which may be a few days.
Sounds good. Please check the blog post https://codingthestreams.com/pulsar/2023/11/22/pulsar-slos-and-rate-limiting.html for the full context.
@Shawyeok There's now "PIP-322: Pulsar Rate Limiting Refactoring" (#21680) to address this issue. The changes are in the draft PR #21681 . Please review! Thanks
Search before asking
After enabling the `preciseTopicPublishRateLimiterEnable` configuration in `broker.conf`, if one topic on the broker exceeds the publish rate limit (for example, 1 MB/s), it causes lock contention, leading to an increase in the 99th-percentile write latency for other topics that have not reached the rate limit. The profile shows that there is lock contention inside the method `org.apache.pulsar.broker.service.PrecisPublishLimiter#tryAcquire`.
pulsar-3.0.1.lock.profile.008.html.zip
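As a toy model of the reported bottleneck (not Pulsar's actual limiter code), every producer thread funneling through one `synchronized` acquire method serializes unrelated topics behind a single monitor:

```java
// Minimal illustration of the contention pattern: all callers, regardless
// of which topic they serve, must take the same intrinsic lock here.
class SynchronizedLimiter {
    private long permits;

    SynchronizedLimiter(long permits) {
        this.permits = permits;
    }

    synchronized boolean tryAcquire(long n) {
        if (permits < n) {
            return false;   // limit exceeded; caller will throttle
        }
        permits -= n;
        return true;
    }
}
```

Under many producer threads, time spent waiting on this monitor shows up as tail latency even for topics that never exhaust their permits.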
Minimal reproduce step
Noticed in a fork of 2.8.1, reproduced in 3.0.1 on Linux CentOS 7.9 (kernel 3.10.0-1160.99.1.el7.x86_64).
What did you expect to see?
A topic that exceeds the publish rate limit should not affect the write latency of other topics.
What did you see instead?
The write latency of other topics also increased.
Anything else?
No response
Are you willing to submit a PR?