apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

[feat] Add topic stats and metrics for observing message replay behavior and Key_Shared filtering/blocking behavior #23205

Closed lhotari closed 4 weeks ago

lhotari commented 2 months ago

Motivation

Currently, it's very challenging to investigate issues related to message replay ("message redelivery controller"). Some examples of this include:

Solution

Add topic stats and metrics for observing message replay and related Key_Shared filtering (hash blocking) behavior.

Specific Metrics to Consider

  1. Number of messages in redelivery (replay)
  2. For Key_Shared subscriptions: Ways to observe internal state related to blocked hashes
  3. Counter for delayed delivery messages being added to delivery (replay)

Implementation Requirements

Expected Benefits

Alternatives

No response

Anything else?

No response

lhotari commented 2 months ago

It seems that PIP-282 added some subscription stats in https://github.com/apache/pulsar/pull/21953 that improve observability of Key_Shared.

lhotari commented 2 months ago

There's already a counter for message redelivery: https://github.com/apache/pulsar/blob/77b6378ae8b9ac83962f71063ad44d6ac57f8e32/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/Consumer.java#L959-L961 However, it isn't currently exposed in the subscription stats. The counter was added as part of the OpenTelemetry (OTel) changes in https://github.com/apache/pulsar/pull/22693 . An ack counter was added as well: https://github.com/apache/pulsar/blob/77b6378ae8b9ac83962f71063ad44d6ac57f8e32/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/Consumer.java#L955-L957

I think exposing these counters in the stats would be a non-breaking change, so it wouldn't necessarily require a PIP.
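
To illustrate why this kind of change tends to be additive, here is a minimal sketch of per-consumer counters being copied into a stats snapshot. It is not Pulsar's code: the class `ConsumerCountersSketch`, its methods, and the field names `messageAckCount` and `messageRedeliverCount` are all hypothetical, and the use of `LongAdder` is an assumption about how such counters are commonly kept.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical illustration only; names do not come from the Pulsar code base. */
final class ConsumerCountersSketch {
    // LongAdder keeps increments cheap when many threads update the same counter.
    private final LongAdder redeliverCounter = new LongAdder();
    private final LongAdder ackCounter = new LongAdder();

    void recordRedelivered(long messages) {
        redeliverCounter.add(messages);
    }

    void recordAcked(long messages) {
        ackCounter.add(messages);
    }

    /** Point-in-time copy of the counters, as a stats endpoint might return it. */
    static final class StatsSnapshot {
        final long messageAckCount;        // hypothetical field name
        final long messageRedeliverCount;  // hypothetical field name

        StatsSnapshot(long messageAckCount, long messageRedeliverCount) {
            this.messageAckCount = messageAckCount;
            this.messageRedeliverCount = messageRedeliverCount;
        }
    }

    StatsSnapshot snapshot() {
        return new StatsSnapshot(ackCounter.sum(), redeliverCounter.sum());
    }

    public static void main(String[] args) {
        ConsumerCountersSketch consumer = new ConsumerCountersSketch();
        consumer.recordAcked(10);
        consumer.recordRedelivered(3);
        StatsSnapshot stats = consumer.snapshot();
        System.out.println("acked=" + stats.messageAckCount
                + " redelivered=" + stats.messageRedeliverCount);
    }
}
```

Because the snapshot only gains fields, a reader that ignores unknown fields would keep working, which is the sense in which the change is non-breaking.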

lhotari commented 1 month ago

PIP-379: Key_Shared Draining Hashes for Improved Message Ordering covers observability.

lhotari commented 4 weeks ago

#23224 implemented msgInReplay / pulsar_subscription_in_replay.
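
For anyone who wants to watch this metric outside a dashboard, here is a rough sketch that pulls `pulsar_subscription_in_replay` values out of Prometheus text-format output. The helper class `InReplayReader` is hypothetical, the `topic` and `subscription` label names are an assumption about how the broker labels the metric, and the sample lines in `main` are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Pulsar: reads one gauge from Prometheus text output. */
final class InReplayReader {
    // Metric name taken from the comment above.
    static final String METRIC = "pulsar_subscription_in_replay";

    // Matches one label pair such as subscription="sub-a" (escaped quotes are not handled).
    private static final Pattern LABEL = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    /** Returns the summed metric value per "topic|subscription" key. */
    static Map<String, Long> inReplayBySubscription(String exposition) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String rawLine : exposition.split("\n")) {
            String line = rawLine.trim();
            // Skip comments, blank lines, and every other metric.
            if (!line.startsWith(METRIC + "{")) {
                continue;
            }
            int close = line.lastIndexOf('}');
            if (close < 0) {
                continue;
            }
            String labels = line.substring(METRIC.length() + 1, close);
            // After the labels comes the value, optionally followed by a timestamp.
            String[] rest = line.substring(close + 1).trim().split("\\s+");
            if (rest[0].isEmpty()) {
                continue;
            }
            long value = (long) Double.parseDouble(rest[0]);
            String topic = "";
            String subscription = "";
            Matcher m = LABEL.matcher(labels);
            while (m.find()) {
                if (m.group(1).equals("topic")) {
                    topic = m.group(2);
                } else if (m.group(1).equals("subscription")) {
                    subscription = m.group(2);
                }
            }
            result.merge(topic + "|" + subscription, value, Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample lines; real broker output carries more labels and metrics.
        String sample = String.join("\n",
                "# TYPE pulsar_subscription_in_replay gauge",
                "pulsar_subscription_in_replay{topic=\"persistent://public/default/orders\",subscription=\"sub-a\"} 12",
                "pulsar_subscription_in_replay{topic=\"persistent://public/default/orders\",subscription=\"sub-b\"} 0");
        inReplayBySubscription(sample)
                .forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

A subscription whose value stays high over time would be a natural place to start looking at replay behavior.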

lhotari commented 4 weeks ago

#23429 adds observability for the PIP-379 Key_Shared implementation.

The added stats are drainingHashesCount, drainingHashesClearedTotal, drainingHashesUnackedMessages and drainingHashes.

lhotari commented 4 weeks ago

Closing this as resolved by #23224 and #23429 in the PIP-379 implementation.