apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0
14.29k stars 3.59k forks source link

[Bug] Broker stuck to create subscription and client fails to create consumer #20393

Open rdhabalia opened 1 year ago

rdhabalia commented 1 year ago

Search before asking

Version

pulsar-2.10, 2.11

Minimal reproduce step

  1. Sometime Broker tries to recover the topic cursor and couldn't create subscription successfully which causes consumer creation stuck at broker and client application times out and fails to subscribe on topic.

  2. Broker log

    05:50:28.411 [pulsar-io-4-32] INFO  org.apache.pulsar.broker.service.ServerCnx - [/1.1.1.1:38126] Subscribing on topic persistent://pulsar/use1/myns/topic-1 / my-sub
    05:50:28.411 [pulsar-io-4-32] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [pulsar/use1/myns/persistent/topic-1] Cursor my-sub recovered to position 52505149:179
    05:50:28.412 [pulsar-io-4-32] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [pulsar/use1/myns/persistent/topic-1] Consumer my-sub cursor initialized with counters: consumed 0 mdPos 52505149:179 rdPos 52505149:180
    05:50:28.412 [pulsar-io-4-32] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [pulsar/use1/myns/persistent/topic-1] Creating ledger, metadata: {component=[109, 97, 110, 97, 103, 101, 100, 45, 108, 101, 100, 103, 101, 114], pulsar/managed-ledger=[112, 117, 108, 115, 97, 114, 47, 112, 115, 116, 103, 45, 103, 113, 49, 47, 112, 115, 116, 103, 49, 45, 98, 114, 111, 107, 101, 114, 50, 46, 109, 101, 115, 115, 97, 103, 105, 110, 103, 46, 103, 113, 49, 46, 121, 97, 104, 111, 111, 46, 99, 111, 109, 58, 52, 48, 56, 48, 47, 112, 101, 114, 115, 105, 115, 116, 101, 110, 116, 47, 115, 97, 110, 105, 116, 121, 84, 101, 115, 116, 84, 111, 112, 105, 99, 45, 49], pulsar/cursor=[115, 97, 110, 105, 116, 121, 83, 117, 98], application=[112, 117, 108, 115, 97, 114]} - metadata ops timeout : 60 seconds
    05:50:28.421 [main-EventThread] INFO  org.apache.bookkeeper.mledger.impl.ManagedCursorImpl - [pulsar/use1/myns/persistent/topic-1] Created ledger 52506173 for cursor my-sub
    05:50:29.860 [bookkeeper-ml-scheduler-OrderedScheduler-5-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [pulsar/use1/myns/persistent/topic-1] Ledger already created when timeout task is triggered
    05:51:28.412 [bookkeeper-ml-scheduler-OrderedScheduler-2-0] INFO  org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl - [pulsar/use1/myns/persistent/topic-1] Ledger already created when timeout task is triggered
    05:51:40.443 [pulsar-io-4-2] INFO  org.apache.pulsar.broker.service.ServerCnx - [/1.1.1.1:38174] Subscribing on topic persistent://pulsar/use1/myns/topic-1 / my-sub
    05:53:47.732 [pulsar-io-4-5] INFO  org.apache.pulsar.broker.service.ServerCnx - [/1.1.1.1:38206] Subscribing on topic persistent://pulsar/use1/myns/topic-1 / my-sub
    05:55:55.171 [pulsar-io-4-8] INFO  org.apache.pulsar.broker.service.ServerCnx - [/1.1.1.1:38238] Subscribing on topic persistent://pulsar/use1/myns/topic-1 / my-sub
    05:58:02.655 [pulsar-io-4-11] INFO  org.apache.pulsar.broker.service.ServerCnx - [/1.1.1.1:38316] Subscribing on topic persistent://pulsar/use1/myns/topic-1 / my-sub
    06:00:10.991 [pulsar-io-4-13] INFO  org.apache.pulsar.broker.service.ServerCnx - [/1.1.1.1:38348] Subscribing on topic persistent://pulsar/use1/myns/topic-1 / my-sub
  3. client logs while creating subscription

    2023-05-19 05:51:40 [pulsar-client-io-8-1] INFO  o.a.pulsar.client.impl.ConsumerImpl - [persistent://pulsar/use1/myns/topic-1][my-sub] Subscribing to topic on cnx
    [id: 0xdcb4fbde, L:/1.1.1.1:38174 - R:broker2.use1.pulsar.com/1.1.1.1:6651], consumerId 0
    2023-05-19 05:52:10 [pulsar-client-io-8-1] DEBUG o.a.p.common.protocol.PulsarDecoder - [broker2.use1.pulsar.com/1.1.1.1:6651] Received cmd PING
    2023-05-19 05:52:10 [pulsar-client-io-8-1] DEBUG o.a.p.common.protocol.PulsarHandler - [[id: 0xdcb4fbde, L:/1.1.1.1:38174 - R:broker2.use1.pulsar.com/1.1.1.1:6651]] Replying back to ping mess
    age
    2023-05-19 05:52:10 [pulsar-client-io-8-1] DEBUG o.a.p.common.protocol.PulsarHandler - [[id: 0xdcb4fbde, L:/1.1.1.1:38174 - R:broker2.use1.pulsar.com/1.1.1.1:6651]] Sending ping message
    2023-05-19 05:52:10 [pulsar-client-io-8-1] DEBUG o.a.p.common.protocol.PulsarDecoder - [broker2.use1.pulsar.com/1.1.1.1:6651] Received cmd PONG
    2023-05-19 05:52:40 [pulsar-client-io-8-1] DEBUG o.a.p.common.protocol.PulsarDecoder - [broker2.use1.pulsar.com/1.1.1.1:6651] Received cmd PING
    2023-05-19 05:52:40 [pulsar-client-io-8-1] DEBUG o.a.p.common.protocol.PulsarHandler - [[id: 0xdcb4fbde, L:/1.1.1.1:38174 - R:broker2.use1.pulsar.com/1.1.1.1:6651]] Replying back to ping mess
    age
    2023-05-19 05:52:40 [pulsar-client-io-8-1] DEBUG o.a.p.common.protocol.PulsarHandler - [[id: 0xdcb4fbde, L:/1.1.1.1:38174 - R:broker2.use1.pulsar.com/1.1.1.1:6651]] Sending ping message
    2023-05-19 05:52:40 [pulsar-client-io-8-1] WARN  o.a.pulsar.client.impl.ConsumerImpl - [persistent://pulsar/use1/myns/topic-1][my-sub] Failed to subscribe to topi
    c on broker2.use1.pulsar.com/1.1.1.1:6651
    2023-05-19 05:52:40 [pulsar-client-io-8-1] WARN  o.a.pulsar.client.impl.ClientCnx - [id: 0xdcb4fbde, L:/1.1.1.1:38174 - R:broker2.use1.pulsar.com/1.1.1.1:6651] request timeout {'durationMs': 
    '30000', 'reqId':'3312998569082859124', 'remote':'broker2.use1.pulsar.com/1.1.1.1:6651', 'local':'/1.1.1.1:38174'}

What did you expect to see?

sometime broker stucks to create subscription and it can be only fixed by unloading the topic

What did you see instead?

broker should be resilient to handle failure and should make sure to recover curosr and create subscription successfully without impacting availability of the topic.

Anything else?

No response

Are you willing to submit a PR?

github-actions[bot] commented 1 year ago

The issue had no activity for 30 days, mark with Stale label.