[X] I searched in the issues and found nothing similar.
Version
master branch.
Minimal reproduce step
# Create a partitioned topic with 10 partitions
bin/pulsar-admin topics create-partitioned-topic test/tb5/testTxn10 --partitions 10
# Start the transactional perf producer and consumer
bin/pulsar-perf produce -r 2048000 -bm 10 -txn -nmt 1000 persistent://test/tb5/testTxn10
bin/pulsar-perf consume -r 2048000 -txn -nmt 1500 persistent://test/tb5/testTxn10
# Restart the broker, which triggers topic unload and reload as well as TP, TB, and TC recovery.
bin/pulsar-daemon restart broker
What did you expect to see?
We expect the transactional producer and consumer to keep working as they did before the restart.
What did you see instead?
Some partitions of topic persistent://test/tb5/testTxn10 do not work; only two partitions work normally.
The client reports the following errors. Each partition without traffic has a corresponding error message.
2023-01-06T14:31:03,907+0800 [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ClientCnx - [id: 0x52f04075, L:/172.24.25.42:49158 - R:cluster2-nn0.bigo.baina/172.24.25.41:6650] Received error from server: Exclusive consumer is already connected
2023-01-06T14:31:03,907+0800 [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ConsumerImpl - [persistent://test/tb5/testTxn10-partition-2][sub] Failed to subscribe to topic on cluster2-nn0.bigo.baina/172.24.25.41:6650
2023-01-06T14:31:33,177+0800 [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ClientCnx - [id: 0x951d9be7, L:/172.24.25.42:49171 - R:cluster2-nn0.bigo.baina/172.24.25.41:6650] Received error from server: Exclusive consumer is already connected
2023-01-06T14:31:33,177+0800 [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ConsumerImpl - [persistent://test/tb5/testTxn10-partition-6][sub] Failed to subscribe to topic on cluster2-nn0.bigo.baina/172.24.25.41:6650
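To see at a glance which partitions are stuck, the client log can be filtered for subscribe failures. The following is a minimal Python sketch, assuming only the ConsumerImpl log format shown above (topic and subscription in brackets, followed by "Failed to subscribe"); the function name failed_partitions is hypothetical and is not part of Pulsar or pulsar-perf.

```python
import re

# Assumed log format, taken from the ConsumerImpl warnings in this report:
#   ... [persistent://<topic>-partition-<N>][<sub>] Failed to subscribe to topic on ...
FAILED_SUBSCRIBE = re.compile(
    r"\[(persistent://[^\]]+?)-partition-(\d+)\]\[([^\]]+)\] Failed to subscribe"
)


def failed_partitions(log_lines):
    """Group partition indexes that logged a subscribe failure by (topic, subscription)."""
    failed = {}
    for line in log_lines:
        # Drop '*' characters so copies of the log with markdown bold markers still match.
        match = FAILED_SUBSCRIBE.search(line.replace("*", ""))
        if match:
            topic, partition, sub = match.group(1), int(match.group(2)), match.group(3)
            failed.setdefault((topic, sub), set()).add(partition)
    # Sort so the result is stable regardless of log order or repeated retries.
    return {key: sorted(parts) for key, parts in failed.items()}
```

For the two ConsumerImpl warnings in this report, it groups partitions 2 and 6 under ("persistent://test/tb5/testTxn10", "sub"); the ClientCnx lines carry no topic name and are ignored.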
I also found that the TPs corresponding to all 10 partitions of the topic have been recovered.
python3 calculateTbpRecoverTime.py TP
TP for topic:persistent://test/tb5/testTxn10-partition-0 sub:sub recover time in milliseconds: 431442
TP for topic:persistent://test/tb5/testTxn10-partition-1 sub:sub recover time in milliseconds: 275965
TP for topic:persistent://test/tb5/testTxn10-partition-2 sub:sub recover time in milliseconds: 302184
TP for topic:persistent://test/tb5/testTxn10-partition-3 sub:sub recover time in milliseconds: 282678
TP for topic:persistent://test/tb5/testTxn10-partition-4 sub:sub recover time in milliseconds: 336789
TP for topic:persistent://test/tb5/testTxn10-partition-5 sub:sub recover time in milliseconds: 341019
TP for topic:persistent://test/tb5/testTxn10-partition-6 sub:sub recover time in milliseconds: 279376
TP for topic:persistent://test/tb5/testTxn10-partition-7 sub:sub recover time in milliseconds: 431452
TP for topic:persistent://test/tb5/testTxn10-partition-8 sub:sub recover time in milliseconds: 282711
TP for topic:persistent://test/tb5/testTxn10-partition-9 sub:sub recover time in milliseconds: 250019
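To put these numbers in perspective, the output can be parsed and converted to minutes. This is a minimal Python sketch, assuming only the line format printed above; it is not the calculateTbpRecoverTime.py script (whose source is not shown here), and the helper names parse_recover_times and slowest_and_fastest are hypothetical.

```python
import re

# Assumed line format, copied from the output above:
#   TP for topic:persistent://...-partition-<N> sub:<sub> recover time in milliseconds: <ms>
RECOVER_LINE = re.compile(r"-partition-(\d+) sub:\S+ recover time in milliseconds: (\d+)")


def parse_recover_times(text):
    """Map partition index -> TP recover time in milliseconds."""
    return {int(p): int(ms) for p, ms in RECOVER_LINE.findall(text)}


def slowest_and_fastest(times_ms):
    """Return ((partition, minutes), (partition, minutes)) for the slowest and fastest TP."""
    slowest = max(times_ms, key=times_ms.get)
    fastest = min(times_ms, key=times_ms.get)
    return (slowest, times_ms[slowest] / 60_000), (fastest, times_ms[fastest] / 60_000)
```

Applied to the ten lines above, it reports partition 7 as the slowest (431452 ms, about 7.2 minutes) and partition 9 as the fastest (250019 ms, about 4.2 minutes), so every TP reported a recover time of at least roughly four minutes.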
Query the information of one partition that has no traffic.
It turns out that the subscription sub of this partition already has a connected consumer (called 46bbb), but that consumer receives no traffic. The pulsar-perf tool keeps trying to reconnect and create a new consumer on subscription sub. However, because sub is an exclusive subscription and the inactive consumer 46bbb is still attached to it, the new consumer cannot be created.
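The blocking behaviour described above can be illustrated with a toy model of an exclusive subscription. This is a simplified Python sketch, not Pulsar's broker code: the class names ExclusiveSubscription and ConsumerBusyError and the consumer id "perf-retry" are hypothetical, and the only behaviour taken from this report is that a second consumer is rejected with "Exclusive consumer is already connected" while another consumer is attached, whether or not the attached consumer is receiving messages.

```python
class ConsumerBusyError(Exception):
    """Hypothetical stand-in for the error the broker returns to the client."""


class ExclusiveSubscription:
    """Toy model: an exclusive subscription admits at most one attached consumer."""

    def __init__(self, name):
        self.name = name
        self.consumer = None  # id of the attached consumer, if any

    def add_consumer(self, consumer_id):
        # Only attachment matters here; the model has no notion of traffic,
        # so an idle consumer blocks newcomers exactly like an active one.
        if self.consumer is not None:
            raise ConsumerBusyError("Exclusive consumer is already connected")
        self.consumer = consumer_id

    def remove_consumer(self, consumer_id):
        # Detaching the current owner frees the subscription for the next consumer.
        if self.consumer == consumer_id:
            self.consumer = None
```

In this model, a retry such as add_consumer("perf-retry") keeps failing for as long as 46bbb stays attached, which matches the repeated "Failed to subscribe" warnings in the client log.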
The connection time shows that this non-working consumer was created after the broker restarted, so the question is: why does it not work?
As time goes by, the partitions that previously did not work start to have consuming traffic.
Query the partition information again.
It turns out that the consumer that is now working is the same consumer that previously did not work!
Anything else?
No response
Are you willing to submit a PR?