Open KannarFr opened 1 year ago
Thanks for reporting this. This is an issue which has stayed unresolved for many years. Some previous reports: #5284 and #14941.
There's an ugly workaround for the problem:
Setting topicFencingTimeoutSeconds=5 on the brokers releases the "fencing" after 5 seconds.
However, there is a chance that this causes other problems such as data consistency problems. If metadata gets overwritten, it could lead to data loss.
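For reference, a minimal sketch of what that workaround looks like in the broker configuration. It assumes the setting goes in conf/broker.conf and that the default is 0 (fenced topics are never force-closed); verify both against your Pulsar version before relying on it.

```properties
# conf/broker.conf (assumed location)
# Force-close a topic that has stayed fenced for this many seconds.
# Assumption: 0 or a negative value disables the timeout.
topicFencingTimeoutSeconds=5
```

Brokers need a restart to pick up the change, and the data consistency caveat above still applies.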
The recently merged fixes #18688 and #20527 could help improve the situation. I happened to investigate problems in this area yesterday.
I have created #20540 to address some issues that I have observed in the current solution. One of the remaining challenges in the PR is adding proper test coverage. I'm also waiting for feedback from other code contributors on the PR before finishing it. I'd appreciate feedback on the PR #20540.
@KannarFr
Jun 07 11:20:48 yo-pulsar-broker-c3-n4 pulsar[336]: 2023-06-07T11:20:48,112+0000 [BookKeeperClientWorker-OrderedExecutor-10-0] WARN org.apache.pulsar.broker.service.AbstractTopic - [persistent://tenant/ns/topic-partition-0] Attempting to add producer to a fenced topic
At this time, do you know if there is a bundle unload or a namespace unload executed?
You can check the HTTP request log to confirm it.
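A rough sketch of how that check could be scripted, assuming the HTTP request log is in an NCSA-style format and that unload operations show up as PUT requests to admin REST paths ending in /unload. The sample lines, the regular expression, and the function name are hypothetical and not taken from this cluster.

```python
import re

# Matches admin REST paths ending in /unload (namespace, bundle, or topic unload).
# Assumption: the request line is logged as "<METHOD> <path> HTTP/x.y".
UNLOAD_RE = re.compile(r'"(?:PUT|POST) (\S*/unload)(?:\?\S*)? HTTP')


def find_unload_requests(lines):
    """Return (line_number, path) for every logged request that hits an unload endpoint."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = UNLOAD_RE.search(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


# Hypothetical sample lines for illustration only.
SAMPLE_LOG = [
    '10.0.0.5 - - [07/Jun/2023:11:20:41 +0000] "GET /admin/v2/persistent/tenant/ns/topic/stats HTTP/1.1" 200 512',
    '10.0.0.5 - - [07/Jun/2023:11:20:47 +0000] "PUT /admin/v2/namespaces/tenant/ns/unload HTTP/1.1" 204 0',
]

if __name__ == "__main__":
    for number, path in find_unload_requests(SAMPLE_LOG):
        print(f"line {number}: {path}")
```

Comparing the timestamps of any matches with the 11:20:48 warning above would show whether an unload coincided with the fencing.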
Unfortunately, I do not have that much log retention. I should have taken a dump, my bad.
The issue had no activity for 30 days, mark with Stale label.
Still impacted by this issue. Restarting brokers all day long is not a sustainable situation. I wonder whether any production deployment is unaffected by this issue, and if so, how?
@StevenLeRoux Which Pulsar version are you using? Do you have a chance to test #20540 with a custom build?
@lhotari Thanks for pointing us to #20540
We're currently using v3.1.1, but we will get the chance to test with #20540 in a few days (cc @KannarFr )
@StevenLeRoux @KannarFr FYI, there's a new bug report #21860 in this area with a promising bug fix in the BookKeeper client in the works.
Thanks for pinging us @lhotari.
Version
2.11.1
Minimal reproduce step
I have a cluster with thousands of topics, and one of them became fenced; see the broker's logs:
The topic was fenced for 30 minutes. I restarted the broker and everything looks good now. Any idea? A stale cached fenced state, or something similar?
What did you expect to see?
Not fenced
What did you see instead?
Fenced