Open Shawyeok opened 2 years ago
+1. I only added disks to one bookie, and the next day the error is still flashing. Even if it is lost, shouldn't it be recovered through autorecovery?
Each time you adjust one, you lose more entries.
@zcola Did you try restarting the broker service? In my experience, restarting the broker service takes effect for a while. I want to figure out why it happens.
I am suffering from a similar problem too: after a bookie instance crashed and a new instance was added to the BookKeeper cluster, my bookie log keeps saying: No ledger found while performing readLac from ledger: xxx
The issue had no activity for 30 days, mark with Stale label.
After some investigation, here are the conditions of this issue:
LedgerHandle open (some subscriptions are stuck in a redelivery loop).
LedgerHandler internal metadata.
bookieId, it will receive read entry requests from ledgers that previously belonged to it.
Describe the bug
In our staging cluster, some bookie instances keep logging "No ledger found while reading entry: xx from ledger: xxxx". I did some investigation and found that the internal metadata of LedgerHandle (org.apache.bookkeeper.client.LedgerHandle#versionedMetadata) for specific ledgers is inconsistent with the latest ledger metadata in ZooKeeper.

Screenshots
Bookie logs: on that bookie, it says the readEntry request is from 172.30.10.3:48336, which is a broker instance.
On broker instance 172.30.10.3:8080, it says ensembles are 172.30.10.5:3181, 172.30.10.2:3181.
But the latest ledger metadata shows ensembles are 172.30.10.4:3181, 172.30.10.3:3181.
So the broker keeps sending readEntry requests to the wrong bookies, which are not in the ledger ensemble list.
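
To make the mismatch concrete, here is a minimal sketch in plain Java that compares the ensemble the broker has cached with the latest ensemble from the ledger metadata, using the addresses reported above. It does not call the BookKeeper client API; the class EnsembleCheck and the method staleBookies are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of BookKeeper or Pulsar.
final class EnsembleCheck {

    // Returns the bookies that the cached ensemble still points at
    // but that no longer appear in the latest ensemble.
    static List<String> staleBookies(List<String> cachedEnsemble, List<String> latestEnsemble) {
        List<String> stale = new ArrayList<>();
        for (String bookie : cachedEnsemble) {
            if (!latestEnsemble.contains(bookie)) {
                stale.add(bookie);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        // Ensemble seen on the broker (172.30.10.3:8080) in this report.
        List<String> cached = List.of("172.30.10.5:3181", "172.30.10.2:3181");
        // Ensemble in the latest ledger metadata in this report.
        List<String> latest = List.of("172.30.10.4:3181", "172.30.10.3:3181");
        System.out.println("Stale read targets: " + staleBookies(cached, latest));
    }
}
```

With the two ensembles from this report, both cached bookies come back as stale, which matches the observation that every readEntry request from this broker goes to a bookie outside the current ensemble.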
Additional context
Bookie auto-recovery is enabled.
Machine 172.30.10.2 crashed once at 2022-01-03 17:40:08 +08:00, and the bookie instance started at 2022-01-03 19:35:08 +08:00. The "No ledger found" log started showing up after 172.30.10.2 crashed.

Possibly related: https://github.com/apache/pulsar/issues/7214