apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0
[Bug] Error while reading ledger - ledger=13 - operation=Failed to read entry #23493

Open kulame opened 1 week ago

kulame commented 1 week ago

Version

Pulsar 3.3.1 (standalone)

Minimal reproduce step

When I kill the Pulsar server, it fails to restart.

What did you expect to see?

The Pulsar server restarts successfully.

What did you see instead?

Error log

2024-10-21T13:22:37.107148+00:00 ip-172-31-86-237 pulsar[2304114]: 2024-10-21T13:22:37,106+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Read for failed on bookie 172.31.86.237:38149 code EIO
2024-10-21T13:22:37.107845+00:00 ip-172-31-86-237 pulsar[2304114]: 2024-10-21T13:22:37,107+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] INFO  org.apache.bookkeeper.client.ReadOpBase - Error: Error while reading ledger while reading L13 E0 from bookie: 172.31.86.237:38149
2024-10-21T13:22:37.108054+00:00 ip-172-31-86-237 pulsar[2304114]: 2024-10-21T13:22:37,107+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.client.PendingReadOp - Read of ledger entry failed: L13 E0-E0, Sent to [172.31.86.237:38149], Heard from [] : bitset = {}, Error = 'Error while reading ledger'. First unread entry is (-1, rc = null)
2024-10-21T13:22:37.108286+00:00 ip-172-31-86-237 pulsar[2304114]: 2024-10-21T13:22:37,107+0000 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN  org.apache.pulsar.broker.service.ServerCnx - [/127.0.0.1:43094][persistent://public/default/__change_events][__system_reader-reader-3eb78daeb6] Failed to create consumer: consumerId=4, Error while reading ledger -  ledger=13 - operation=Failed to read entry - entry=0
2024-10-21T13:22:37.108771+00:00 ip-172-31-86-237 pulsar[2304114]: 2024-10-21T13:22:37,108+0000 [pulsar-io-18-4] WARN  org.apache.pulsar.client.impl.ClientCnx - [id: 0x4c2c9282, L:/127.0.0.1:43094 - R:localhost/127.0.0.1:6650] Received error from server: Error while reading ledger -  ledger=13 - operation=Failed to read entry - entry=0
2024-10-21T13:22:37.108954+00:00 ip-172-31-86-237 pulsar[2304114]: 2024-10-21T13:22:37,108+0000 [pulsar-io-18-4] WARN  org.apache.pulsar.client.impl.ConsumerImpl - [persistent://public/default/__change_events][__system_reader-reader-3eb78daeb6] Failed to subscribe to topic on localhost/127.0.0.1:6650
2024-10-21T13:22:37.109045+00:00 ip-172-31-86-237 pulsar[2304114]: 2024-10-21T13:22:37,108+0000 [pulsar-io-18-4] WARN  org.apache.pulsar.client.impl.ConnectionHandler - [persistent://public/default/__change_events] [__system_reader-reader-3eb78daeb6] Error connecting to broker: org.apache.pulsar.client.api.PulsarClientException: {"errorMsg":"Error while reading ledger -  ledger=13 - operation=Failed to read entry - entry=0","reqId":2435734987288602753, "remote":"localhost/127.0.0.1:6650", "local":"/127.0.0.1:43094"}
2024-10-21T13:22:37.109144+00:00 ip-172-31-86-237 pulsar[2304114]: 2024-10-21T13:22:37,109+0000 [pulsar-io-18-4] WARN  org.apache.pulsar.client.impl.ConnectionHandler - [persistent://public/default/__change_events] [__system_reader-reader-3eb78daeb6] Could not get connection to broker: org.apache.pulsar.client.api.PulsarClientException: {"errorMsg":"Error while reading ledger -  ledger=13 - operation=Failed to read entry - entry=0","reqId":2435734987288602753, "remote":"localhost/127.0.0.1:6650", "local":"/127.0.0.1:43094"} -- Will try again in 5.921 s
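
When triaging logs like the ones above, it can help to pull out which ledger and entry the broker failed to read, so they can be checked against the bookie's storage. The following is a minimal sketch, assuming the broker message format matches the lines shown above; the function name `extract_read_failures` and the abbreviated sample lines are illustrative, not part of Pulsar or BookKeeper.

```python
import re

# Matches the broker-side message format seen in the log above, e.g.
# "Error while reading ledger -  ledger=13 - operation=Failed to read entry - entry=0"
# Assumption: other Pulsar versions may format this message differently.
READ_FAILURE = re.compile(
    r"Error while reading ledger\s+-\s+ledger=(?P<ledger>\d+)"
    r"\s+-\s+operation=[^-]+-\s+entry=(?P<entry>\d+)"
)


def extract_read_failures(lines):
    """Return sorted, de-duplicated (ledger_id, entry_id) pairs found in log lines."""
    found = set()
    for line in lines:
        for match in READ_FAILURE.finditer(line):
            found.add((int(match.group("ledger")), int(match.group("entry"))))
    return sorted(found)


# Abbreviated copies of two lines from the log above (timestamps and thread names dropped).
sample = [
    "WARN  org.apache.pulsar.broker.service.ServerCnx - Failed to create consumer: "
    "consumerId=4, Error while reading ledger -  ledger=13 - "
    "operation=Failed to read entry - entry=0",
    "ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - "
    "Read for failed on bookie 172.31.86.237:38149 code EIO",
]

print(extract_read_failures(sample))
```

For the log in this report, every matching line points at the same place: entry 0 of ledger 13, which backs the `__change_events` system topic.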

Anything else?

No response

lhotari commented 1 week ago

Minimal reproduce step

when i kill pulsar server, it can't restart.

@kulame Can you reproduce this consistently? If so, please share a sequence of steps that someone else could follow to reproduce it.

kulame commented 1 week ago

@lhotari I will try it.

mawenyu commented 3 days ago

I am seeing the same problem on Linux x64 with Java 17, Pulsar 3.0.4, and the Pulsar C++ client 3.4.2. In my case, however, the problem is triggered when the storage node is powered off and restarted.