Open yunfeng79 opened 2 years ago
@yunfeng79 Not enough non-faulty bk means your alive bk node number is less than broker‘s managedLedgerDefaultEnsembleSize
config request https://github.com/apache/pulsar/blob/master/conf/broker.conf#L902. You can take a check on it.
I got the same error, my broker config is
managedLedgerDefaultEnsembleSize=2
what should I do to fix this error?
Pls run the command bin/bookkeeper shell listbookies -rw -h
to get the list the ReadWrite Bookies. And check that all broker configs keep consistent managedLedgerDefaultEnsembleSize
, managedLedgerDefaultWriteQuorum
, managedLedgerDefaultAckQuorum
.
Then check that ReadWrite Bookies number >= managedLedgerDefaultEnsembleSize
>= managedLedgerDefaultWriteQuorum
>= managedLedgerDefaultAckQuorum
.
@ericsyh check that ReadWrite Bookies number is same,Still have the same error?
java.util.concurrent.CompletionException: org.apache.pulsar.client.api.PulsarClientException$BrokerPersistenceException: org.apache.bookkeeper.mledger.ManagedLedgerException: Not enough non-faulty bookies available at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_301] at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_301] at java.util.concurrent.CompletableFuture.uniRun(CompletableFuture.java:714) ~[?:1.8.0_301] at java.util.concurrent.CompletableFuture$UniRun.tryFire(CompletableFuture.java:701) ~[?:1.8.0_301] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) [?:1.8.0_301] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) [?:1.8.0_301] at org.apache.pulsar.client.impl.ClientCnx.handleError(ClientCnx.java:691) [org.apache.pulsar-pulsar-client-original-2.8.0.jar:2.8.0] at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:176) [org.apache.pulsar-pulsar-common-2.8.0.jar:2.8.0] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:324) [io.netty-netty-codec-4.1.63.Final.jar:4.1.63.Final] at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:296) [io.netty-netty-codec-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [io.netty-netty-transport-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:795) [io.netty-netty-transport-native-epoll-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:480) [io.netty-netty-transport-native-epoll-4.1.63.Final.jar:4.1.63.Final] at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) [io.netty-netty-transport-native-epoll-4.1.63.Final.jar:4.1.63.Final] at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final] at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final]
The issue had no activity for 30 days, mark with Stale label.
I encountered the same issue when I run bin/bookkeeper shell bookiesanity
. I'm using the pulsar-2.10.0.
2022-05-02T12:44:16,501+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.SanityTestCommand - Create ledger 5
2022-05-02T12:44:16,656+0800 [bookkeeper-io-3-1] ERROR org.apache.bookkeeper.proto.PerChannelBookieClient - Could not connect to bookie: [id: 0x5a822f2d, L:/host:59178]/host:3181, current state CONNECTING :
io.netty.channel.AbstractChannel$AnnotatedConnectException: finishConnect(..) failed: Connection refused: /host:3181
Caused by: java.net.ConnectException: finishConnect(..) failed: Connection refused
at io.netty.channel.unix.Errors.newConnectException0(Errors.java:155) ~[io.netty-netty-transport-native-unix-common-4.1.74.Final.jar:4.1.74.Final]
at io.netty.channel.unix.Errors.handleConnectErrno(Errors.java:128) ~[io.netty-netty-transport-native-unix-common-4.1.74.Final.jar:4.1.74.Final]
at io.netty.channel.unix.Socket.finishConnect(Socket.java:320) ~[io.netty-netty-transport-native-unix-common-4.1.74.Final.jar:4.1.74.Final]
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.doFinishConnect(AbstractEpollChannel.java:710) [io.netty-netty-transport-classes-epoll-4.1.74.Final.jar:4.1.74.Final]
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.finishConnect(AbstractEpollChannel.java:687) [io.netty-netty-transport-classes-epoll-4.1.74.Final.jar:4.1.74.Final]
at io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe.epollOutReady(AbstractEpollChannel.java:567) [io.netty-netty-transport-classes-epoll-4.1.74.Final.jar:4.1.74.Final]
at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:470) [io.netty-netty-transport-classes-epoll-4.1.74.Final.jar:4.1.74.Final]
at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:378) [io.netty-netty-transport-classes-epoll-4.1.74.Final.jar:4.1.74.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986) [io.netty-netty-common-4.1.74.Final.jar:4.1.74.Final]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [io.netty-netty-common-4.1.74.Final.jar:4.1.74.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.74.Final.jar:4.1.74.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
2022-05-02T12:44:16,677+0800 [BookKeeperClientWorker-OrderedExecutor-5-0] WARN org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (5, 0): Bookie handle is not available
2022-05-02T12:44:16,691+0800 [BookKeeperClientWorker-OrderedExecutor-5-0] ERROR org.apache.bookkeeper.client.MetadataUpdateLoop - UpdateLoop(ledgerId=5,loopId=567e98f4) Exception updating
org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available
at org.apache.bookkeeper.bookie.LocalBookieEnsemblePlacementPolicy.replaceBookie(LocalBookieEnsemblePlacementPolicy.java:89) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.BookieWatcherImpl.replaceBookie(BookieWatcherImpl.java:334) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.EnsembleUtils.replaceBookiesInEnsemble(EnsembleUtils.java:71) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.LedgerHandle.lambda$ensembleChangeLoop$2(LedgerHandle.java:1935) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.MetadataUpdateLoop.writeLoop(MetadataUpdateLoop.java:134) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.MetadataUpdateLoop.run(MetadataUpdateLoop.java:123) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.LedgerHandle.ensembleChangeLoop(LedgerHandle.java:1954) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.LedgerHandle.handleBookieFailure(LedgerHandle.java:1903) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:378) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.proto.BookieClientImpl$1.safeRun(BookieClientImpl.java:295) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.14.4.jar:4.14.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_262]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_262]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.74.Final.jar:4.1.74.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
2022-05-02T12:44:16,693+0800 [BookKeeperClientWorker-OrderedExecutor-5-0] WARN org.apache.bookkeeper.client.LedgerHandle - [EnsembleChange(ledger:5, change-id:0000000001)][attempt:1] Exception changing ensemble
org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available
at org.apache.bookkeeper.bookie.LocalBookieEnsemblePlacementPolicy.replaceBookie(LocalBookieEnsemblePlacementPolicy.java:89) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.BookieWatcherImpl.replaceBookie(BookieWatcherImpl.java:334) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.EnsembleUtils.replaceBookiesInEnsemble(EnsembleUtils.java:71) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.LedgerHandle.lambda$ensembleChangeLoop$2(LedgerHandle.java:1935) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.MetadataUpdateLoop.writeLoop(MetadataUpdateLoop.java:134) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.MetadataUpdateLoop.run(MetadataUpdateLoop.java:123) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.LedgerHandle.ensembleChangeLoop(LedgerHandle.java:1954) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.LedgerHandle.handleBookieFailure(LedgerHandle.java:1903) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:378) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.proto.BookieClientImpl$1.safeRun(BookieClientImpl.java:295) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.14.4.jar:4.14.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_262]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_262]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.74.Final.jar:4.1.74.Final]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
2022-05-02T12:44:16,693+0800 [BookKeeperClientWorker-OrderedExecutor-5-0] ERROR org.apache.bookkeeper.client.LedgerHandle - Closing ledger 5 due to NotEnoughBookiesException: Not enough non-faulty bookies available
2022-05-02T12:44:16,696+0800 [BookKeeperClientWorker-OrderedExecutor-5-0] ERROR org.apache.bookkeeper.client.PendingAddOp - Write of ledger entry to quorum failed: L5 E0
2022-05-02T12:44:16,696+0800 [main] WARN org.apache.bookkeeper.tools.cli.commands.bookie.SanityTestCommand - Error in bookie sanity test
org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available
at org.apache.bookkeeper.client.SyncCallbackUtils.finish(SyncCallbackUtils.java:83) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.SyncCallbackUtils$SyncAddCallback.addComplete(SyncCallbackUtils.java:251) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.AsyncCallback$AddCallback.addCompleteWithLatency(AsyncCallback.java:92) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.PendingAddOp.submitCallback(PendingAddOp.java:431) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.LedgerHandle.errorOutPendingAdds(LedgerHandle.java:1799) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.client.LedgerHandle$5.safeRun(LedgerHandle.java:574) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.14.4.jar:4.14.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_262]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_262]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.74.Final.jar:4.1.74.Final]
at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_262]
2022-05-02T12:44:16,704+0800 [main-EventThread] WARN org.apache.bookkeeper.meta.AbstractZkLedgerManager - Ledger node does not exist in ZooKeeper: ledgerId=5
2022-05-02T12:44:16,704+0800 [main-EventThread] ERROR org.apache.bookkeeper.client.MetadataUpdateLoop - UpdateLoop(ledgerId=5,loopId=48132b31) Error writing metadata to store
org.apache.bookkeeper.client.BKException$BKNoSuchLedgerExistsOnMetadataServerException: No such ledger exists on Metadata Server
at org.apache.bookkeeper.meta.AbstractZkLedgerManager$4.processResult(AbstractZkLedgerManager.java:505) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.zookeeper.ZooKeeperClient$22$1.processResult(ZooKeeperClient.java:1094) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:638) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:563) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
2022-05-02T12:44:16,704+0800 [main-EventThread] WARN org.apache.bookkeeper.client.LedgerHandle - Close failed: NoSuchLedgerExistsOnMetadataServerException: No such ledger exists on Metadata Server
2022-05-02T12:44:16,709+0800 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookie.SanityTestCommand - Deleted ledger 5
2022-05-02T12:44:16,710+0800 [main] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Closing the per channel bookie client for host:3181
2022-05-02T12:44:16,711+0800 [main] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Closing the per channel bookie client for host:3181
2022-05-02T12:44:16,836+0800 [main] INFO org.apache.zookeeper.ZooKeeper - Session: 0x30c3eef1b130009 closed
2022-05-02T12:44:16,836+0800 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x30c3eef1b130009
The issue had no activity for 30 days, mark with Stale label.
any update ? still stuck with this~
Same problem here. We are unable to restart our cluster. Brokers keep on crashing all the time due to this issue.
Any workaround?
Pls run the command
bin/bookkeeper shell listbookies -rw -h
to get the list the ReadWrite Bookies. And check that all broker configs keep consistentmanagedLedgerDefaultEnsembleSize
,managedLedgerDefaultWriteQuorum
,managedLedgerDefaultAckQuorum
.Then check that ReadWrite Bookies number >=
managedLedgerDefaultEnsembleSize
>=managedLedgerDefaultWriteQuorum
>=managedLedgerDefaultAckQuorum
.
When I issue that command from our toolset pod I get the following:
08:54:48.798 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - ReadWrite Bookies :
08:54:48.812 [main] INFO org.apache.bookkeeper.tools.cli.commands.bookies.ListBookiesCommand - BookieID:rev-pulsar-bookie-2.rev-pulsar-bookie.mdp-dgt-review-check-data-ocdup2.svc.cluster.local:3181, IP:10.72.168.230, Port:3181, Hostname:rev-pulsar-bookie-2.rev-pulsar-bookie.mdp-dgt-review-check-data-ocdup2.svc.cluster.local
08:54:48.816 [main-SendThread(rev-pulsar-zookeeper:2181)] WARN org.apache.zookeeper.ClientCnxn - An exception was thrown while closing send thread for session 0x10053b99a7100f5.
org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read additional data from server sessionid 0x10053b99a7100f5, likely server has closed socket
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1290) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
08:54:48.933 [main] INFO org.apache.zookeeper.ZooKeeper - Session: 0x10053b99a7100f5 closed
08:54:48.933 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x10053b99a7100f5
@ericsyh what do you suggest?
@fmiguelez Your bookie replica should be more than 3 but the returned readwrite bookie is only rev-pulsar-bookie-2
.
Is the bookie pod rev-pulsar-bookie-0
and rev-pulsar-bookie-1
still running? You can check their disk free status by https://bookkeeper.apache.org/docs/admin/http#endpoint-apiv1bookieinfo
In the toolset pod, you can try curl rev-pulsar-bookie-0.rev-pulsar-bookie.mdp-dgt-review-check-data-ocdup2.svc.cluster.local:8000/api/v1/bookie/info
@fmiguelez Your bookie replica should be more than 3 but the returned readwrite bookie is only
rev-pulsar-bookie-2
.Is the bookie pod
rev-pulsar-bookie-0
andrev-pulsar-bookie-1
still running? You can check their disk free status by https://bookkeeper.apache.org/docs/admin/http#endpoint-apiv1bookieinfoIn the toolset pod, you can try
curl rev-pulsar-bookie-0.rev-pulsar-bookie.mdp-dgt-review-check-data-ocdup2.svc.cluster.local:8000/api/v1/bookie/info
It was actually a matter of disk space. Bookie logs reported disk usage beyond threshold, so they were set in read-only mode.
fer@N315D154:~/ws/mdp/mdp-demo$ kubectl -n mdp-dgt-review-check-data-ocdup2 logs --tail 2 rev-pulsar-bookie-0
09:05:58.784 [LedgerDirsMonitorThread] ERROR org.apache.bookkeeper.util.DiskChecker - Space left on device /pulsar/data/bookkeeper/ledgers/current : 2447400960, Used space fraction: 0.953402 > threshold 0.95.
09:05:58.826 [SyncThread-7-1] INFO org.apache.bookkeeper.bookie.LedgerDirsManager - No writable ledger dirs below diskUsageThreshold. But Dirs that can accommodate 1073741824 are: [/pulsar/data/bookkeeper/ledgers/current]
fer@N315D154:~/ws/mdp/mdp-demo$ kubectl -n mdp-dgt-review-check-data-ocdup2 logs --tail 2 rev-pulsar-bookie-1
09:05:57.685 [LedgerDirsMonitorThread] ERROR org.apache.bookkeeper.util.DiskChecker - Space left on device /pulsar/data/bookkeeper/ledgers/current : 2414260224, Used space fraction: 0.95403296 > threshold 0.95.
09:05:57.734 [SyncThread-7-1] INFO org.apache.bookkeeper.bookie.LedgerDirsManager - No writable ledger dirs below diskUsageThreshold. But Dirs that can accommodate 1073741824 are: [/pulsar/data/bookkeeper/ledgers/current]
fer@N315D154:~/ws/mdp/mdp-demo$ kubectl -n mdp-dgt-review-check-data-ocdup2 logs --tail 2 rev-pulsar-bookie-2
08:44:00.631 [bookie-io-1-1] INFO org.apache.bookkeeper.proto.BookieRequestHandler - Channels disconnected: [id: 0x2b678089, L:/10.72.168.230:3181 ! R:/10.72.168.138:39958]
08:56:59.187 [GarbageCollectorThread-11-1] INFO org.apache.bookkeeper.bookie.GarbageCollectorThread - Disk almost full, suspend major compaction to slow down filling disk.
fer@N315D154:~/ws/mdp/mdp-demo$ kubectl -n mdp-dgt-review-check-data-ocdup2 logs --tail 2 rev-pulsar-bookie-3
09:07:18.867 [LedgerDirsMonitorThread] WARN org.apache.bookkeeper.bookie.LedgerDirsMonitor - LedgerDirsMonitor check process: All ledger directories are non writable
09:07:18.868 [LedgerDirsMonitorThread] ERROR org.apache.bookkeeper.util.DiskChecker - Space left on device /pulsar/data/bookkeeper/ledgers/current : 2292150272, Used space fraction: 0.9563579 > threshold 0.95.
I modified the bookie configmap to set a threshold of 0.98.
fer@N315D154:~$ kubectl -n mdp-dgt-review-check-data-ocdup2 get configmaps rev-pulsar-bookie -o yaml > rev-pulsar-bookie-cm.yaml
Edit pulsar-bookie-cm.yml
and add following property:
diskUsageThreshold: "0.98"
And then applied changes:
fer@N315D154:~$ kubectl -n mdp-dgt-review-check-data-ocdup2 apply -f rev-pulsar-bookie-cm.yaml
For the bookie pods to get the change I deleted manually one by one the bookie replica pods. After that broker was able to start.
However this situation lasted not long as eventually two of the bookies started to occupy all the space.
I analyzed all the topics stats and actually some of them returned error:
root@rev-pulsar-toolset-0:/pulsar/bin# for i in $(./pulsar-admin namespaces list dbus); do topics=$(./pulsar-admin topics list $i); for topic in $topics; do echo $topic; ./pulsar-admin topics stats $topic | grep storageSize; done; done
"persistent://dbus/dgt-measure-point-15min-local/traffic-data-events"
"storageSize" : 43968,
"persistent://dbus/dgt-measure-point-15min-local/status"
HTTP 500 Server Error
Reason: HTTP 500 Server Error
"persistent://dbus/dgt-time-series-point/traffic-data-events"
"storageSize" : 2732639642,
"persistent://dbus/dgt-time-series-point/status"
HTTP 500 Server Error
Reason: HTTP 500 Server Error
"persistent://dbus/test-ltp-config/variables"
"storageSize" : 0,
Every bookie replica pod has a volume of 50 GiB for ledgers and 10 GiB for journals. The volumen being filled up was the one of the ledgers. As you can see only one topic has a considerable size (2.5 GiB), which does not make any sense.
There were no producers producing any more data at the moment of the error.
My suspicion is that the compaction process (enabled for all topics and set to 10 MB threshold) is the one filling up disk (probably with temporary data copy). It can be a problem as biggest topic does not have any repeating key (compaction does not make any sense in this case). I have observed logs like the following
11:19:53.242 [GarbageCollectorThread-11-1] INFO org.apache.bookkeeper.bookie.GarbageCollectorThread - Disk almost full, suspend major compaction to slow down filling disk.
11:19:53.242 [GarbageCollectorThread-11-1] INFO org.apache.bookkeeper.bookie.GarbageCollectorThread - Enter minor compaction, suspendMinor false
11:19:53.242 [GarbageCollectorThread-11-1] INFO org.apache.bookkeeper.bookie.GarbageCollectorThread - Do compaction to compact those files lower than 0.2
I have removed my namespace and redeployed two versions of the services:
Will check on Monday differences between both environments.
I encountered the same issue when I run
bin/bookkeeper shell bookiesanity
. I'm using the pulsar-2.10.0.
I had the same problem
@fmiguelez
I'm also trying to find the solution.
Could you please update anything of your finding?
What I have done is configuring these two setting in bookie's configurations.
@fmiguelez
In my case, What frankjkelly said works for me.
I think what he had done is to minimize Compaction triggering interval for more aggressive GC.
@moweonlee I have reviewed my notes of that day. The result was the same, if compaction was enabled at namespace level, bookies unreadable due to disk full.
In our systems we usually have compaction enabled globally but allow disabling it at topic level. We were surprised to check that such feature does not work anymore (I would swear to have seen it working in previous versions). We reported the issue in #15891, as such error seems to be related when trying to configure compaction at topic level.
The only solution we have found is to disable compaction at namespace level when creating it. In that case bookies disk usage does not increase uncontrollably as with compaction.
I would say that compaction is not a feature that works flawlessly in Pulsar, though on paper it sounds great to be able to work with both compacted and non-compacted topics at the same time.
Yes, you're right @fmiguelez.
Same issue was reproduced after couple of days and I solved my issue just by reducing retention and TTL finally.
But I'd rather swear that the issue would be reproduced with longer interval than before.
Thank you for sharing your information about compaction.
@moweonlee , could you please let me know what exact config/value did you update for retention and TTL to get this issue resolved.
@arpillai89
Here is the configuration that I used for my pulsar cluster.
I didn't think about how much storage volume is necessary to store all data incoming.
Now it is clear that my case was pretty simple.
So it will be good to see Grafana's storage volume chart first as long as Pulsar is deployed in K8s
./bin/pulsar-admin namespaces set-retention public/default --size 8G --time -1
./bin/pulsar-admin namespaces get-retention public/default
./bin/pulsar-admin namespaces set-message-ttl public/default --messageTTL 60
./bin/pulsar-admin namespaces get-message-ttl public/defaul
Pls run the command
bin/bookkeeper shell listbookies -rw -h
to get the list the ReadWrite Bookies. And check that all broker configs keep consistentmanagedLedgerDefaultEnsembleSize
,managedLedgerDefaultWriteQuorum
,managedLedgerDefaultAckQuorum
.Then check that ReadWrite Bookies number >=
managedLedgerDefaultEnsembleSize
>=managedLedgerDefaultWriteQuorum
>=managedLedgerDefaultAckQuorum
.
I deploy my pulsar demo with 1 zk, 1 bookie and 1 broker. When I create the topic the same error occurs(Not enough non-faulty bookies available). Then I checked the 3 options that all the default value is 2. When I change all the values to 1, and then restart the broker, the problem solved. Thanks.
BROKER: webSocketServiceEnabled: "true" # Not enough non-faulty bookies available
@eaba What does your comment mean?
Enable this? Disable this?
I have some problem with pulsar:latest docker image
apache-pulsar-2.8.0
three pulsar cluster
second pulsar messages:
16:33:15.772 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (135, 0): Bookie handle is not available 16:33:15.801 [BookKeeperClientWorker-OrderedExecutor-0-0] ERROR org.apache.bookkeeper.client.MetadataUpdateLoop - UpdateLoop(ledgerId=135,loopId=413ab672) Exception updating org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available at org.apache.bookkeeper.bookie.LocalBookieEnsemblePlacementPolicy.replaceBookie(LocalBookieEnsemblePlacementPolicy.java:89) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.BookieWatcherImpl.replaceBookie(BookieWatcherImpl.java:334) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.EnsembleUtils.replaceBookiesInEnsemble(EnsembleUtils.java:71) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.LedgerHandle.lambda$ensembleChangeLoop$2(LedgerHandle.java:1935) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.MetadataUpdateLoop.writeLoop(MetadataUpdateLoop.java:134) [org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.MetadataUpdateLoop.run(MetadataUpdateLoop.java:123) [org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.LedgerHandle.ensembleChangeLoop(LedgerHandle.java:1954) [org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.LedgerHandle.handleBookieFailure(LedgerHandle.java:1903) [org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:378) [org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.proto.BookieClientImpl$1.safeRun(BookieClientImpl.java:295) [org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) [org.apache.bookkeeper-bookkeeper-common-4.14.1.jar:4.14.1] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_231] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_231] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_231] 16:33:15.802 [BookKeeperClientWorker-OrderedExecutor-0-0] WARN org.apache.bookkeeper.client.LedgerHandle - [EnsembleChange(ledger:135, change-id:0000000001)][attempt:1] Exception changing ensemble org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available at org.apache.bookkeeper.bookie.LocalBookieEnsemblePlacementPolicy.replaceBookie(LocalBookieEnsemblePlacementPolicy.java:89) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.BookieWatcherImpl.replaceBookie(BookieWatcherImpl.java:334) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.EnsembleUtils.replaceBookiesInEnsemble(EnsembleUtils.java:71) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.LedgerHandle.lambda$ensembleChangeLoop$2(LedgerHandle.java:1935) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.MetadataUpdateLoop.writeLoop(MetadataUpdateLoop.java:134) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.MetadataUpdateLoop.run(MetadataUpdateLoop.java:123) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.LedgerHandle.ensembleChangeLoop(LedgerHandle.java:1954) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.LedgerHandle.handleBookieFailure(LedgerHandle.java:1903) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.client.PendingAddOp.writeComplete(PendingAddOp.java:378) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.proto.BookieClientImpl$1.safeRun(BookieClientImpl.java:295) ~[org.apache.bookkeeper-bookkeeper-server-4.14.1.jar:4.14.1] at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.14.1.jar:4.14.1] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_231] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_231] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.63.Final.jar:4.1.63.Final] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_231]