apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

Still having issues of Failed to restore rockdb #6894

Closed bback99 closed 1 year ago

bback99 commented 4 years ago

Describe the bug: We unexpectedly ran into the "Failed to restore rocksdb" error while running in standalone mode. We did not change any BookKeeper configuration.

This might be related to https://github.com/apache/pulsar/issues/5668

With -nss, it looks fine now.

So, should we run with -nss until a fix is available?
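For readers who script their local setup, the workaround can be sketched as below. This is a minimal sketch, not an official launcher: the helper name `standalone_command` and the `/opt/pulsar` install path are hypothetical, and the only flag used is `--no-stream-storage` (the long form of `-nss` shown in the command-line help later in this thread).

```python
def standalone_command(pulsar_home: str, stream_storage: bool = True) -> list:
    """Build the argv list for launching Pulsar standalone.

    pulsar_home is a hypothetical install directory; adjust it to your setup.
    """
    cmd = [f"{pulsar_home}/bin/pulsar", "standalone"]
    if not stream_storage:
        # Long form of -nss, taken from the CLI help quoted in this thread.
        cmd.append("--no-stream-storage")
    return cmd


# Example: pass the list to subprocess.run(...) to start the process.
# import subprocess
# subprocess.run(standalone_command("/opt/pulsar", stream_storage=False), check=True)
```

Keeping the flag behind a parameter makes it easy to switch stream storage back on once a fixed version is in use.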

To Reproduce

Logs

    13:29:28.310 [io-write-scheduler-OrderedScheduler-0-0] WARN org.apache.bookkeeper.stream.storage.impl.sc.ZkStorageContainerManager - Failed to start storage container (0)
    java.util.concurrent.CompletionException: org.apache.bookkeeper.statelib.api.exceptions.StateStoreException: Failed to restore rocksdb 000000000000000000/000000000000000000/000000000000000000
        at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292) ~[?:1.8.0_242]
        at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308) ~[?:1.8.0_242]
        at java.util.concurrent.CompletableFuture.uniCompose(CompletableFuture.java:957) ~[?:1.8.0_242]
        at java.util.concurrent.CompletableFuture$UniCompose.tryFire(CompletableFuture.java:940) ~[?:1.8.0_242]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
        at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_242]
        at org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal.lambda$executeIO$16(AbstractStateStoreWithJournal.java:474) ~[org.apache.bookkeeper-statelib-4.10.0.jar:4.10.0]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_242]
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125) [com.google.guava-guava-25.1-jre.jar:?]
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57) [com.google.guava-guava-25.1-jre.jar:?]
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78) [com.google.guava-guava-25.1-jre.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_242]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_242]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_242]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_242]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_242]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_242]
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [io.netty-netty-common-4.1.45.Final.jar:4.1.45.Final]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_242]
    Caused by: org.apache.bookkeeper.statelib.api.exceptions.StateStoreException: Failed to restore rocksdb 000000000000000000/000000000000000000/000000000000000000
        at org.apache.bookkeeper.statelib.impl.rocksdb.checkpoint.RocksCheckpointer.restore(RocksCheckpointer.java:84) ~[org.apache.bookkeeper-statelib-4.10.0.jar:4.10.0]
        at org.apache.bookkeeper.statelib.impl.kv.RocksdbKVStore.loadRocksdbFromCheckpointStore(RocksdbKVStore.java:161) ~[org.apache.bookkeeper-statelib-4.10.0.jar:4.10.0]
        at org.apache.bookkeeper.statelib.impl.kv.RocksdbKVStore.init(RocksdbKVStore.java:223) ~[org.apache.bookkeeper-statelib-4.10.0.jar:4.10.0]
        at org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal.lambda$initializeLocalStore$5(AbstractStateStoreWithJournal.java:202) ~[org.apache.bookkeeper-statelib-4.10.0.jar:4.10.0]
        at org.apache.bookkeeper.statelib.impl.journal.AbstractStateStoreWithJournal.lambda$executeIO$16(AbstractStateStoreWithJournal.java:471) ~[org.apache.bookkeeper-statelib-4.10.0.jar:4.10.0]
        ... 12 more
    Caused by: org.apache.distributedlog.exceptions.LogEmptyException: Log 000000000000000000/000000000000000000/000000000000000000/checkpoints/e6ac48ab-1045-472e-89d0-95686a71ee8d/metadata: has no records
        at org.apache.distributedlog.BKLogHandler$2$1.onSuccess(BKLogHandler.java:245) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0]
        at org.apache.distributedlog.BKLogHandler$2$1.onSuccess(BKLogHandler.java:239) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0]
        at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:42) ~[org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0]
        at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26) ~[org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) ~[?:1.8.0_242]
        at org.apache.distributedlog.BKLogHandler.readLogSegmentsFromStore(BKLogHandler.java:636) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0]
        at org.apache.distributedlog.BKLogHandler$6.onSuccess(BKLogHandler.java:600) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0]
        at org.apache.distributedlog.BKLogHandler$6.onSuccess(BKLogHandler.java:592) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0]
        at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:42) ~[org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0]
        at org.apache.bookkeeper.common.concurrent.FutureEventListener.accept(FutureEventListener.java:26) ~[org.apache.bookkeeper-bookkeeper-common-4.10.0.jar:4.10.0]
        at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:774) ~[?:1.8.0_242]
        at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:750) ~[?:1.8.0_242]
        at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_242]
        at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1975) ~[?:1.8.0_242]
        at org.apache.distributedlog.impl.ZKLogSegmentMetadataStore.processResult(ZKLogSegmentMetadataStore.java:377) ~[org.apache.distributedlog-distributedlog-core-4.10.0.jar:4.10.0]
        at org.apache.bookkeeper.zookeeper.ZooKeeperClient$25$1.processResult(ZooKeeperClient.java:1174) ~[org.apache.bookkeeper-bookkeeper-server-4.10.0.jar:4.10.0]
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:627) ~[org.apache.pulsar-pulsar-zookeeper-2.5.1.jar:2.5.1]
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510) ~[org.apache.pulsar-pulsar-zookeeper-2.5.1.jar:2.5.1]
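In a trace this long, the useful part is the innermost "Caused by:" entry (here, the `LogEmptyException` on the checkpoint metadata log). The following is a rough sketch for pulling that entry out of a pasted trace. The function name `root_cause` is hypothetical, and the regular expression assumes frames look like ` at pkg.Class.method(`, as they do in the log above; it is not a general Java stack trace parser.

```python
import re

# "Caused by: <exception class>: <message>", where the message runs until the
# first stack frame (" at pkg.Class.method(") or the end of the text.
CAUSE_RE = re.compile(
    r"Caused by: ([\w.$]+): (.*?)(?=\s+at [\w.$<>]+\(|$)",
    re.S,
)


def root_cause(trace: str):
    """Return (exception_class, message) for the last 'Caused by:' entry.

    Returns None when the text contains no 'Caused by:' entry.
    """
    causes = CAUSE_RE.findall(trace)
    if not causes:
        return None
    exception_class, message = causes[-1]
    # Collapse line breaks and repeated spaces left over from copy-pasting.
    return exception_class, " ".join(message.split())
```

Because the pattern does not depend on line breaks, it should work on both the one-frame-per-line form and a trace that was flattened into a single line when pasted.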

Additional context: This issue occurs on Pulsar 2.5.1.

nicolo-paganin commented 4 years ago

I still see this error in Pulsar 2.6.0. Any news?

devinbost commented 4 years ago

I'm also seeing it here on 2.6.1: https://github.com/apache/pulsar/issues/8184

darkredz commented 3 years ago

I am also seeing this on 2.6.2

narzach commented 3 years ago

Any updates on this? I see this failure very frequently when running a standalone Pulsar cluster locally in docker.

yitian108 commented 3 years ago

As of today, the issue still persists on v2.7.

ta1meng commented 3 years ago

Still happening in v2.7.1. Both my co-worker and I have been running into this RocksDB error intermittently when running Pulsar standalone.

Can someone explain what the argument -nss (--no-stream-storage) does?

The command line help states:

    -nss, --no-stream-storage
      Disable stream storage
      Default: false

What functionality is disabled when we disable stream storage?

ta1meng commented 3 years ago

Also, I learned today from the Pulsar monthly update hosted by StreamNative that Pulsar 2.8 will include some RocksDB fixes, so this problem may be resolved in Pulsar 2.8.

qq459673705 commented 3 years ago

This problem is caused by https://github.com/apache/bookkeeper/issues/2357. First, the Pulsar server log showed: Not enough non-faulty bookies available. About a minute later, the log showed: io.netty.util.internal.OutOfDirectMemoryError: .... After 5 retries, the Pulsar server shut down. When I noticed this, I tried to restart the Pulsar server, but the log reported: Failed to restore rocksdb...

tisonkun commented 1 year ago

Closed as stale. Please create a new issue if it's still relevant to the maintained versions.