apache / bookkeeper

Apache BookKeeper - a scalable, fault tolerant and low latency storage service optimized for append-only workloads
https://bookkeeper.apache.org/
Apache License 2.0

Recovery log is missing #4105

Open Meikelrizkyhartawan opened 1 year ago

Meikelrizkyhartawan commented 1 year ago

I'm running a three-node BookKeeper cluster, and suddenly the bookie fails with the error "Recovery log is missing". How can I trace the problem, and how can I solve this issue?

java.io.IOException: Recovery log 1693588482171 is missing
    at org.apache.bookkeeper.bookie.Bookie.replay(Bookie.java:982) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3.jar:4.14.3]
    at org.apache.bookkeeper.bookie.Bookie.readJournal(Bookie.java:961) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3.jar:4.14.3]
    at org.apache.bookkeeper.bookie.Bookie.start(Bookie.java:1015) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3.jar:4.14.3]
    at org.apache.bookkeeper.proto.BookieServer.start(BookieServer.java:156) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3.jar:4.14.3]
    at org.apache.bookkeeper.server.service.BookieService.doStart(BookieService.java:68) ~[org.apache.bookkeeper-bookkeeper-server-4.14.3.jar:4.14.3]
    at org.apache.bookkeeper.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:83) ~[org.apache.bookkeeper-bookkeeper-common-4.14.3.jar:4.14.3]
    at org.apache.bookkeeper.common.component.LifecycleComponentStack.lambda$start$4(LifecycleComponentStack.java:144) ~[org.apache.bookkeeper-bookkeeper-common-4.14.3.jar:4.14.3]
11:43:35.444 [main-SendThread(milvus-ground-pulsar-zookeeper:2181)] WARN org.apache.zookeeper.ClientCnxn - An exception was thrown while closing send thread for session 0x2001f9bccf10005.
org.apache.zookeeper.ClientCnxn$EndOfStreamException: Unable to read additional data from server sessionid 0x2001f9bccf10005, likely server has closed socket
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:77) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1290) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]

11:15:29.609 [main-SendThread(milvus-ground-pulsar-zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - SASL config status: Will not attempt to authenticate using SASL (unknown error)
11:15:59.638 [main-SendThread(milvus-ground-pulsar-zookeeper:2181)] WARN org.apache.zookeeper.ClientCnxn - Client session timed out, have not heard from server in 31244ms for session id 0x200a10385d50003
11:15:59.638 [main-SendThread(milvus-ground-pulsar-zookeeper:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x200a10385d50003 for sever milvus-ground-pulsar-zookeeper/10.244.11.21:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
org.apache.zookeeper.ClientCnxn$SessionTimeoutException: Client session timed out, have not heard from server in 31244ms for session id 0x200a10385d50003
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1258) [org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
    at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:406) [com.google.guava-guava-30.1-jre.jar:?]
    at org.apache.bookkeeper.common.component.LifecycleComponentStack.start(LifecycleComponentStack.java:144) [org.apache.bookkeeper-bookkeeper-common-4.14.3.jar:4.14.3]
    at org.apache.bookkeeper.common.component.ComponentStarter.startComponent(ComponentStarter.java:85) [org.apache.bookkeeper-bookkeeper-common-4.14.3.jar:4.14.3]
    at org.apache.bookkeeper.server.Main.doMain(Main.java:234) [org.apache.bookkeeper-bookkeeper-server-4.14.3.jar:4.14.3]
    at org.apache.bookkeeper.server.Main.main(Main.java:208) [org.apache.bookkeeper-bookkeeper-server-4.14.3.jar:4.14.3]

11:16:01.383 [main-SendThread(milvus-ground-pulsar-zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server milvus-ground-pulsar-zookeeper/10.244.10.102:2181.
11:16:01.383 [main-SendThread(milvus-ground-pulsar-zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - SASL config status: Will not attempt to authenticate using SASL (unknown error)
11:16:01.384 [main-SendThread(milvus-ground-pulsar-zookeeper:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established, initiating session, client: /10.244.10.47:39714, server: milvus-ground-pulsar-zookeeper/10.244.10.102:2181
11:16:01.386 [main-EventThread] ERROR org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client connection to the ZooKeeper server has expired!
11:16:01.386 [main-SendThread(milvus-ground-pulsar-zookeeper:2181)] WARN org.apache.zookeeper.ClientCnxn - Unable to reconnect to ZooKeeper service, session 0x200a10385d50003 has expired
11:16:01.386 [main-EventThread] INFO org.apache.bookkeeper.zookeeper.ZooKeeperClient - ZooKeeper session 200a10385d50003 is expired from milvus-ground-pulsar-zookeeper:2181.
11:16:01.387 [main-SendThread(milvus-ground-pulsar-zookeeper:2181)] WARN org.apache.zookeeper.ClientCnxn - Session 0x200a10385d50003 for sever milvus-ground-pulsar-zookeeper/10.244.10.102:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
org.apache.zookeeper.ClientCnxn$SessionExpiredException: Unable to reconnect to ZooKeeper service, session 0x200a10385d50003 has expired
    at org.apache.zookeeper.ClientCnxn$SendThread.onConnected(ClientCnxn.java:1434) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
    at org.apache.zookeeper.ClientCnxnSocket.readConnectResult(ClientCnxnSocket.java:154) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:86) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350) ~[org.apache.zookeeper-zookeeper-3.6.3.jar:3.6.3]
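To trace the problem, one first step is to see which journal position the bookie is trying to resume from. The sketch below assumes, based on my reading of the BookKeeper source rather than any documented format, that the `current` subfolder of each ledger directory holds a file named `lastMark` containing two 8-byte big-endian longs: the journal log id (the number printed in "Recovery log ... is missing") and the offset inside that journal. `LastMarkInspector` is a hypothetical helper, not part of BookKeeper.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical diagnostic helper (not a BookKeeper API).
 * ASSUMPTION: a lastMark file is two big-endian longs:
 * [journal log id][offset within that journal].
 */
public class LastMarkInspector {

    /** Journal log id and offset decoded from a lastMark file. */
    public static final class Mark {
        public final long logId;
        public final long offset;

        Mark(long logId, long offset) {
            this.logId = logId;
            this.offset = offset;
        }
    }

    /** Decodes the assumed 16-byte layout; rejects shorter input. */
    public static Mark parse(byte[] data) {
        if (data.length < 16) {
            throw new IllegalArgumentException("lastMark too short: " + data.length + " bytes");
        }
        ByteBuffer bb = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
        long logId = bb.getLong();
        long offset = bb.getLong();
        return new Mark(logId, offset);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LastMarkInspector <ledgerDir>/current/lastMark [more lastMark files...]
        for (String arg : args) {
            Path p = Paths.get(arg);
            Mark m = parse(Files.readAllBytes(p));
            System.out.printf("%s: log id %d, offset %d%n", p, m.logId, m.offset);
        }
    }
}
```

If the log id printed here matches the one in the exception, the mark itself is consistent with the error, and the next question is what happened to the journal file it points at.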

hangc0276 commented 11 months ago

The cause is that the journal file is missing. You can do the following operation to replay all the journal files instead of replaying from a specific position.
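The check that raises this exception can be modeled roughly as below. This is a simplified sketch based on my reading of the journal replay code in BookKeeper 4.x, not the actual implementation: the bookie collects the journal ids greater than or equal to the marked id and, only when the mark is positive, requires the smallest of them to be exactly the marked id. With a mark of 0 the check is skipped and every journal on disk is replayed, which is what "replay all the journal files" amounts to. `ReplayCheck` and `journalsToReplay` are hypothetical names.

```java
import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;

/** Simplified model of the journal-replay validation (hypothetical names). */
public class ReplayCheck {

    /**
     * Returns the journal ids that would be replayed, in ascending order.
     * Throws if a positive mark points at a journal that is not on disk.
     */
    public static List<Long> journalsToReplay(long markedLogId, List<Long> journalIdsOnDisk)
            throws IOException {
        List<Long> logs = journalIdsOnDisk.stream()
                .filter(id -> id >= markedLogId)
                .sorted()
                .collect(Collectors.toList());
        // Only validate when a mark has actually been recorded (id > 0).
        if (markedLogId > 0 && (logs.isEmpty() || logs.get(0) != markedLogId)) {
            throw new IOException("Recovery log " + markedLogId + " is missing");
        }
        return logs;
    }

    public static void main(String[] args) throws IOException {
        List<Long> onDisk = List.of(1693588600000L, 1693588700000L);
        // A mark of 0 skips validation and replays everything on disk.
        System.out.println(journalsToReplay(0L, onDisk)); // [1693588600000, 1693588700000]
        try {
            journalsToReplay(1693588482171L, onDisk);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // Recovery log 1693588482171 is missing
        }
    }
}
```

Note that journals older than the mark are filtered out before the check, so in this model an older file cannot satisfy a mark that points at a newer, deleted one.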

Kaiwei-Liu commented 2 months ago

I have the same problem. I changed the file name, but it had no effect.
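One thing worth checking when renaming journal files: the id in the error message is printed in decimal, while, as far as I can tell from the BookKeeper source, journal files on disk are named with the id in lowercase hexadecimal plus a `.txn` suffix. If that assumption holds, the file the bookie expects for `Recovery log 1693588482171` would be `18a51bd687b.txn`, not `1693588482171.txn`. The hypothetical helper below converts in both directions.

```java
/** Hypothetical helper mapping journal ids to the ASSUMED on-disk file names (<hex id>.txn). */
public class JournalFileNames {

    private static final String SUFFIX = ".txn";

    /** Decimal id from the error message -> assumed file name. */
    public static String fileNameFor(long journalId) {
        return Long.toHexString(journalId) + SUFFIX;
    }

    /** Assumed file name -> journal id; returns -1 for names that do not fit the pattern. */
    public static long idFor(String fileName) {
        if (!fileName.endsWith(SUFFIX)) {
            return -1;
        }
        String hex = fileName.substring(0, fileName.length() - SUFFIX.length());
        try {
            return Long.parseLong(hex, 16);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor(1693588482171L)); // 18a51bd687b.txn
        System.out.println(idFor("18a51bd687b.txn"));    // 1693588482171
    }
}
```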