Closed wzdushu closed 6 years ago
@nanguawu :
where did you run bookkeeper sanity
? was it in the bookie machine?
can you run bookkeeper shell listbookies -rw
to get the list of bookies alive in the cluster?
Closing ledger 11 due to NotEnoughBookiesException: Not enough non-faulty bookies available
This error message indicates that there is no bookies alive. I am suspecting that bookies advertised a non-reachable address.
The zookeeper that comes with pulsar is normal, always suggesting that bookie does not exist. cluster node: server057.opsadmin.com
[root@server057 pulsar]# ./bin/bookkeeper shell listbookies -rw
11:33:40.444 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
11:33:40.444 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.io.tmpdir=/tmp
11:33:40.444 [main] INFO org.apache.zookeeper.ZooKeeper - Client environment:java.compiler=
yeah. it seems that bookie doesn't advertised itself to zookeeper. do you mind sharing the logs of bookies? @nanguawu
Bookie server startup log has been added to the attachment
@nanguawu it seems that the bookie doesn't finish starting up. is your bookie still running? can you take a jstack trace? run jstack <pid>
?
But the bookie process is already up,the attachment is the result of the implementation of "jstack
[root@server057 ~]# ps -ef |grep bookie root 20119 1 0 Oct30 ? 00:01:19 /opt/jdk1.8.0_191/bin/java -cp /ssd/pulsar/conf:::/ssd/pulsar/lib/*: -Dlog4j.configurationFile=log4j2.yaml -Djute.maxbuffer=10485760 -Djava.net.preferIPv4Stack=true -Xms2g -Xmx2g -XX:MaxDirectMemorySize=4g -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.maxCapacity.default=1000 -Dio.netty.recycler.linkCapacity=1024 -Dpulsar.log.appender=RollingFile -Dpulsar.log.dir=/ssd/pulsar/logs -Dpulsar.log.level=info -Dpulsar.routing.appender.default=Console -Dpulsar.functions.process.container.log.dir=/ssd/pulsar/logs -Dpulsar.functions.java.instance.jar=/ssd/pulsar/instances/java-instance.jar -Dpulsar.functions.python.instance.file=/ssd/pulsar/instances/python-instance/python_instance_main.py -Dpulsar.log.file=pulsar-bookie-server057.opsadmin.com.log org.apache.bookkeeper.proto.BookieServer --conf /ssd/pulsar/conf/bookkeeper.conf [root@server057 ~]# jstack 20119 > /tmp/20119.jstack.log 20119.jstack.log
@nanguawu :
But the bookie process is already up
the process is running, but
from the jstack, I didn't seem any journal thread BookieJournal
seems it doesn't complete startup.
did you have find anything more in the logs?
Can you restart one bookie and share me with the new log?
https://github.com/apache/pulsar/files/2529415/pulsar-bookie-server057.opsadmin.cn.log
There is a log i restarted . I have provided before. Below is another log file
Exception in thread "main" java.lang.RuntimeException: java.net.BindException: Address already in use at org.apache.bookkeeper.stats.prometheus.PrometheusMetricsProvider.start(PrometheusMetricsProvider.java:126) at org.apache.bookkeeper.server.service.StatsProviderService.doStart(StatsProviderService.java:51) at org.apache.bookkeeper.common.component.AbstractLifecycleComponent.start(AbstractLifecycleComponent.java:78) at org.apache.bookkeeper.common.component.LifecycleComponentStack.lambda$start$2(LifecycleComponentStack.java:113) at com.google.common.collect.ImmutableList.forEach(ImmutableList.java:408) at org.apache.bookkeeper.common.component.LifecycleComponentStack.start(LifecycleComponentStack.java:113) at org.apache.bookkeeper.common.component.ComponentStarter.startComponent(ComponentStarter.java:80) at org.apache.bookkeeper.server.Main.doMain(Main.java:227) at org.apache.bookkeeper.server.Main.main(Main.java:201) at org.apache.bookkeeper.proto.BookieServer.main(BookieServer.java:280) Caused by: java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:433) at sun.nio.ch.Net.bind(Net.java:425) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at org.eclipse.jetty.server.ServerConnector.open(ServerConnector.java:317) at org.eclipse.jetty.server.AbstractNetworkConnector.doStart(AbstractNetworkConnector.java:80) at org.eclipse.jetty.server.ServerConnector.doStart(ServerConnector.java:235) at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) at org.eclipse.jetty.server.Server.doStart(Server.java:390) at org.eclipse.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:68) at org.apache.bookkeeper.stats.prometheus.PrometheusMetricsProvider.start(PrometheusMetricsProvider.java:123) ... 9 more [root@server057 pulsar]# cat data/bookkeeper/ledgers/current/VERSION 4 bookieHost: "server057.opsadmin.com:3181" journalDir: "data/bookkeeper/journal" ledgerDirs: "1\tdata/bookkeeper/ledgers" instanceId: "f10937c4-71dc-4512-bfe5-067fa7d43bf8" [root@server057 pulsar]# cat data/bookkeeper/journal/current/VERSION 4 bookieHost: "server057.opsadmin.com:3181" journalDir: "data/bookkeeper/journal" ledgerDirs: "1\tdata/bookkeeper/ledgers" instanceId: "f10937c4-71dc-4512-bfe5-067fa7d43bf8" [root@server057 pulsar]# egrep -v '^#|^$' conf/bookkeeper.conf bookiePort=3181 journalDirectory=data/bookkeeper/journal minUsableSizeForIndexFileCreation=1073741824 advertisedAddress=server057.opsadmin.com allowLoopback=false bookieDeathWatchInterval=1000 flushInterval=60000 useHostNameAsBookieID=false gcWaitTime=900000 gcOverreplicatedLedgerWaitTime=86400000 numAddWorkerThreads=0 numReadWorkerThreads=8 numHighPriorityWorkerThreads=8 maxPendingReadRequestsPerThread=2500 maxPendingAddRequestsPerThread=10000 auditorPeriodicBookieCheckInterval=86400 rereplicationEntryBatchSize=5000 openLedgerRereplicationGracePeriod=30 autoRecoveryDaemonEnabled=true lostBookieRecoveryDelay=0 serverTcpNoDelay=true journalMaxSizeMB=2048 journalMaxBackups=5 journalPreAllocSizeMB=16 journalWriteBufferSizeKB=64 journalRemoveFromPageCache=true journalSyncData=true journalAdaptiveGroupWrites=true journalMaxGroupWaitMSec=1 journalBufferedWritesThreshold=524288 numJournalCallbackThreads=8 journalAlignmentSize=4096 journalFlushWhenQueueEmpty=false ledgerStorageClass=org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage ledgerDirectories=data/bookkeeper/ledgers auditorPeriodicCheckInterval=604800 openFileLimit=0 pageLimit=0 ledgerManagerFactoryClass=org.apache.bookkeeper.meta.HierarchicalLedgerManagerFactory zkLedgersRootPath=/ledgers logSizeLimit=1073741824 entryLogFilePreallocationEnabled=true readBufferSizeBytes=4096 writeBufferSizeBytes=65536 compactionRate=1000 minorCompactionThreshold=0.2 minorCompactionInterval=3600 compactionMaxOutstandingRequests=100000 majorCompactionThreshold=0.5 majorCompactionInterval=86400 isThrottleByBytes=false compactionRateByEntries=1000 compactionRateByBytes=1000000 statsProviderClass=org.apache.bookkeeper.stats.prometheus.PrometheusMetricsProvider prometheusStatsHttpPort=8000 readOnlyModeEnabled=true diskUsageThreshold=0.95 diskCheckInterval=10000 zkServers=server057.opsadmin.com:2181,server058.opsadmin.com:2181,server059.opsadmin.com:2181 zkTimeout=30000 zkEnableSecurity=false dbStorage_writeCacheMaxSizeMb=512 dbStorage_readAheadCacheMaxSizeMb=256 dbStorage_readAheadCacheBatchSize=1000 dbStorage_rocksDB_blockCacheSize=268435456 dbStorage_rocksDB_writeBufferSizeMB=64 dbStorage_rocksDB_sstSizeInMB=64 dbStorage_rocksDB_blockSize=65536 dbStorage_rocksDB_bloomFilterBitsPerKey=10 dbStorage_rocksDB_numLevels=-1 dbStorage_rocksDB_numFilesInLevel0=4 dbStorage_rocksDB_maxSizeInLevel1MB=256
This is the data that the zookeeper got bookies.
ode : /bookies data : b'{}' stat : ZnodeStat(czxid=21474836493, mzxid=21474836493, ctime=1540801542043, mtime=1540801542043, version=0, cversion=0, aversion=0, ephemeralOwn er=0, dataLength=2, numChildren=0, pzxid=21474836493)child : []
@nanguawu
cat /ssd/pulsar/logs/pulsar-bookie-server057.opsadmin.com.out
Exception in thread "main" java.lang.RuntimeException: java.net.BindException: Address already in use at
it seems that bookie fails to start due to it can't bind to prometheus port. are you running other components in the same port 8000?
@nanguawu
can you run lsof -i :8000
to see what process is using 8000?
8000 this port should be the zookeeper port that comes with pulsar。 I seem to know where the prolem is ,Thank you very much! [root@server057 pulsar]# lsof -i:8000 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME java 33839 root 215u IPv4 163379008 0t0 TCP :irdmi (LISTEN) [root@server057 pulsar]# ps -ef |grep 33839 root 2799 1152 0 09:55 pts/0 00:00:00 grep --color=auto 33839 root 33839 1 0 Oct29 ? 00:07:12 /opt/jdk1.8.0_191/bin/java -Dzookeeper.4lw.commands.whitelist= -cp /ssd/pulsar/conf:::/ssd/pulsar/lib/*: -Dlog4j.configurationFile=log4j2.yaml -Djute.maxbuffer=10485760 -Djava.net.preferIPv4Stack=true -Xms2g -Xmx2g -XX:MaxDirectMemorySize=4g -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -XX:+ParallelRefProcEnabled -XX:+UnlockExperimentalVMOptions -XX:+AggressiveOpts -XX:+DoEscapeAnalysis -XX:ParallelGCThreads=32 -XX:ConcGCThreads=32 -XX:G1NewSizePercent=50 -XX:+DisableExplicitGC -XX:-ResizePLAB -Dio.netty.leakDetectionLevel=disabled -Dio.netty.recycler.maxCapacity.default=1000 -Dio.netty.recycler.linkCapacity=1024 -Dpulsar.log.appender=RollingFile -Dpulsar.log.dir=/ssd/pulsar/logs -Dpulsar.log.level=info -Dpulsar.routing.appender.default=Console -Dpulsar.functions.process.container.log.dir=/ssd/pulsar/logs -Dpulsar.functions.java.instance.jar=/ssd/pulsar/instances/java-instance.jar -Dpulsar.functions.python.instance.file=/ssd/pulsar/instances/python-instance/python_instance_main.py -Dpulsar.log.file=pulsar-zookeeper-server057.opsadmin.com.log org.apache.pulsar.zookeeper.ZooKeeperStarter /ssd/pulsar/conf/zookeeper.conf
Solution: modify the monitoring port opened by bookeeper to prometheus。
[root@server057 pulsar]#sed -i 's/prometheusStatsHttpPort\=8000/prometheusStatsHttpPort\=8900/' conf/bookkeeper.conf
[root@server057 pulsar]#./bin/pulsar-daemon stop bookie
[root@server057 pulsar]#./bin/pulsar-daemon start bookie
[root@server057 pulsar]#/ssd/pulsar/bin/bookkeeper shell bookiesanity
10:04:28.128 [main] INFO org.apache.zookeeper.ZooKeeper - Session: 0x3002bd8b1840022 closed
10:04:28.128 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x3002bd8b1840022
10:04:28.128 [main] INFO org.apache.bookkeeper.bookie.BookieShell - Bookie sanity test succeeded
@nanguawu that's awesome. glad that you resolved the problem :) feel free to ping us if you have any other issues.
Hi: I am currently deploying a multi-cluster on bare metal. Reference example:https://pulsar.apache.org/docs/en/deploy-bare-metal-multi-cluster/ my pulsar version: 2.2.0 os release: CentOS Linux release 7.5.1804 (Core) my cluster node: server057.opsadmin.com server058.opsadmin.com server059.opsadmin.com
Running the bookiesanity command for: /ssd/pulsar/bin/bookkeeper shell bookiesanity The following exception occurred: 18:06:38.861 [main-EventThread] INFO org.apache.bookkeeper.zookeeper.ZooKeeperWatcherBase - ZooKeeper client is connected now. 18:06:39.178 [main] INFO org.apache.bookkeeper.client.BookKeeper - Weighted ledger placement is not enabled 18:06:39.396 [main-EventThread] INFO org.apache.bookkeeper.client.LedgerCreateOp - Ensemble: [server057.opsadmin:3181] for ledger: 11 18:06:39.397 [main] INFO org.apache.bookkeeper.bookie.BookieShell - Created ledger 11 18:06:39.796 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: [id: 0xc246b03c, L:/172.20.10.57:41056 - R:server057.opsadmin/172.20.10.57:3181] 18:06:39.810 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Successfully connected to bookie: server057.opsadmin:3181 18:06:39.837 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - connection [id: 0xc246b03c, L:/172.20.10.57:41056 - R:server057.opsadmin/172.20.10.57:3181] authenticated as BookKeeperPrincipal{ANONYMOUS} 18:06:39.870 [bookkeeper-io-3-1] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Disconnected from bookie channel [id: 0xc246b03c, L:/172.20.10.57:41056 ! R:server057.opsadmin/172.20.10.57:3181] 18:06:39.881 [BookKeeperClientWorker-OrderedExecutor-11-0] WARN org.apache.bookkeeper.client.PendingAddOp - Failed to write entry (11, 0): Bookie handle is not available 18:06:39.903 [BookKeeperClientWorker-OrderedExecutor-11-0] ERROR org.apache.bookkeeper.client.LedgerHandle - Could not get additional bookie to remake ensemble, closing ledger: 11 18:06:39.903 [BookKeeperClientWorker-OrderedExecutor-11-0] ERROR org.apache.bookkeeper.client.LedgerHandle - Closing ledger 11 due to NotEnoughBookiesException: Not enough non-faulty bookies available 18:06:39.904 [BookKeeperClientWorker-OrderedExecutor-11-0] ERROR org.apache.bookkeeper.client.PendingAddOp - Write of ledger entry to quorum failed: L11 E0 18:06:39.905 [main] WARN org.apache.bookkeeper.bookie.BookieShell - Error in bookie sanity test org.apache.bookkeeper.client.BKException$BKNotEnoughBookiesException: Not enough non-faulty bookies available at org.apache.bookkeeper.client.SyncCallbackUtils.finish(SyncCallbackUtils.java:83) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2] at org.apache.bookkeeper.client.SyncCallbackUtils$SyncAddCallback.addComplete(SyncCallbackUtils.java:251) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2] at org.apache.bookkeeper.client.AsyncCallback$AddCallback.addCompleteWithLatency(AsyncCallback.java:91) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2] at org.apache.bookkeeper.client.PendingAddOp.submitCallback(PendingAddOp.java:361) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2] at org.apache.bookkeeper.client.LedgerHandle.errorOutPendingAdds(LedgerHandle.java:1543) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2] at org.apache.bookkeeper.client.LedgerHandle$5.safeRun(LedgerHandle.java:520) ~[org.apache.bookkeeper-bookkeeper-server-4.7.2.jar:4.7.2] at org.apache.bookkeeper.common.util.SafeRunnable.run(SafeRunnable.java:36) ~[org.apache.bookkeeper-bookkeeper-common-4.7.2.jar:4.7.2] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_191] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_191] at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-all-4.1.22.Final.jar:4.1.22.Final] at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_191] 18:06:39.926 [main] INFO org.apache.bookkeeper.bookie.BookieShell - Deleted ledger 11 18:06:39.926 [main] INFO org.apache.bookkeeper.proto.PerChannelBookieClient - Closing the per channel bookie client for server057.opsadmin.com:3181 18:06:39.961 [main] INFO org.apache.zookeeper.ZooKeeper - Session: 0x3002bd8b1840012 closed 18:06:39.961 [main-EventThread] INFO org.apache.zookeeper.ClientCnxn - EventThread shut down for session: 0x3002bd8b1840012