OpenTSDB / asynchbase

A fully asynchronous, non-blocking, thread-safe, high-performance HBase client.
https://github.com/OpenTSDB/asynchbase
BSD 3-Clause "New" or "Revised" License
609 stars 303 forks source link

zookeeper session expire quickly? #177

Open macroxmu opened 7 years ago

macroxmu commented 7 years ago

My opentsdb version is 2.4, hbase version 1.0.2 with kerberos .

I config the opentsdb and asynchbase with offical documnet and reference the issue: https://github.com/OpenTSDB/asynchbase/issues/148 But I also cann't initially the opentsdb,the log is like this:

10:17:40.696 INFO [TSDMain.main] - Starting. 10:17:40.706 INFO [TSDMain.main] - net.opentsdb.tools 2.4.0RC1 built at revision (MODIFIED) 10:17:40.706 INFO [TSDMain.main] - Built on 2017/06/12 06:11:47 +0000 by root@centos.localhost:/root/rpmbuild/BUILD/opentsdb-2.4.0RC1 10:17:40.713 INFO [Config.loadConfig] - Successfully loaded configuration file: /etc/opentsdb/opentsdb.conf 10:17:40.894 INFO [Config.loadConfig] - Successfully loaded configuration file: /etc/opentsdb/opentsdb.conf 10:17:41.126 WARN [PluginLoader.loadPlugins] - Unable to locate any plugins of the type: net.opentsdb.query.filter.TagVFilter 10:17:41.152 INFO [ZooKeeper.logEnv] - Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT 10:17:41.152 INFO [ZooKeeper.logEnv] - Client environment:host.name=linux108136 10:17:41.152 INFO [ZooKeeper.logEnv] - Client environment:java.version=1.8.0_72 10:17:41.152 INFO [ZooKeeper.logEnv] - Client environment:java.vendor=Oracle Corporation 10:17:41.152 INFO [ZooKeeper.logEnv] - Client environment:java.home=/opt/hadoop_client_c60u10/JDK/jdk/jre 10:17:41.153 INFO [ZooKeeper.logEnv] - Client environment:java.class.path=/usr/share/opentsdb/*.jar:/usr/share/opentsdb:/usr/share/opentsdb/bin:/usr/share/opentsdb/lib/asm-4.0.jar:/usr/share/opentsdb/lib/async-1.4.0.jar:/usr/share/opentsdb/lib/asynchbase-1.8.0.jar:/usr/share/opentsdb/lib/commons-jexl-2.1.1.jar:/usr/share/opentsdb/lib/commons-logging-1.1.1.jar:/usr/share/opentsdb/lib/commons-math3-3.4.1.jar:/usr/share/opentsdb/lib/guava-18.0.jar:/usr/share/opentsdb/lib/jackson-annotations-2.4.3.jar:/usr/share/opentsdb/lib/jackson-core-2.4.3.jar:/usr/share/opentsdb/lib/jackson-databind-2.4.3.jar:/usr/share/opentsdb/lib/javacc-6.1.2.jar:/usr/share/opentsdb/lib/jgrapht-core-0.9.1.jar:/usr/share/opentsdb/lib/kryo-2.21.1.jar:/usr/share/opentsdb/lib/log4j-over-slf4j-1.7.7.jar:/usr/share/opentsdb/lib/logback-classic-1.0.13.jar:/usr/share/opentsdb/lib/logback-core-1.0.13.jar:/usr/share/opentsdb/lib/minlog-1.2.jar:/usr/share/opentsdb/lib/netty-3.9.4.Final.jar:/usr/share/opentsdb/lib/protobuf-java-2.5.0.jar:/usr/share/opentsdb/lib/reflectasm-1.07-shaded.jar:/usr/share/opentsdb/lib/slf4j-api-1.7.7.jar:/usr/share/opentsdb/lib/tsdb-2.4.0RC1.jar:/usr/share/opentsdb/lib/zookeeper-3.4.6.jar:/etc/opentsdb 10:17:41.153 INFO [ZooKeeper.logEnv] - Client environment:java.library.path=/opt/hadoop_client_c60u10//JDK/jdk/lib:/opt/hadoop_client_c60u10//KrbClient/kerberos/lib:/opt/hadoop_client_c60u10//KrbClient/kerberos/lib:/opt/hadoop_client_c60u10//JDK/jdk/lib:/opt/huawei/Bigdata/OMA-V100R001C00.x86_64/lib:/opt/huawei/Bigdata/OMSV100R001C00x8664/gaussdb/lib::/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib 10:17:41.153 INFO [ZooKeeper.logEnv] - Client environment:java.io.tmpdir=/tmp 10:17:41.153 INFO [ZooKeeper.logEnv] - Client environment:java.compiler= 10:17:41.153 INFO [ZooKeeper.logEnv] - Client environment:os.name=Linux 10:17:41.153 INFO [ZooKeeper.logEnv] - Client environment:os.arch=amd64 10:17:41.154 INFO [ZooKeeper.logEnv] - Client environment:os.version=2.6.32-431.el6.x86_64 10:17:41.154 INFO [ZooKeeper.logEnv] - Client environment:user.name=omm 10:17:41.155 INFO [ZooKeeper.logEnv] - Client environment:user.home=/home/omm 10:17:41.155 INFO [ZooKeeper.logEnv] - Client environment:user.dir=/usr/share/opentsdb/bin 10:17:41.156 INFO [ZooKeeper.] - Initiating client connection, connectString=linux108138:24002,linux108137:24002,linux108136:24002 sessionTimeout=5000 watcher=org.hbase.async.HBaseClient$ZKClient@2f8f5f62 10:17:41.174 INFO [HBaseClient.getDeferredRoot] - Need to find the -ROOT- region 10:17:41.180 INFO [ClientCnxn.logStartConnect] - Opening socket connection to server linux108137/128.64.108.137:24002 10:17:41.300 INFO [ClientCnxn.primeConnection] - Socket connection established to linux108137/128.64.108.137:24002, initiating session 10:17:41.311 INFO [ClientCnxn.onConnected] - Session establishment complete on server linux108137/128.64.108.137:24002, sessionid = 0x1600cac7ef469f4d, negotiated timeout = 5000 10:17:41.320 INFO [ClientCnxn.run] - Unable to read additional data from server sessionid 0x1600cac7ef469f4d, likely server has closed socket, closing socket connection and attempting reconnect 10:17:41.427 ERROR [HBaseClient.processResult] - Looks like our ZK session expired or is broken, rc=-4: CONNECTIONLOSS 10:17:41.612 INFO [ZooKeeper.close] - Session: 0x1600cac7ef469f4d closed 10:17:41.612 INFO [ZooKeeper.] - Initiating client connection, connectString=linux108138:24002,linux108137:24002,linux108136:24002 sessionTimeout=5000 watcher=org.hbase.async.HBaseClient$ZKClient@2f8f5f62 10:17:41.615 ERROR [HBaseClient.processResult] - Looks like our ZK session expired or is broken, rc=-4: CONNECTIONLOSS 10:17:41.615 INFO [ClientCnxn.logStartConnect] - Opening socket connection to server linux108136/128.64.108.136:24002 10:17:41.616 INFO [ClientCnxn.primeConnection] - Socket connection established to linux108136/128.64.108.136:24002, initiating session 10:17:41.620 INFO [ClientCnxn.onConnected] - Session establishment complete on server linux108136/128.64.108.136:24002, sessionid = 0x1500cac7b0b29e94, negotiated timeout = 5000 10:17:41.721 INFO [ZooKeeper.close] - Session: 0x1500cac7b0b29e94 closed 10:17:41.722 INFO [ZooKeeper.] - Initiating client connection, connectString=linux108138:24002,linux108137:24002,linux108136:24002 sessionTimeout=5000 watcher=org.hbase.async.HBaseClient$ZKClient@2f8f5f62 10:17:41.725 INFO [ClientCnxn.logStartConnect] - Opening socket connection to server linux108137/128.64.108.137:24002 10:17:41.726 INFO [ClientCnxn.primeConnection] - Socket connection established to linux108137/128.64.108.137:24002, initiating session 10:17:41.729 INFO [ClientCnxn.onConnected] - Session establishment complete on server linux108137/128.64.108.137:24002, sessionid = 0x1600cac7ef469f4e, negotiated timeout = 5000 10:17:41.831 INFO [ZooKeeper.close] - Session: 0x1600cac7ef469f4e closed 10:17:41.832 WARN [HBaseClient.process] - No longer connected to ZooKeeper, event=WatchedEvent state:Disconnected type:None path:null 10:17:41.832 INFO [ZooKeeper.] - Initiating client connection, connectString=linux108138:24002,linux108137:24002,linux108136:24002 sessionTimeout=5000 watcher=org.hbase.async.HBaseClient$ZKClient@2f8f5f62 10:17:41.834 INFO [ClientCnxn.run] - EventThread shut down 10:17:41.834 INFO [ClientCnxn.logStartConnect] - Opening socket connection to server linux108137/128.64.108.137:24002 10:17:41.834 INFO [ClientCnxn.run] - EventThread shut down 10:17:41.834 INFO [ClientCnxn.run] - EventThread shut down 10:17:41.835 INFO [ClientCnxn.primeConnection] - Socket connection established to linux108137/128.64.108.137:24002, initiating session 10:17:41.838 INFO [ClientCnxn.onConnected] - Session establishment complete on server linux108137/128.64.108.137:24002, sessionid = 0x1600cac7ef469f50, negotiated timeout = 5000 10:17:41.850 INFO [ClientCnxn.run] - Unable to read additional data from server sessionid 0x1600cac7ef469f50, likely server has closed socket, closing socket connection and attempting reconnect

I config the opentsdb like this: modify opentsdb.conf to add kerberos authentication : hbase.zookeeper.quorum=linux108138:24002,linux108137:24002,linux108136:24002 hbase.zookeeper.znode.parent=/hbase hbase.security.auth.enable=true hbase.security.authentication=kerberos hbase.security.simple.username=user_for_training hbase.kerberos.regionserver.principal=hbase/hadoop.hadoop.com@HADOOP.COM hbase.sasl.clientconfig=Client

modify /usr/share/opentsdb/bin/tsdb to add JVMARGS: JVMARGS=${JVMARGS-'-enableassertions -enablesystemassertions'} JVMARGS="${JVMARGS} -Dhbase.kerberos.regionserver.principal=hbase/hadoop.hadoop.com@HADOOP.COM" JVMARGS="${JVMARGS} -Dhbase.sasl.clientconfig=Client" JVMARGS="${JVMARGS} -Djava.security.login.auth.config=/usr/share/opentsdb/bin/asynchbase.jass" JVMARGS="${JVMARGS} -Dhbase.security.authentication=kerberos" JVMARGS="${JVMARGS} -Dzookeeper.sasl.client=false"

the jass file: asynchbase.jass: Client { com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true useTicketCache=true keyTab=/opt/hadoop_keytab/user_for_training/user.keytab principal="hbase/hadoop.hadoop.com@HADOOP.COM"; };

manolama commented 7 years ago

Hmm, could you look at the ZK logs to see if it gives any good details about what happened?

macroxmu commented 7 years ago

zookeeper's log has error message as below: image it look like authentication is not success,is there a configuration guide of kerberos login?

shitWang commented 6 years ago

Hi have you solved this promblem?

DilipMalipeddi commented 6 years ago

Getting similar error 2017-10-19 14:58:56 [AsyncHBase I/O Worker #4] ERROR org.hbase.async.RegionClient - Unexpected exception from downstream on [id: 0x16055d89, /10.108.232.163:59245 => /10.80.15.226:16020] java.io.IOException: An established connection was aborted by the software in your host machine at sun.nio.ch.SocketDispatcher.read0(Native Method) at sun.nio.ch.SocketDispatcher.read(Unknown Source) at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source) at sun.nio.ch.IOUtil.read(Unknown Source) at sun.nio.ch.SocketChannelImpl.read(Unknown Source) at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108) at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318) at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89) at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178) at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108) at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42) at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at java.lang.Thread.run(Unknown Source)

DilipMalipeddi commented 6 years ago

@manolama Below are the zookeeper logs

2017-10-19 10:03:26,433 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.80.15.222:36604 2017-10-19 10:03:26,433 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.80.15.222:36604 2017-10-19 10:03:26,448 - INFO [CommitProcessor:2:ZooKeeperServer@617] - Established session 0x25ecc9322b18541 with negotiated timeout 30000 for client /10.80.15.222:36604 2017-10-19 10:03:26,451 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@118] - Successfully authenticated client: authenticationID=storm@xxxx.com; authorizationID=storm@xxxx.com. 2017-10-19 10:03:26,452 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:SaslServerCallbackHandler@134] - Setting authorizedID: storm 2017-10-19 10:03:26,452 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@964] - adding SASL authorization for authorizationID: storm 2017-10-19 10:03:26,486 - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1008] - Closed socket connection for client /10.80.15.22

manolama commented 6 years ago

OK so it does look like kerberos with ZK may be the issue. One thing you can try, if your ZK allows it, is to add -Dzookeeper.sasl.client=false to your TSD JVM flags and see if that lets you proceed.