afredlyj / mynote

idea and note

zookeeper #4

Open afredlyj opened 8 years ago

afredlyj commented 8 years ago

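Common ZooKeeper commands

ZooKeeper's znode hierarchy resembles a Linux directory tree. The difference is that a znode not only has child nodes but also holds data itself.

Service commands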

  1. Start the ZK service: sh bin/zkServer.sh start
  2. Check the ZK service status: sh bin/zkServer.sh status (this also shows the node's role in the cluster)
  3. Stop the ZK service: sh bin/zkServer.sh stop
  4. Restart the ZK service: sh bin/zkServer.sh restart

Four-letter commands

ZooKeeper provides four-letter commands for querying the server's current state and related information. You can send them to ZooKeeper with telnet or nc.

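Command  Description
conf     Prints the basic runtime configuration of the ZooKeeper server
cons     Prints details of every client connection on this ZooKeeper server
crst     Resets the connection statistics for all clients
dump     Prints the cluster's outstanding sessions and ephemeral nodes (only works on the leader)
envi     Prints the runtime environment of the host running ZooKeeper
ruok     Checks whether the server is running; it only shows that the port is reachable and does not guarantee that the service is usable
stat     Prints the server's runtime status
srvr     Like stat, but without the client connection details
srst     Resets all server statistics
wchs     Prints a summary of the Watchers managed by this server
wchc     Prints details of the Watchers managed by this server
wchp     Like wchc, except that the output is grouped by znode path
mntr     Prints detailed server statistics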

For example, to view the server's runtime status:

[root@localhost log]# echo stat| nc 127.0.1.1 2181
Zookeeper version: 3.4.6-1569965, built on 02/20/2014 09:09 GMT
Clients:
 /xxx.xxx.206.99:56212[1](queued=0,recved=2292,sent=2292)
 /xxx.xxx.30.146:46780[1](queued=0,recved=2079,sent=2079)
 /xx.xxx.59.186:44670[1](queued=0,recved=2274,sent=2274)
 /xx.xxx.21.249:54079[1](queued=0,recved=4001,sent=4001)

Latency min/avg/max: 0/0/55
Received: 10450020
Sent: 10453857
Connections: 19
Outstanding: 0
Zxid: 0x400022c2d
Mode: follower
Node count: 727
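
The exchange that nc performs above can also be sketched in Java. This is a minimal illustration, not a ZooKeeper API: the class FourLetterClient and its methods exchange and send are hypothetical names, and it assumes the server answers a four-letter command and then closes the connection, as it does in the nc example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of ZooKeeper): sends one four-letter command over plain TCP.
class FourLetterClient {

    // Writes the command, then reads the reply until the peer closes the stream.
    static String exchange(InputStream in, OutputStream out, String command) {
        if (command.length() != 4) {
            throw new IllegalArgumentException("expected a four-letter command: " + command);
        }
        try {
            out.write(command.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                reply.write(buffer, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Opens a socket with connect and read timeouts, runs the exchange, and closes the socket.
    static String send(String host, int port, String command, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs);
            return exchange(socket.getInputStream(), socket.getOutputStream(), command);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Under these assumptions, FourLetterClient.send("127.0.1.1", 2181, "stat", 3000) should return the same text that nc printed above.
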
afredlyj commented 8 years ago

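Improper use inside a CuratorListener led to a CuratorConnectionLossException. Watcher notifications are delivered as follows:

  1. The Event thread handles the Event;
  2. It calls the ConnectionState.process method;
  3. process invokes the registered CuratorListener, that is, the eventReceived method implemented by the application;
  4. Once that method returns, control goes back to ConnectionState.process: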
    public void process(WatchedEvent event)
    {
        if ( LOG_EVENTS )
        {
            log.debug("ConnectState watcher: " + event);
        }

        for ( Watcher parentWatcher : parentWatchers )
        {
            TimeTrace timeTrace = new TimeTrace("connection-state-parent-process", tracer.get());
            // calls CuratorFrameworkImpl$1.process(CuratorFrameworkImpl.java:121)
            parentWatcher.process(event);
            timeTrace.commit();
        }

        boolean wasConnected = isConnected.get();
        boolean newIsConnected = wasConnected;
        if ( event.getType() == Watcher.Event.EventType.None )
        {
            newIsConnected = checkState(event.getState(), wasConnected);
        }

        if ( newIsConnected != wasConnected )
        {
            isConnected.set(newIsConnected);
            connectionStartMs = System.currentTimeMillis();
        }
    }

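The value of isConnected is set only after the watcher callbacks have completed.

If a custom eventReceived method uses CuratorFramework to send a request to the ZooKeeper server, the exception above occurs. The cause lies in RetryLoop's retry mechanism: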

    public static<T> T      callWithRetry(CuratorZookeeperClient client, Callable<T> proc) throws Exception
    {
        T               result = null;
        RetryLoop       retryLoop = client.newRetryLoop();
        while ( retryLoop.shouldContinue() )
        {
            try
            {
                client.internalBlockUntilConnectedOrTimedOut();

                result = proc.call();
                retryLoop.markComplete();
            }
            catch ( Exception e )
            {
                retryLoop.takeException(e);
            }
        }
        return result;
    }

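The name client.internalBlockUntilConnectedOrTimedOut() says what it does: the code above keeps retrying until it times out or isConnected becomes true. This is where the problem lies. The Event thread, inside eventReceived, is waiting for the connection to the ZooKeeper server, but the connection state is only updated after that same thread returns from eventReceived, so the wait ends in a timeout.

Here is the stack trace from the timeout: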
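
The ordering problem can be reproduced without Curator. The following is a simplified model rather than Curator code: a single-threaded executor stands in for main-EventThread, a CountDownLatch stands in for isConnected, and the class name EventThreadBlockDemo is made up for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Simplified model (not Curator code): one thread plays the role of main-EventThread.
class EventThreadBlockDemo {

    // Returns true if the listener observed the "connected" flag before its wait timed out.
    static boolean listenerSeesConnected(long timeoutMs) {
        CountDownLatch connected = new CountDownLatch(1); // stands in for isConnected
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> result = eventThread.submit(() -> {
                // Step 1: the listener runs first, like parentWatcher.process(event),
                // and waits for the connection, like internalBlockUntilConnectedOrTimedOut().
                boolean seen = connected.await(timeoutMs, TimeUnit.MILLISECONDS);
                // Step 2: the flag is set only after the listener has returned,
                // like isConnected.set(newIsConnected) at the end of process().
                connected.countDown();
                return seen;
            });
            return result.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            eventThread.shutdownNow();
        }
    }
}
```

In this model the wait always times out, because the only thread that could set the flag is the one waiting for it.
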

[ERROR] 2016-04-29 16:15:52 [main-EventThread][ConnectionState.java:201] - Connection timed out for connection string (172.17.103.111:2181) and timeout (15000) / elapsed (59338)
org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss
    at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:198) [curator-client-2.6.0.jar:na]
    at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88) [curator-client-2.6.0.jar:na]
    at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115) [curator-client-2.6.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:474) [curator-framework-2.6.0.jar:na]
    at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172) [curator-framework-2.6.0.jar:na]
    at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161) [curator-framework-2.6.0.jar:na]
    at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107) [curator-client-2.6.0.jar:na]
    at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:157) [curator-framework-2.6.0.jar:na]
    at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148) [curator-framework-2.6.0.jar:na]
    at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36) [curator-framework-2.6.0.jar:na]
    at com.usercenter.config.ZKConfigWatcher$1.eventReceived(ZKConfigWatcher.java:80) [classes/:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$8.apply(CuratorFrameworkImpl.java:844) [curator-framework-2.6.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$8.apply(CuratorFrameworkImpl.java:837) [curator-framework-2.6.0.jar:na]
    at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92) [curator-framework-2.6.0.jar:na]
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-17.0.jar:na]
    at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:83) [curator-framework-2.6.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.processEvent(CuratorFrameworkImpl.java:836) [curator-framework-2.6.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$000(CuratorFrameworkImpl.java:58) [curator-framework-2.6.0.jar:na]
    at org.apache.curator.framework.imps.CuratorFrameworkImpl$1.process(CuratorFrameworkImpl.java:121) [curator-framework-2.6.0.jar:na]
    at org.apache.curator.ConnectionState.process(ConnectionState.java:152) [curator-client-2.6.0.jar:na]
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:522) [zookeeper-3.4.6.jar:3.4.6-1569965]
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498) [zookeeper-3.4.6.jar:3.4.6-1569965]
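
One possible way to avoid this timeout, offered as an assumption rather than something verified against this setup, is to keep blocking calls off the Event thread: the listener only hands the work to another thread and returns at once. The sketch below models that with plain JDK classes; OffloadDemo and the worker executor are hypothetical names, and the CountDownLatch again stands in for isConnected. Curator's Listenable interface also offers an addListener overload that takes an Executor, which may achieve the same effect; check the API of the version in use.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Simplified model (not Curator code): the listener hands blocking work to a separate worker thread.
class OffloadDemo {

    // Returns true if the offloaded work observed the "connected" flag before its wait timed out.
    static boolean listenerSeesConnected(long timeoutMs) {
        CountDownLatch connected = new CountDownLatch(1); // stands in for isConnected
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            Future<Future<Boolean>> handoff = eventThread.submit(() -> {
                // The listener only schedules the blocking call and returns immediately.
                Callable<Boolean> blockingCall =
                        () -> connected.await(timeoutMs, TimeUnit.MILLISECONDS);
                Future<Boolean> pending = worker.submit(blockingCall);
                // The event thread is now free to publish the new connection state.
                connected.countDown();
                return pending;
            });
            return handoff.get().get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            eventThread.shutdownNow();
            worker.shutdownNow();
        }
    }
}
```

Here the wait succeeds, because the thread that sets the flag is no longer the thread that waits for it.
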