Open lauritzthamsen opened 10 years ago
The problem seems to be that we use the `zookeeper` object without making sure that a connection to ZooKeeper has been established. Logging
`LOG.info(String.valueOf(zookeeper.getState()));`
before calling
`ZookeeperHelper.initDirectories(this.zookeeper);`
shows that the `zookeeper` object is still in the `CONNECTING` state just before the following exception is thrown:
java.lang.IllegalStateException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /aura
Waiting for state `CONNECTED` resolves this issue.
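A minimal sketch of what "waiting for state CONNECTED" could look like as a bounded polling loop. It uses only the JDK so it runs on its own: `ClientState` and `ConnectionPoller` are hypothetical stand-ins, not part of ZooKeeper or of this code base, and the 10 ms poll interval is an arbitrary assumption.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical stand-in for the client states discussed here; only the two
// states mentioned in this thread are modelled.
enum ClientState { CONNECTING, CONNECTED }

final class ConnectionPoller {

    /**
     * Polls the supplied state until it is CONNECTED or the timeout elapses.
     * Returns true only if CONNECTED was observed in time.
     */
    static boolean awaitConnected(Supplier<ClientState> state, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (state.get() != ClientState.CONNECTED) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // gave up: still not connected
            }
            try {
                Thread.sleep(10); // short pause between polls (arbitrary value)
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return true;
    }
}
```

With the real client, the supplier would presumably wrap `zookeeper.getState()`, and `initDirectories` would only be called once `awaitConnected` has returned true.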
i also found Apache Curator. it's a framework built on top of ZooKeeper and provides a higher-level API as well as connection guarantees. i think it might be a good idea for us to use Curator.
Session establishment is asynchronous. This constructor will initiate connection to the server and return immediately - potentially (usually) before the session is fully established. The watcher argument specifies the watcher that will be notified of any changes in state. This notification can come at any point before or after the constructor call has returned.
Apparently, in rare cases the connection setup lasts longer than the execution of the constructor. But this can be solved easily by adding a new case to the switch in the Watcher: it should execute the initDirectories method after receiving the connected state.
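The idea of running the directory setup from the Watcher could be sketched roughly as follows. This is a JDK-only model, not the project's actual Watcher: `SessionEvent` and `InitOnConnectWatcher` are hypothetical names, `process` only plays the role of the real callback, and running the setup just once across reconnects is an assumption of this sketch.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the connection events a Watcher is notified about.
enum SessionEvent { CONNECTED, DISCONNECTED, EXPIRED }

final class InitOnConnectWatcher {
    // The setup action, e.g. a call to ZookeeperHelper.initDirectories(...).
    private final Runnable initDirectories;
    private final AtomicBoolean initialized = new AtomicBoolean(false);

    InitOnConnectWatcher(Runnable initDirectories) {
        this.initDirectories = initDirectories;
    }

    // Plays the role of the Watcher's event callback in the real client.
    void process(SessionEvent event) {
        switch (event) {
            case CONNECTED:
                // Run the setup only once, even if the client reconnects later.
                if (initialized.compareAndSet(false, true)) {
                    initDirectories.run();
                }
                break;
            case DISCONNECTED:
            case EXPIRED:
                // Existing handling would stay here; omitted in this sketch.
                break;
        }
    }
}
```

Note that this only defers the setup itself. Any other code that uses the client right after construction would still need its own guarantee that the connection exists.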
well, all further interactions with the ZooKeeper files need the connection to be established, not just initDirectories(). all these interactions would have to take place in the Watcher's event callback, but the TaskManager's setupZookeeper() method even returns the zookeeper object for further interactions with the ZooKeeper server... i think it's easiest to explicitly wait for the connection to establish as a fix for now.
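One way to wait explicitly while still reacting to the Watcher's notification is a latch that the callback releases. The sketch below uses only the JDK; `ConnectedGate` is a hypothetical helper, and how it would be wired into `setupZookeeper()` is an assumption.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class ConnectedGate {
    private final CountDownLatch connected = new CountDownLatch(1);

    // To be called from the watcher callback when the connected event arrives.
    void onConnected() {
        connected.countDown();
    }

    // Blocks the caller until onConnected() has been called or the timeout
    // elapses. Returns true if the gate was released in time.
    boolean await(long timeoutMillis) {
        try {
            return connected.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }
}
```

The setup method could then create the gate, pass a watcher that calls `onConnected()` to the ZooKeeper constructor, and call `await(...)` before `initDirectories` and before returning the client. Every caller would then receive an object that has been connected at least once, and a false return value gives a clear place to fail with a timeout error.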
i'll also have a look at Curator in the next days. would just be cool to have it take care of connection establishment and failures for us.
running the example clients currently fails on OS X with the following output:
this is the case for both the state on master (e.g. SimpleClient at 90147c369a45142dd1a66db0524788397dc1d4f2) and develop (e.g. IntegrationTests at 87451d6d219452e513e376e783e44858a80b668e). stepping through these clients sometimes leads to successful runs, which might suggest a timing issue and not a general problem with OS X.