elodina / datastax-enterprise-mesos

DataStax Enterprise on Mesos
http://www.elodina.net

Allowing different cql port for C* nodes may not work correctly #59

Closed abiletskyi closed 8 years ago

abiletskyi commented 8 years ago

Currently the Scheduler allows C* nodes to start with different CQL (native_transport_port) ports, unlike the agent and storage ports. This may not work correctly in certain failure scenarios.

The datastax-java-driver generally encourages specifying only a list of hosts when creating a session. Only one method accepts ports as well: https://docs.datastax.com/en/drivers/java/2.1/com/datastax/driver/core/Cluster.Builder.html#addContactPointsWithPorts-java.util.Collection-

Overall, though, the Java driver assumes that the CQL port is the same on all nodes. So when the Cluster is created with contact points defined as Seq(host1:port1, host2:port2) and the C* node on host1 fails, the session won't fail over to host2. If port1 == port2, the session recovers normally.

Steps to reproduce:

  1. Create cluster with 2 nodes so that cql port will be different (e.g. remove default cql port 5002 from mesos offers on one of the mesos slaves)
  2. Create connection with datastax-driver using contactPoint=Seq(new InetSocketAddress(host1, cqlPort1), new InetSocketAddress(host2, cqlPort2))
  3. Execute in while(true) loop some simple query e.g. println(session.execute("select now() from system.local").one().getUUID(0))
  4. Stop the C* node on host1.

Actual result: Exception in thread "main" com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (no host was tried)

Expected result: The session silently reconnects to the host2 node.

joestein commented 8 years ago

Once we build in support for multiple Cassandra data centers within a Cassandra cluster, and for multiple Cassandra clusters managed by a single scheduler (which is an app tagged within a zone, and zones are themselves within a cluster and datacenter, or whatever user/role structures other systems use to launch it), we can better enforce "everyone in this zone using this Cassandra cluster uses this port". Then we can run hundreds of clusters, each with a dynamic port but the same dynamic port on every node of that cluster. That accomplishes the same goal, and it is better because dynamic ports are hard to debug in practice.

This also makes exposing the cluster through service discovery easier, since you know where you are connecting to.

abiletskyi commented 8 years ago

Okay, it seems to be more of a Java driver issue really (open and unresolved): https://datastax-oss.atlassian.net/browse/JAVA-860 There is also a separate ticket to allow different CQL ports: https://datastax-oss.atlassian.net/browse/JAVA-944

Nevertheless, with the current version of the datastax-java-driver there is no way to establish a reliable session to a C* cluster whose nodes use different CQL ports. I think we need to keep this limitation in mind in the scope of the task where we planned to remove ZK storage. For now, a warning in the README would probably suffice.
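
Until the driver tickets are resolved, a client could at least fail fast when its contact points do not share one CQL port. The following is only a rough sketch using plain JDK classes, not code from this project or from the driver: the class name CqlPortCheck and its methods distinctPorts and requireUniformPort are hypothetical helpers made up for illustration.

```java
import java.net.InetSocketAddress;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper (not part of the driver or of this scheduler):
// verifies that all contact points use the same CQL port before they
// are handed to something like Cluster.Builder#addContactPointsWithPorts.
final class CqlPortCheck {
    private CqlPortCheck() {}

    // Collects the distinct ports used by the contact points, sorted ascending.
    static Set<Integer> distinctPorts(Collection<InetSocketAddress> contactPoints) {
        Set<Integer> ports = new TreeSet<>();
        for (InetSocketAddress address : contactPoints) {
            ports.add(address.getPort());
        }
        return ports;
    }

    // Returns the single shared port; throws if the collection is empty
    // or if the contact points disagree on the port.
    static int requireUniformPort(Collection<InetSocketAddress> contactPoints) {
        Set<Integer> ports = distinctPorts(contactPoints);
        if (ports.size() != 1) {
            throw new IllegalArgumentException(
                "Expected exactly one CQL port across contact points, found: " + ports);
        }
        return ports.iterator().next();
    }
}
```

For example, calling requireUniformPort on contact points built with InetSocketAddress.createUnresolved("host1", 5002) and InetSocketAddress.createUnresolved("host2", 5002) would return 5002, while a mixed list (say 5002 on host1 and some other port on host2) would throw before any session is created. This only surfaces the misconfiguration early; it does not make failover work across different ports.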