apache / cassandra-gocql-driver

GoCQL Driver for Apache Cassandra®
https://cassandra.apache.org/
Apache License 2.0

Regression in establishing connections #575

Closed hbcheng closed 8 years ago

hbcheng commented 8 years ago

Hello,

We observed a regression in creating a session in our CI system.

The CI uses Cassandra 2.2.4 internally.

After a rough bisect, the last working revision seems to be 40ccb13a098a105751c58b9e247556fcf8a9c382.

Newer revisions fail with ErrNoConnectionsStarted.

Please let me know if there's any additional information that I can provide.

hbcheng commented 8 years ago

PR #570 also applies cleanly, but we start getting errors once 184f335ec4de549b601fc127009ddd2d01fd8f9f is applied. We suspect PR #551.

Zariel commented 8 years ago

Are there any other logs? You can add more logging to this function, which should catch dial errors: https://github.com/gocql/gocql/blob/master/connectionpool.go#L453

hbcheng commented 8 years ago

Sorry for the slow response; debugging this is a bit tough because it only manifests in our CI system.

It appears the issue stems from our setup. We use Drone CI, which sets up interconnected Docker images. The instance of Cassandra is set up to listen on localhost; however, these ports are forwarded and available to the other containers. In other words, the application connects to a private IP address (for example, 172.17.0.5), but nodetool on the cassandra side indicates that the cluster consists of a single node listening on localhost (127.0.0.1).

This appears to be a peculiarity of the setup where the address used to connect to Cassandra differs from what Cassandra thinks its address is. cqlsh and old versions of gocql work fine, but the newer version appears to use the IP address Cassandra reports as the destination to connect to. It's only really an issue in this particular case, since any multi-node Cassandra setup already requires every node other than the first to be reachable via its gossip IP anyway.

Looks like this may be an unintended side effect of an intended behavioral change? We can easily work around it, and this configuration definitely qualifies as unusual, but I wouldn't be surprised if lots of dev/CI setups have this peculiar configuration.
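
The root cause described above (the node advertising 127.0.0.1 while clients reach it at a container IP) can be spotted mechanically: a loopback or unspecified address advertised by a node is never usable from another container. A minimal stdlib sketch; the function name `routableFromOtherHosts` is hypothetical and the check is only a heuristic.

```go
package main

import (
	"fmt"
	"net"
)

// routableFromOtherHosts reports whether an address advertised by a node
// could plausibly be dialed from a different host or container. Loopback
// (127.0.0.0/8, ::1) and unspecified (0.0.0.0, ::) addresses never can.
// Heuristic only: a non-loopback address may still be unreachable, for
// example a private IP hidden behind a Docker proxy.
func routableFromOtherHosts(addr string) bool {
	ip := net.ParseIP(addr)
	if ip == nil {
		return false // not a literal IP address
	}
	return !ip.IsLoopback() && !ip.IsUnspecified()
}

func main() {
	for _, a := range []string{"127.0.0.1", "172.17.0.5"} {
		fmt.Printf("%s usable from other containers: %v\n", a, routableFromOtherHosts(a))
	}
}
```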

hbcheng commented 8 years ago

Taking this a step further, I believe (but have not yet confirmed) that this would be an issue any time the rpc_address is distinct from the broadcast_address. We do not have broadcast_rpc_address set, so I don't yet know the impact of setting that as well.

Zariel commented 8 years ago

Hm, I don't know what that would change. You should set your broadcast_address to the routable address you want to expose to clients.

I think the java-driver does something when it can't reach the rpc_address from the system table; I'll have a look and a think.

hbcheng commented 8 years ago

Thanks for taking a look!

To clarify our setup/experience:

We have Cassandra's listen_address set to localhost, because in our CI environment we don't expect connections from other nodes (we're working on changing this, to see if it makes things return to normal).

We have Cassandra's rpc_address bound to the address of the Docker container, which is reachable and routable from the client container.

For cqlsh and older versions of gocql, we connect to the IP that rpc_address is bound to, and things work.

For newer versions of gocql, we pass the IP that rpc_address is bound to into the session constructor, and what we observe is an attempted connection to localhost (listen_address) that fails, after which session creation errors out.

Zariel commented 8 years ago

What does this look like in system.peers and system.local?

On Wed, 13 Jan 2016, 10:30 p.m. Hao Bryan Cheng notifications@github.com wrote:


hbcheng commented 8 years ago
cqlsh> SELECT * FROM system.peers;

 peer | data_center | host_id | preferred_ip | rack | release_version | rpc_address | schema_version | tokens
------+-------------+---------+--------------+------+-----------------+-------------+----------------+--------

(0 rows)
cqlsh> SELECT * FROM system.local;

 key   | bootstrapped | broadcast_address | cluster_name | cql_version | data_center | gossip_generation | host_id                              | listen_address | native_protocol_version | partitioner                                 | rack  | release_version | rpc_address | schema_version                       | thrift_version | tokens | truncated_at
-------+--------------+-------------------+--------------+-------------+-------------+-------------------+--------------------------------------+----------------+-------------------------+---------------------------------------------+-------+-----------------+-------------+--------------------------------------+----------------+--------+--------------
 local |    COMPLETED |         127.0.0.1 | Test Cluster |       3.3.1 | datacenter1 |        1452724338 | f2aca9ef-ebc2-49d8-badc-dc0b2ade5ae6 |      127.0.0.1 |                       4 | org.apache.cassandra.dht.Murmur3Partitioner | rack1 |           2.2.4 |  172.17.0.3 | b145475a-02dc-370c-8af7-a9aba2d61362 |         20.1.0 | {'-1019276467326062935', '-1144317575032212705', '-1162727536843078329', '-1269389850813805109', '-1325658250067742631', '-1358557216312870534', '-1480241737982077937', '-1518516300801024298', '-1585020779524284605', '-1603479117926075636', '-1651882947500662976', '-1703969055088152838', '-1768866204671579451', '-1818301577762134186', '-1863588927273950578', '-1901640957577567339', '-1925435576704799957', '-2024284801894820236', '-2060400178279767740', '-207407765157472368', '-2106684677930028136', '-2177498559698344801', '-2275315979410116215', '-2314319276408408848', '-2337888858750457969', '-2511164640490956619', '-2552834851120650550', '-2687649359322905708', '-2720446002495546933', '-2950197505306604758', '-2967440084461403973', '-2975986555913399621', '-3017731231784056005', '-3223933117105092081', '-3358749329715434620', '-3473943631955487063', '-3613127249277846351', '-3641248702042861396', '-3671846378364672266', '-3708037675090630064', '-3746804790135207139', '-3800684226969347518', '-3810894117711660332', '-3872880375325311359', '-3990804044604917221', '-4080753374806546033', '-4169734193379403879', '-4245630539896254938', '-4428005755292153587', '-4475667552055400185', '-4518483648709877155', '-459252276913340725', '-472224882647127747', '-4742531919515141172', '-4760546881127039170', '-4791728061138155272', '-4816198301858106663', '-4867518241955876003', '-5077368772741281537', '-5126296282076026644', '-5347973130747414635', '-5444313743524857756', '-5603806839003176250', '-5643927168003878581', '-5720660172964400850', '-5746999409295650274', '-5941949947602858521', '-5972197463599676475', '-6017707705801731827', 
'-6092974318357780092', '-6146799436841737819', '-6178071654300054623', '-6257213091152875436', '-6271519813294885932', '-6302329094518852238', '-6388432676516545657', '-643895328048809672', '-6446088722975612335', '-6471123940528649387', '-6516676052894658182', '-6564868237990397731', '-662224858361931327', '-668051513936748775', '-6685760163789264258', '-6721534450302020665', '-6883448004267284229', '-6957874393588674949', '-6986580519567884207', '-7328956075172712674', '-7349465514352244102', '-7407954415492360654', '-7544804299317127451', '-7550056416259878632', '-7555271028661712204', '-7593433112107955341', '-7665423536692064478', '-7873306444072615172', '-7878547407607966251', '-7897325220226490983', '-7971542893391392897', '-8030571811060357963', '-8260414739161610810', '-8348301086329714702', '-835622269463610262', '-838026776982364108', '-8447424844505479783', '-8529511933023819517', '-8543514441820465809', '-857379303009342400', '-8637105758598102818', '-864060485547829842', '-8682857550004396413', '-872120697149963175', '-8727263837186526621', '-8737772761917022939', '-8756943694442634867', '-8815417209060691882', '-8818093732570826719', '-8878310366935095443', '-8892588039805757173', '-8998157741836205708', '-9095957796213775347', '-9131177561886775792', '-9138403926243485625', '-9178690828230032988', '-9216689068317676576', '1010249694091181728', '1128095450936674531', '1305873821959355414', '1315357403379201347', '1363759776565035762', '1400836620131098189', '1422755944036820336', '1438722574911002366', '1573984129816838027', '1616017682428625650', '1712672937726925449', '1813407532886798518', '1815027902146145046', '1927705332833885043', '1949962922014158055', '2015760303070627659', '2050292657755098071', '2082931070012741845', '2205006383068161828', '2436920379949832531', '277909505702666654', '2796609003185550092', '2805595718920965696', '2817761887600409774', '2827993450608594985', '2837678535361570794', '294772971570236909', 
'3039617962312833069', '3100488518536296520', '3104215007034838145', '3182943340275154526', '319937815770152761', '3208981547576892507', '3288992235637143318', '3301488792572522330', '3399798242758726742', '3404905303440564699', '345083898615698368', '3501583552210134115', '3547981446971216280', '3561827855892276895', '3631297273782446673', '3658959898719541563', '3715883223906904320', '3724580367810321306', '3726495444396649004', '3728430547556100364', '3820084987208626329', '3889543234077876331', '391987301427871765', '4008159025333466238', '4247273983298807996', '4317240518629130914', '4419669190748436429', '4536091152037641653', '4555600539952713635', '4748086834988121894', '4905616938916716335', '4915711073702837966', '4960641494934195166', '4969358863277829542', '4996314764903052295', '5140447765029984031', '5142744068127834922', '5171453824839845830', '5273494482080909472', '5278150217032963132', '5318614240163322344', '5334739932465295846', '5408329803679915598', '5455687083267264233', '5485643855387031856', '5602512488900481032', '569175000857893999', '5781752446043839959', '5830878251707545285', '5907867590305393844', '5919583642614603865', '6026781090258885261', '6111993606716719572', '6220997296874840746', '6230729503812593226', '6272511487893346276', '632004667510290047', '6345778376468106869', '6398841357802524822', '6480966120821151805', '6492192250922142728', '6521468207016883663', '6591689759979427539', '6716087964702017544', '6738654376464867610', '67909863393270132', '6846505723673431777', '7003392452128029526', '7231940866080963152', '7369136718351109125', '7386594198151269745', '7440853315337655802', '7590588163728084479', '762302240102491592', '771242897572211947', '779134332412746807', '7839850143662331485', '7976782055076917335', '803945216422782721', '8044514163889598079', '8246931850942466640', '8247841444016009168', '8260408163947362741', '8367382217197716331', '8368974226098745667', '8389070165447865843', '8424812490114360172', 
'8454279879079437574', '8522653391341608943', '8599101260892850740', '863272376406653010', '86440323863059345', '8691723384403555111', '8806073313928020965', '8867545881896451527', '9048777435539304278', '9099324946834895053', '9099889558317674948', '9148181351778174455', '9192270461651921024', '9193542465768704027', '97816987976290572', '985075733516077335'} | {55080ab0-5d9c-3886-90a4-acb25fe1f77b: 0xffffffffffffffff00000000000001523d1e3d73}

As described in system.local, rpc_address is 172.17.0.3, everything else is set to default/localhost.

danielchatfield commented 8 years ago

Could this be caused by host discovery? I.e. it connects, then tries to connect to hosts with an incorrect IP because of Docker weirdness.

Zariel commented 8 years ago

@danielchatfield yes, I think the cause is either that the Cassandra node is incorrectly configured (I'm not entirely sure how the six *_address settings map to the system.peers table) or that we are connecting to the wrong address from system.peers; the latter is most likely the issue.

I have a feeling we should use the address from the event to connect if it is different from what we get from the system.peers table.

obeattie commented 8 years ago

Yes, this would be fixable with #585 (in certain Docker setups, C* listens on a different IP than the one clients use, because Docker proxies connections).

obeattie commented 8 years ago

In my case, I am using docker-machine, so C* is listening within the VM on a loopback address but Docker is proxying it through another network interface to break out of the VM.

Zariel commented 8 years ago

@obeattie can you confirm that the issue you see is that when gocql looks up the host info in system.peers it's all wrong, so that when the pool uses that info to connect instead of the supplied hosts, no connections are created?

obeattie commented 8 years ago

Yes, that's exactly what is happening. Specifically, because C* is being accessed via a proxy (in this case docker-proxy), gocql can't reach it at the address C* is broadcasting/listening on.

I think the host discovery mechanism is valuable and a worthwhile addition to the library, but it should be possible to disable it in favour of explicitly connecting to specified hosts. It would help in the situation above (or any situation in which C* is behind a proxy), but also in situations where the user knows better than the library which hosts to connect to. The latter is true in our production setup where we want very fine-grained control over which services connect to which hosts.
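
Explicit control over which hosts a service talks to amounts to an allow-list applied to whatever hosts the driver learns about. Below is a stdlib-only sketch of such a filter with hypothetical names. Later gocql releases expose a comparable hook (a `HostFilter` on the cluster config, with `WhiteListHostFilter` as a ready-made allow-list), but check the version you are using.

```go
package main

import (
	"fmt"
	"net"
)

// allowList returns a predicate that accepts only the listed host addresses.
// Addresses are normalized through net.ParseIP, so an IPv4-mapped IPv6 form
// such as "::ffff:10.0.0.1" matches "10.0.0.1". Non-IP entries are ignored.
func allowList(hosts ...string) func(addr string) bool {
	allowed := make(map[string]struct{}, len(hosts))
	for _, h := range hosts {
		if ip := net.ParseIP(h); ip != nil {
			allowed[ip.String()] = struct{}{}
		}
	}
	return func(addr string) bool {
		ip := net.ParseIP(addr)
		if ip == nil {
			return false
		}
		_, ok := allowed[ip.String()]
		return ok
	}
}

func main() {
	accept := allowList("10.0.0.1", "10.0.0.2")
	// Hypothetical hosts reported by discovery; only allowed ones are kept.
	for _, h := range []string{"10.0.0.1", "10.0.0.3", "::ffff:10.0.0.2"} {
		fmt.Println(h, accept(h))
	}
}
```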

Zariel commented 8 years ago

That case won't be fixed by disabling host discovery; the driver would also need to disable all the host info lookups (which I guess fail as well), which in turn would probably mean token-aware routing won't work.

A workaround which I think will fix this case is to do the host lookup in system.peers and then use the supplied address (either via an event or the initial hosts) instead of the discovered one. But I don't have an environment I can test this solution in.
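
The suggested workaround keeps the lookup, so data center, rack and token metadata stay available for token-aware routing, but dials the supplied address instead of the discovered one. A hypothetical sketch of that merge; `hostInfo` and `withSuppliedAddr` are illustrative names, not gocql types, and the two tokens are a small subset copied from the `system.local` output above.

```go
package main

import "fmt"

// hostInfo is a hypothetical, trimmed-down view of what a driver learns
// about a node from system.local / system.peers.
type hostInfo struct {
	ConnectAddr string // address the driver will dial
	DataCenter  string
	Rack        string
	Tokens      []string // needed for token-aware routing
}

// withSuppliedAddr keeps all metadata from the system-table lookup but
// replaces the dial address with the one the user supplied (or an event
// carried), since that address is known to be reachable.
func withSuppliedAddr(discovered hostInfo, supplied string) hostInfo {
	merged := discovered // struct copy; the Tokens slice is shared but not mutated
	merged.ConnectAddr = supplied
	return merged
}

func main() {
	discovered := hostInfo{
		ConnectAddr: "127.0.0.1", // what the node reports about itself
		DataCenter:  "datacenter1",
		Rack:        "rack1",
		Tokens:      []string{"-1019276467326062935", "985075733516077335"},
	}
	h := withSuppliedAddr(discovered, "172.17.0.3")
	fmt.Println(h.ConnectAddr, h.DataCenter, h.Rack, len(h.Tokens))
	// prints: 172.17.0.3 datacenter1 rack1 2
}
```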

Zariel commented 8 years ago

@hbcheng @obeattie can you check if the new options added to cluster address this issue for you?

hbcheng commented 8 years ago

@Zariel Thank you for the quick turnaround on this. I'll deploy it to our CI system and report back as soon as I can.

hbcheng commented 8 years ago

Can confirm that DisableInitialHostLookup worked on our end; IgnorePeerAddr did not make a difference one way or another.

Please let me know if there's any additional testing or configuration details I can provide.
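
In client code the working fix is a one-line setting on the cluster config, `cluster.DisableInitialHostLookup = true` on the value returned by `gocql.NewCluster`. As gocql documents that option, the driver then connects only to the supplied hosts and skips the `system.peers` lookup, so data center, rack and token information (and with it host filtering and token-aware routing) is unavailable. The `initialHosts` function below is a hypothetical model of that trade-off as observed in this thread, not the driver's code.

```go
package main

import "fmt"

// initialHosts models which addresses are dialed at session start and
// whether node metadata is available. With lookup enabled, the dial list
// comes from the system tables; with it disabled, only the supplied
// contact points are used and no metadata is collected.
func initialHosts(supplied, discovered []string, disableLookup bool) (dial []string, haveMetadata bool) {
	if disableLookup {
		return supplied, false
	}
	return discovered, true
}

func main() {
	supplied := []string{"172.17.0.3"}
	discovered := []string{"127.0.0.1"} // what system.local reports in the CI setup

	dial, meta := initialHosts(supplied, discovered, false)
	fmt.Println(dial, meta) // [127.0.0.1] true: unreachable, so no connections
	dial, meta = initialHosts(supplied, discovered, true)
	fmt.Println(dial, meta) // [172.17.0.3] false: connects, but no token metadata
}
```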

jstoiko commented 8 years ago

Stumbled upon this issue. To reproduce:

$ docker run -d -P --name cassandra2 cassandra:2
$ echo "$(docker-machine ip default):$(docker port cassandra2 9042 | awk -F: '{print $2}')"

Then use the echoed <ip>:<port> to connect, e.g. gocql.NewCluster("<ip>:<port>")

DisableInitialHostLookup does the trick.
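
The shell one-liner above builds the contact point from the docker-machine VM address and the host port Docker mapped to 9042. The same step in Go, as a sketch: `contactPoint` is a hypothetical helper that takes the two command outputs as strings (it does not run `docker` itself), and the sample values are made up.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// contactPoint combines the VM address (output of `docker-machine ip default`)
// with the host port from `docker port <container> 9042`, whose output looks
// like "0.0.0.0:32769". Only the first line is used, in case the mapping is
// listed more than once (for example once for IPv4 and once for IPv6).
func contactPoint(machineIP, dockerPortOut string) (string, error) {
	first := strings.SplitN(strings.TrimSpace(dockerPortOut), "\n", 2)[0]
	line := strings.TrimSpace(first)
	_, port, err := net.SplitHostPort(line)
	if err != nil {
		return "", fmt.Errorf("unexpected docker port output %q: %w", line, err)
	}
	return net.JoinHostPort(strings.TrimSpace(machineIP), port), nil
}

func main() {
	cp, err := contactPoint("192.168.99.100\n", "0.0.0.0:32769\n")
	if err != nil {
		panic(err)
	}
	fmt.Println(cp) // 192.168.99.100:32769
}
```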

Zariel commented 8 years ago

I'm going to consider this resolved; I'll add a note to the README about connecting to nodes which don't have a routable broadcast address.

varunturlapati commented 7 years ago

Is this workaround for the docker command, or for the gocql driver only?