Closed hbcheng closed 8 years ago
PR #570 applies cleanly as well, but we start getting errors when applying 184f335ec4de549b601fc127009ddd2d01fd8f9f. We suspect PR #551.
Are there any other logs? You can add more logging to this function which should catch dial errors. https://github.com/gocql/gocql/blob/master/connectionpool.go#L453
Sorry for the slow response; debugging this is a bit tough because it only manifests in our CI system.
It appears the issue stems from our setup. We use Drone CI, which sets up interconnected Docker images. The instance of Cassandra is set up to listen on localhost; however, these ports are forwarded and available to the other containers. In other words, the application connects to a private IP address (for example, 172.17.0.5), but nodetool on the cassandra side indicates that the cluster consists of a single node listening on localhost (127.0.0.1).
This appears to be a peculiarity of the setup, where the address used to connect to Cassandra differs from what Cassandra thinks its address is. cqlsh and old versions of gocql work fine, but the newer version appears to use the IP address Cassandra gives it as the destination to connect to. It's only really an issue in this particular case, since any multi-node Cassandra setup requires every node other than the first to be reachable via its gossip IP anyway.
Looks like this may be an unintended side effect of an intended behavioral change? We can easily work around it, and this configuration definitely qualifies as unusual, but I wouldn't be surprised if lots of dev/CI setups have this peculiar configuration.
Taking this a step further, I believe (but have not yet confirmed) that this would be an issue anytime the rpc_address is distinct from the broadcast_address. We do not have broadcast_rpc_address set, so I don't yet know the impact of setting this as well.
Hm, I don't know what that would change. You should set your broadcast_address to the routable address to expose to clients.
I think the java-driver does some things when it can't reach the rpc_address from the system table. I'll have a look and a think.
Thanks for taking a look!
To clarify our setup/experience:
We have Cassandra's listen_address set to localhost, because in our CI environment we don't expect connections from other nodes (we're working on changing this, to see if it makes things return to normal).
We have Cassandra's rpc_address bound to the address of the Docker container, which is reachable and routable from the client container.
For cqlsh and older versions of gocql, we connect to the IP that rpc_address is bound to, and things work.
For newer versions of gocql, we pass the IP that rpc_address is bound to into the session constructor; what we observe is an attempted connection to localhost (listen_address) that fails, and session startup errors out.
What does this look like in system.peers and system.local?
cqlsh> SELECT * FROM system.peers;
peer | data_center | host_id | preferred_ip | rack | release_version | rpc_address | schema_version | tokens
------+-------------+---------+--------------+------+-----------------+-------------+----------------+--------
(0 rows)
cqlsh> SELECT * FROM system.local;
key | bootstrapped | broadcast_address | cluster_name | cql_version | data_center | gossip_generation | host_id | listen_address | native_protocol_version | partitioner | rack | release_version | rpc_address | schema_version | thrift_version | tokens | truncated_at
-------+--------------+-------------------+--------------+-------------+-------------+-------------------+---------+----------------+-------------------------+-------------+------+-----------------+-------------+----------------+----------------+--------+--------------
local | COMPLETED | 127.0.0.1 | Test Cluster | 3.3.1 | datacenter1 | 1452724338 | f2aca9ef-ebc2-49d8-badc-dc0b2ade5ae6 | 127.0.0.1 | 4 | org.apache.cassandra.dht.Murmur3Partitioner | rack1 | 2.2.4 | 172.17.0.3 | b145475a-02dc-370c-8af7-a9aba2d61362 | 20.1.0 | {'-1019276467326062935', '-1144317575032212705', '-1162727536843078329', '-1269389850813805109', '-1325658250067742631', '-1358557216312870534', '-1480241737982077937', '-1518516300801024298', '-1585020779524284605', '-1603479117926075636', '-1651882947500662976', '-1703969055088152838', '-1768866204671579451', '-1818301577762134186', '-1863588927273950578', '-1901640957577567339', '-1925435576704799957', '-2024284801894820236', '-2060400178279767740', '-207407765157472368', '-2106684677930028136', '-2177498559698344801', '-2275315979410116215', '-2314319276408408848', '-2337888858750457969', '-2511164640490956619', '-2552834851120650550', '-2687649359322905708', '-2720446002495546933', '-2950197505306604758', '-2967440084461403973', '-2975986555913399621', '-3017731231784056005', '-3223933117105092081', '-3358749329715434620', '-3473943631955487063', '-3613127249277846351', '-3641248702042861396', '-3671846378364672266', '-3708037675090630064', '-3746804790135207139', '-3800684226969347518', '-3810894117711660332', '-3872880375325311359', '-3990804044604917221', '-4080753374806546033', '-4169734193379403879', '-4245630539896254938', '-4428005755292153587', '-4475667552055400185', '-4518483648709877155', '-459252276913340725', '-472224882647127747', '-4742531919515141172', '-4760546881127039170', '-4791728061138155272', '-4816198301858106663', '-4867518241955876003', '-5077368772741281537', '-5126296282076026644', '-5347973130747414635', '-5444313743524857756', '-5603806839003176250', '-5643927168003878581', '-5720660172964400850', '-5746999409295650274', '-5941949947602858521', '-5972197463599676475', '-6017707705801731827', '-6092974318357780092', '-6146799436841737819', '-6178071654300054623', 
'-6257213091152875436', '-6271519813294885932', '-6302329094518852238', '-6388432676516545657', '-643895328048809672', '-6446088722975612335', '-6471123940528649387', '-6516676052894658182', '-6564868237990397731', '-662224858361931327', '-668051513936748775', '-6685760163789264258', '-6721534450302020665', '-6883448004267284229', '-6957874393588674949', '-6986580519567884207', '-7328956075172712674', '-7349465514352244102', '-7407954415492360654', '-7544804299317127451', '-7550056416259878632', '-7555271028661712204', '-7593433112107955341', '-7665423536692064478', '-7873306444072615172', '-7878547407607966251', '-7897325220226490983', '-7971542893391392897', '-8030571811060357963', '-8260414739161610810', '-8348301086329714702', '-835622269463610262', '-838026776982364108', '-8447424844505479783', '-8529511933023819517', '-8543514441820465809', '-857379303009342400', '-8637105758598102818', '-864060485547829842', '-8682857550004396413', '-872120697149963175', '-8727263837186526621', '-8737772761917022939', '-8756943694442634867', '-8815417209060691882', '-8818093732570826719', '-8878310366935095443', '-8892588039805757173', '-8998157741836205708', '-9095957796213775347', '-9131177561886775792', '-9138403926243485625', '-9178690828230032988', '-9216689068317676576', '1010249694091181728', '1128095450936674531', '1305873821959355414', '1315357403379201347', '1363759776565035762', '1400836620131098189', '1422755944036820336', '1438722574911002366', '1573984129816838027', '1616017682428625650', '1712672937726925449', '1813407532886798518', '1815027902146145046', '1927705332833885043', '1949962922014158055', '2015760303070627659', '2050292657755098071', '2082931070012741845', '2205006383068161828', '2436920379949832531', '277909505702666654', '2796609003185550092', '2805595718920965696', '2817761887600409774', '2827993450608594985', '2837678535361570794', '294772971570236909', '3039617962312833069', '3100488518536296520', '3104215007034838145', '3182943340275154526', 
'319937815770152761', '3208981547576892507', '3288992235637143318', '3301488792572522330', '3399798242758726742', '3404905303440564699', '345083898615698368', '3501583552210134115', '3547981446971216280', '3561827855892276895', '3631297273782446673', '3658959898719541563', '3715883223906904320', '3724580367810321306', '3726495444396649004', '3728430547556100364', '3820084987208626329', '3889543234077876331', '391987301427871765', '4008159025333466238', '4247273983298807996', '4317240518629130914', '4419669190748436429', '4536091152037641653', '4555600539952713635', '4748086834988121894', '4905616938916716335', '4915711073702837966', '4960641494934195166', '4969358863277829542', '4996314764903052295', '5140447765029984031', '5142744068127834922', '5171453824839845830', '5273494482080909472', '5278150217032963132', '5318614240163322344', '5334739932465295846', '5408329803679915598', '5455687083267264233', '5485643855387031856', '5602512488900481032', '569175000857893999', '5781752446043839959', '5830878251707545285', '5907867590305393844', '5919583642614603865', '6026781090258885261', '6111993606716719572', '6220997296874840746', '6230729503812593226', '6272511487893346276', '632004667510290047', '6345778376468106869', '6398841357802524822', '6480966120821151805', '6492192250922142728', '6521468207016883663', '6591689759979427539', '6716087964702017544', '6738654376464867610', '67909863393270132', '6846505723673431777', '7003392452128029526', '7231940866080963152', '7369136718351109125', '7386594198151269745', '7440853315337655802', '7590588163728084479', '762302240102491592', '771242897572211947', '779134332412746807', '7839850143662331485', '7976782055076917335', '803945216422782721', '8044514163889598079', '8246931850942466640', '8247841444016009168', '8260408163947362741', '8367382217197716331', '8368974226098745667', '8389070165447865843', '8424812490114360172', '8454279879079437574', '8522653391341608943', '8599101260892850740', '863272376406653010', 
'86440323863059345', '8691723384403555111', '8806073313928020965', '8867545881896451527', '9048777435539304278', '9099324946834895053', '9099889558317674948', '9148181351778174455', '9192270461651921024', '9193542465768704027', '97816987976290572', '985075733516077335'} | {55080ab0-5d9c-3886-90a4-acb25fe1f77b: 0xffffffffffffffff00000000000001523d1e3d73}
As shown in system.local, rpc_address is 172.17.0.3; everything else is set to the default (localhost).
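To make the mismatch in that output concrete, here is a small standard-library sketch that flags which of the address columns a client in another container could not dial. The `localRow` type and `unreachableFromRemote` helper are made up for illustration and are not part of gocql; the values are the ones from the system.local row above.

```go
package main

import (
	"fmt"
	"net"
)

// localRow holds the three address columns of interest from system.local.
// Hypothetical type for illustration only; gocql does not define it.
type localRow struct {
	BroadcastAddress string
	ListenAddress    string
	RPCAddress       string
}

// unreachableFromRemote returns the names of the columns whose value is a
// loopback, unspecified, or unparsable address, i.e. an address that a
// client on a different host or container cannot connect to.
func unreachableFromRemote(r localRow) []string {
	cols := []struct{ name, addr string }{
		{"broadcast_address", r.BroadcastAddress},
		{"listen_address", r.ListenAddress},
		{"rpc_address", r.RPCAddress},
	}
	var bad []string
	for _, c := range cols {
		ip := net.ParseIP(c.addr)
		if ip == nil || ip.IsLoopback() || ip.IsUnspecified() {
			bad = append(bad, c.name)
		}
	}
	return bad
}

func main() {
	row := localRow{
		BroadcastAddress: "127.0.0.1",
		ListenAddress:    "127.0.0.1",
		RPCAddress:       "172.17.0.3",
	}
	fmt.Println(unreachableFromRemote(row))
}
```

Only rpc_address passes the check here, which matches the observation that cqlsh and older gocql versions (which dial the supplied address) work, while a client that dials the node's advertised loopback address does not.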
Could this be caused by the host discovery? i.e. it connects and then tries to connect to hosts with incorrect IP because of docker weirdness.
@danielchatfield yes, I think the reason is either that the Cassandra node is incorrectly configured (I'm not entirely sure how the 6 _address settings map to the system.peers table) or that we are using the wrong address from system.peers to connect to; the latter is most likely the issue.
I have a feeling we should use the address from the event to connect if it is different to what we get from the system.peers table.
Yes, would be fixable with #585 (in the case of certain docker setups, C* listens on a different IP than clients use because Docker proxies connections).
In my case, I am using docker-machine, so C* is listening within the VM on a loopback address but Docker is proxying it through another network interface to break out of the VM.
@obeattie can you confirm that the issue you see is that when gocql looks up the host info in system.peers it's all wrong, so that when the pool uses that to connect instead of the addresses supplied, no connections are created?
Yes, that's exactly what is happening. Specifically, because C* is being accessed via a proxy (in this case docker-proxy), gocql can't reach it at the address C* is broadcasting/listening on.
I think the host discovery mechanism is valuable and a worthwhile addition to the library, but it should be possible to disable it in favour of explicitly connecting to specified hosts. It would help in the situation above (or any situation in which C* is behind a proxy), but also in situations where the user knows better than the library which hosts to connect to. The latter is true in our production setup where we want very fine-grained control over which services connect to which hosts.
That case won't be fixed by disabling host discovery; the driver would also need to disable all the host info lookups (which I guess fail as well), which in turn will probably mean token-aware routing won't work.
A workaround which I think will fix this case is to do the host lookup in system.peers and then use the supplied address (either via event or initial hosts) instead of the discovered one. But I don't have an environment I can test this solution in.
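The proposed workaround can be sketched roughly as follows. This is an illustration of the idea only, not gocql's actual implementation: `hostInfo`, `dialAddr`, and the `preferSupplied` flag are hypothetical names, and the supplied address below is just an example value.

```go
package main

import "fmt"

// hostInfo is a hypothetical stand-in for what a driver knows about a node.
type hostInfo struct {
	SuppliedAddr   string   // address passed in by the application or received in an event
	DiscoveredAddr string   // address read from the system tables
	Tokens         []string // metadata from the lookup, kept either way
}

// dialAddr picks the address to connect to. With preferSupplied set, the
// discovered address is ignored for connecting, while the rest of the
// looked-up metadata (for example tokens) stays available on the hostInfo.
func dialAddr(h hostInfo, preferSupplied bool) string {
	if preferSupplied && h.SuppliedAddr != "" {
		return h.SuppliedAddr
	}
	if h.DiscoveredAddr != "" {
		return h.DiscoveredAddr
	}
	return h.SuppliedAddr
}

func main() {
	h := hostInfo{
		SuppliedAddr:   "172.17.0.5", // example: the address the client can route to
		DiscoveredAddr: "127.0.0.1",  // what the node reports about itself
	}
	fmt.Println(dialAddr(h, false)) // the failing behaviour: dials the node's loopback address
	fmt.Println(dialAddr(h, true))  // the workaround: dials the supplied address
}
```

The point of keeping the lookup and only overriding the connect address is that token and topology information would still be available, unlike when the lookup is skipped entirely.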
@hbcheng @obeattie can you check if the new options added to cluster address this issue for you?
@Zariel Thank you for the quick turnaround on this. I'll deploy it to our CI system and report back as soon as I can.
Can confirm that DisableInitialHostLookup worked on our end; IgnorePeerAddr did not make a difference one way or another.
Please let me know if there's any additional testing or configuration details I can provide.
Stumbled upon this issue. To reproduce:
$ docker run -d -P --name cassandra2 cassandra:2
$ echo "$(docker-machine ip default):$(docker port cassandra2 9042 | awk -F: '{print $2}')"
Then use the echoed <ip>:<port> to connect, e.g. gocql.NewCluster("<ip>:<port>")
DisableInitialHostLookup does the trick.
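For anyone landing here, a minimal sketch of the setting that was confirmed to work above. It assumes the node is reachable at 172.17.0.3 (the rpc_address from the system.local output earlier in the thread); substitute whatever <ip>:<port> your client can actually reach. It needs the gocql module and a running Cassandra node, so treat it as a configuration example rather than something to run as-is.

```go
package main

import (
	"log"

	"github.com/gocql/gocql"
)

func main() {
	// Assumed address: use the one your client can route to.
	cluster := gocql.NewCluster("172.17.0.3")

	// Connect only to the hosts supplied above instead of looking up
	// host info in the system tables.
	cluster.DisableInitialHostLookup = true

	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatalf("creating session: %v", err)
	}
	defer session.Close()

	log.Println("connected")
}
```

As mentioned earlier in the thread, skipping the host info lookup probably means token-aware routing will not be available, so this is best suited to single-node dev/CI setups or nodes behind a proxy.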
I'm going to consider this resolved; I'll add a note in the README about connecting to nodes which don't have a routable broadcast address.
Is this workaround for the docker command, or for the gocql driver only?
Hello,
We observed a regression in creating a session in our CI system.
The CI uses Cassandra 2.2.4 internally.
After a rough bisect, the last working revision seems to be 40ccb13a098a105751c58b9e247556fcf8a9c382.
Newer revisions fail with ErrNoConnectionsStarted.
Please let me know if there's any additional information that I can provide.