apache / cassandra-gocql-driver

GoCQL Driver for Apache Cassandra®
https://cassandra.apache.org/
Apache License 2.0

Unable to connect to AWS Keyspaces: unable to dial control conn: dial tcp: i/o timeout #1449

Open zackpetersen opened 4 years ago

zackpetersen commented 4 years ago

What version of Cassandra are you using?

Local (working fine) - [cqlsh 5.0.1 | Cassandra 3.11.6 | CQL spec 3.4.4 | Native protocol v4]

AWS Keyspaces (unable to connect) - [cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]

What version of Gocql are you using?

github.com/gocql/gocql v0.0.0-20200519160334-799061058e31

What did you do?

Connections to a local Cassandra server work fine without ssl enabled. Now I'm testing my application connecting to hosted Cassandra on AWS Keyspaces. I've created an aws user which has full read / write permission to my keyspace. The user credentials and RootCA file have been validated through cqlsh, but through gocql the connection always fails.

$ cat /Users/zack/.cassandra/cqlshrc
[connection]
port = 9142
factory = cqlshlib.ssl.ssl_transport_factory

[connection]
ssl = true
hostname = cassandra.us-west-2.amazonaws.com

[ssl]
validate = true
certfile = /Users/zack/.cassandra/AmazonRootCA1.pem

[authentication]
username = <<keyspaces_username>>
password = <<keyspaces_password>>
keyspace = my_keyspace

completekey = tab

$ cqlsh --cqlshrc /Users/zack/.cassandra/cqlshrc
Connected to Amazon Keyspaces at cassandra.us-west-2.amazonaws.com:9142.
[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
keyspaces_user@cqlsh:my_keyspace>
keyspaces_user@cqlsh:my_keyspace>insert into my_keyspace.test(id, value) values(123, 'sample');
keyspaces_user@cqlsh:my_keyspace>select * from my_keyspace.test;

 id  | value
-----+--------
 123 | sample

(1 rows)

keyspaces_user@cqlsh:my_keyspace>

What did you expect to see?

Successful connection to AWS Keyspaces through gocql.

What did you see instead?

func connect() error {
    cluster := gocql.NewCluster("cassandra.us-west-2.amazonaws.com")
    cluster.Authenticator = gocql.PasswordAuthenticator{
        Username: "<<keyspaces_username>>",
        Password: "<<keyspaces_password>>",
    }
    cluster.SslOpts = &gocql.SslOptions{
        CaPath:                 "./AmazonRootCA1.pem",
        EnableHostVerification: false,
    }
    cluster.Keyspace = "my_keyspace"

    // ProtoVersion is required. Otherwise this error is returned.
    // gocql: unable to create session: unable to discover protocol version: dial tcp <<ip:port>>: i/o timeout
    cluster.ProtoVersion = 4
    cluster.ConnectTimeout = 3 * time.Second
    cluster.Consistency = gocql.Quorum
    _, err := cluster.CreateSession()
    return err
}
$ go build -tags="gocql_debug"
$ ./testConn
2020/05/25 20:49:05 gocql: unable to dial control conn 44.234.22.144: dial tcp 44.234.22.144:9042: i/o timeout

If you are having connectivity related issues please share the following additional information

Describe your Cassandra cluster

please provide the following information

keyspaces_user@cqlsh:my_keyspace>SELECT peer, rpc_address FROM system.peers;

 peer          | rpc_address
---------------+---------------
 44.234.22.136 | 44.234.22.136
 44.234.22.153 | 44.234.22.153
 44.234.22.134 | 44.234.22.134
 44.234.22.179 | 44.234.22.179
 44.234.22.158 | 44.234.22.158
 44.234.22.138 | 44.234.22.138
 44.234.22.154 | 44.234.22.154
 44.234.22.155 | 44.234.22.155
 44.234.22.144 | 44.234.22.144

See log pasted above.
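
One difference visible in the report: the working cqlshrc connects on port 9142, while the connect function never sets cluster.Port and the log shows a dial to 9042, which is the default that gocql.NewCluster assigns. A minimal sketch of the same function with the port set explicitly, assuming that this mismatch is what causes the timeout (nothing in the thread confirms it). All other settings are copied unchanged from the report.

```go
package main

import (
	"log"
	"time"

	"github.com/gocql/gocql"
)

// connect mirrors the function from the report, with the port set explicitly.
func connect() (*gocql.Session, error) {
	cluster := gocql.NewCluster("cassandra.us-west-2.amazonaws.com")
	// gocql.NewCluster sets Port to 9042 by default. The cqlshrc in the
	// report connects on 9142, so the same port is used here (assumption:
	// the port mismatch is the cause of the i/o timeout).
	cluster.Port = 9142
	cluster.Authenticator = gocql.PasswordAuthenticator{
		Username: "<<keyspaces_username>>",
		Password: "<<keyspaces_password>>",
	}
	cluster.SslOpts = &gocql.SslOptions{
		CaPath:                 "./AmazonRootCA1.pem",
		EnableHostVerification: false,
	}
	cluster.Keyspace = "my_keyspace"
	cluster.ProtoVersion = 4
	cluster.ConnectTimeout = 3 * time.Second
	cluster.Consistency = gocql.Quorum
	return cluster.CreateSession()
}

func main() {
	session, err := connect()
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer session.Close()
	log.Println("connected")
}
```

If the dial error still names port 9042 after this change, the port setting is not the cause and the network path itself needs checking.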

bhavar12 commented 3 years ago

Same issue in our case as well

martin-sucha commented 3 years ago

It is not clear to me from the original post what the issue is. In general, an i/o timeout error can be caused by any network, TCP, or DNS problem.

We need more information to be able to find out the cause of the error.

@zackpetersen can you still reproduce this issue?

@bhavar12 why do you think it is the same issue? Do you see the same error message? Please include more information (Go version, gocql version, logs, and a description of what you have done). A "+1" style comment alone does not help us determine the cause. Thanks!

bhavar12 commented 3 years ago

@zackpetersen Yes, I see the same error message on our MS. We are using gocql version v0.0.0-20190927095247-bd5f930c6137 with Go version 1.16. Our Cassandra node is on AWS. Sometimes we get this error:

gocql: unable to dial control conn {nodeip}: gocql: no response received from cassandra within timeout period

We have set the gocql timeout to 5 seconds.

martin-sucha commented 3 years ago

@bhavar12 Please upgrade to the latest gocql version first; there are 98 commits between bd5f930c6137 and master.

our Cassandra node is on AWS

Do you mean AWS Keyspaces or AWS EC2? I'm not sure that your issue is the same as zackpetersen's.

ErrTimeoutNoResponse (gocql: no response received from cassandra within timeout period) is returned when a query is executed and the server does not reply in time. What query timeout do you have configured on the server side?

As for the dial tcp: i/o timeout message, try capturing a network dump to see what causes the timeout, as this seems to be network related. gocql: unable to dial control conn is returned with the error directly from Session.dial, so the error can only come from establishing the TCP connection or from TLS.
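
Before capturing a full network dump, a quick reachability check can narrow things down. A minimal sketch using only the Go standard library: the helper names endpoint, probeTCP, and report are hypothetical, and a successful probe only shows that the TCP handshake completed, not that TLS or CQL would work.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
	"time"
)

// endpoint joins host and port into a dialable address,
// adding brackets around IPv6 hosts.
func endpoint(host string, port int) string {
	return net.JoinHostPort(host, strconv.Itoa(port))
}

// probeTCP opens a TCP connection and closes it again.
// A nil result means the handshake completed within timeout.
func probeTCP(host string, port int, timeout time.Duration) error {
	addr := endpoint(host, port)
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return fmt.Errorf("probe %s: %w", addr, err)
	}
	return conn.Close()
}

// report probes each port in turn and returns one summary line per port.
func report(host string, ports []int, timeout time.Duration) []string {
	lines := make([]string, 0, len(ports))
	for _, p := range ports {
		if err := probeTCP(host, p, timeout); err != nil {
			lines = append(lines, fmt.Sprintf("%s unreachable: %v", endpoint(host, p), err))
			continue
		}
		lines = append(lines, endpoint(host, p)+" reachable")
	}
	return lines
}
```

Calling report("44.234.22.144", []int{9042, 9142}, 3*time.Second) from a main function and printing the returned lines would show whether the node accepts connections on the default port, the TLS port, both, or neither.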

rockwithamoon commented 1 year ago

Hello. I'm facing a similar issue. The AWS Keyspaces connection works OK for a couple of days and also reconnects OK, but then it suddenly throws errors while trying to connect to port 9042:

2023/04/28 08:15:18 observer.go:22: cassandra: error connecting to IP: 3.238.167.151, port: 9042 error: dial tcp 3.238.167.151:9042: i/o timeout

Please note that the driver is configured to use port 9142 with TLS enabled, and the above IP is accessible on port 9142.

# nc -vz -w3 3.238.167.151 9042
nc: connect to 3.238.167.151 port 9042 (tcp) timed out: Operation now in progress

# nc -vz -w3 3.238.167.151 9142
Connection to 3.238.167.151 9142 port [tcp/*] succeeded!

Why is it trying to reconnect to a non-TLS port when it is not configured to do so? Is this a "fallback" for some case in the code? Is there a workaround or a configuration setting I should check?