apache / cassandra-gocql-driver

GoCQL Driver for Apache Cassandra®
https://cassandra.apache.org/
Apache License 2.0

after upgrade logon takes too long (ConnectTimeout) #936

Open cassandraMetro opened 7 years ago

cassandraMetro commented 7 years ago

Hi, I have found an issue when using a non-default ConnectTimeout. In my program I set it to 4 minutes a long time ago and it was running fine. Since upgrading gocql to the latest version, the connection takes >15 minutes! When setting ConnectTimeout to 10 seconds, the logon takes ~1 minute; when setting it to 2 seconds, the logon takes ~15 seconds; when leaving it at the default, it takes ~5 seconds. There are NO failures, the logon just takes too long. I am using a 9-node cluster with Cassandra 3.0.8 under Ubuntu 14.04.

Here is a sample program:

gocql.TimeoutLimit = 100 // does not seem to have influence on logon delay

nodenames := []string{"<ip>"}

fmt.Printf("Connecting to database: %s...\n", nodenames)
ClusterConfig := gocql.NewCluster(nodenames...)
ClusterConfig.Consistency = gocql.Quorum
ClusterConfig.ConnectTimeout = time.Duration(240) * time.Second // logon takes several minutes depending on that value ?!
ClusterConfig.Authenticator = gocql.PasswordAuthenticator{Username: "YYY", Password: "XXX"}

fmt.Println("starting NewSession...")
if session, err := gocql.NewSession(*ClusterConfig); err != nil { // NewSession runs for several minutes with nondefault ConnectTimeout
    panic(err)
} else {
    fmt.Println("Connected.")
    session.Query("select somecol from sometab;").Scan()
}

liornabat commented 7 years ago

I have the same issue with a cluster of 3 nodes on Docker. Setting cluster.DisableInitialHostLookup = true didn't help.

Zariel commented 7 years ago

Can you run SELECT peer, rpc_address, broadcast_address, release_version FROM system.peers and post the output, please?

liornabat commented 7 years ago

There is no broadcast_address field in this table. The other values are:

 peer         | rpc_address  | release_version
--------------+--------------+-----------------
 10.42.62.203 | 10.42.62.203 | 3.11.0
 10.42.235.88 | 10.42.235.88 | 3.11.0

cassandraMetro commented 7 years ago

Hi, the column broadcast_address does not exist. Here are the remaining columns:

 peer  | rpc_address | release_version
-------+-------------+-----------------
 A.42  | A.42        | 3.0.8
 A.43  | A.43        | 3.0.8
 B.150 | B.150       | 3.0.8
 B.151 | B.151       | 3.0.8
 B.152 | B.152       | 3.0.8
 C.37  | C.37        | 3.0.8
 C.38  | C.38        | 3.0.8
 C.39  | C.39        | 3.0.8

I cannot disclose our company's IP addresses, but all 'A' prefixes are the same, as are all 'B' prefixes, and so on.

Zariel commented 7 years ago

What address are they contactable via? The one advertised?

cassandraMetro commented 7 years ago

Connections work on all 9 IPs. Connecting always takes a very long time.

liornabat commented 7 years ago

the same

liornabat commented 7 years ago

A hint: if I run the cluster locally (using Docker) AND disconnect from the internet (no external connectivity), it connects immediately. When I enable the internet again, it takes 16-20 seconds to get a session. Very annoying. Any idea?

cassandraMetro commented 7 years ago

Small update: when using

cluster.DisableInitialHostLookup = true
cluster.IgnorePeerAddr = true

the connection works in ~1 second. The problem is that I cannot estimate the possible side effects of these parameters.

Zariel commented 7 years ago

If you build with the gocql_debug tag you should see which hosts have been discovered and at what address.

nkev commented 5 years ago

I have just one single scyllaDB host in Docker on my MacBook (I've also tried a single Cassandra host with the same result). I can connect to it immediately from DBeaver Enterprise but from GoCQL createSession() takes at least 5 seconds, regardless of the parameter variations:

    cluster := gocql.NewCluster("localhost")
    cluster.Keyspace = "my_app"
    cluster.IgnorePeerAddr = true
    cluster.ConnectTimeout = time.Second * 1
    cluster.DisableInitialHostLookup = true
    cluster.Consistency = gocql.One
    cluster.Port = 32794 // CQL port on Docker

Like @liornabat, if I turn off Wi-Fi, it connects instantly, but this is not convenient since I need the Internet all the time.

This createSession() connection lag comes up in many threads now and does not seem to have been resolved for a long time. Has anyone made any progress?

nkev commented 5 years ago

I finally got it working. The issue was that GoCQL on my Mac, which is naturally on 127.0.0.1, was trying to connect to 172.17.0.2 (the address on the network inside the Docker container). I figured this out by running GoCQL with the gocql_debug tag:

go build -o my_app -tags "gocql_debug" && ./my_app

...which showed me this error 3 times:

connection failed "172.17.0.2": dial tcp 172.17.0.2:32818: i/o timeout, reconnecting with *gocql.ConstantReconnectionPolicy

So I then tried routing the scyllaDB port inside docker (9042) to my network and also changing the scyllaDB broadcast address to match my local network:

docker run --name scylla1 -p 9042:9042 -d scylladb/scylla --broadcast-address 127.0.0.1 --listen-address 0.0.0.0 --broadcast-rpc-address 127.0.0.1

...and GoCQL connection is now instant!
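For anyone hitting the same delay, a quick way to check whether an advertised address is reachable from the client machine at all, without involving gocql, is to dial it directly with a timeout. The following is only a minimal sketch using the Go standard library. The helper names probe, probeResult and probeTimeout are made up for illustration, the 1-second timeout is arbitrary, and the two addresses are simply the ones mentioned in this thread. It tests plain TCP reachability only, not CQL or authentication.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// probeTimeout bounds each dial attempt; 1s is an arbitrary choice for this sketch.
const probeTimeout = 1 * time.Second

// probeResult records the outcome of one TCP dial attempt.
type probeResult struct {
	Addr    string
	Elapsed time.Duration
	Err     error // nil means the TCP handshake succeeded
}

// probe tries to open a TCP connection to addr ("host:port")
// and reports how long the attempt took.
func probe(addr string, timeout time.Duration) probeResult {
	start := time.Now()
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err == nil {
		conn.Close()
	}
	return probeResult{Addr: addr, Elapsed: time.Since(start), Err: err}
}

func main() {
	// Addresses taken from the comments above; replace them with your own
	// (for example the rpc_address values from system.peers plus the CQL port).
	addrs := []string{"172.17.0.2:32818", "127.0.0.1:9042"}
	for _, a := range addrs {
		r := probe(a, probeTimeout)
		if r.Err != nil {
			fmt.Printf("%-22s unreachable after %v: %v\n", r.Addr, r.Elapsed.Round(time.Millisecond), r.Err)
			continue
		}
		fmt.Printf("%-22s reachable in %v\n", r.Addr, r.Elapsed.Round(time.Millisecond))
	}
}
```

If an address only fails after the full timeout has elapsed (rather than failing immediately), that would be consistent with the behaviour reported above, where the logon time grows with ConnectTimeout.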