Closed andrewdeandrade closed 7 years ago
Build with the gocql_debug tag and share the logs.
I'm new to gocql, but have the same issue:
package main

import (
	"fmt"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	_, err := cluster.CreateSession()
	if err != nil {
		panic(err)
	}
	fmt.Println("cassandra init done")
}
Resulting in
go build -tags="gocql_debug"
./playground
2017/08/09 13:38:16 gocql: Session.handleNodeUp: 127.0.0.1:9042
2017/08/09 13:38:18 unable to dial "172.20.0.6": dial tcp 172.20.0.6:9042: i/o timeout
2017/08/09 13:38:18 gocql: Session.handleNodeDown: 172.20.0.6:9042
2017/08/09 13:38:20 unable to dial "172.20.0.3": dial tcp 172.20.0.3:9042: i/o timeout
2017/08/09 13:38:20 gocql: Session.handleNodeDown: 172.20.0.3:9042
2017/08/09 13:38:20 gocql: Session.handleNodeUp: 172.20.0.6:9042
2017/08/09 13:38:22 unable to dial "172.20.0.6": dial tcp 172.20.0.6:9042: i/o timeout
2017/08/09 13:38:22 gocql: Session.handleNodeUp: 172.20.0.3:9042
2017/08/09 13:38:22 gocql: Session.handleNodeDown: 172.20.0.6:9042
2017/08/09 13:38:24 unable to dial "172.20.0.3": dial tcp 172.20.0.3:9042: i/o timeout
panic: no connections were made when creating the session
Cassandra is running as 2 nodes in Docker containers.
Connecting with cqlsh results in
cqlsh --cqlversion=3.4.4
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.0 | CQL spec 3.4.4 | Native protocol v4]
cqlsh> describe keyspaces
system_schema system_auth system system_distributed system_traces
Can you post the following:
nodetool status
SELECT peer, preferred_ip, rpc_address FROM system.peers;
@Zariel cassandra version: Cassandra 3.11.0
cqlsh> SELECT peer, preferred_ip, rpc_address FROM system.peers ;
peer | preferred_ip | rpc_address
------------+--------------+-------------
172.20.0.6 | null | 172.20.0.6
Is 172.20.0.6 routable from your client? Cassandra is telling gocql that its rpc_address is 172.20.0.6, so that's what it's trying to dial. Is it running in Docker? Regardless, set broadcast_rpc_address to the address you expect to be able to dial.
@Zariel I'll check broadcast_rpc a bit later, but yes, Cassandra is running as a Docker container (macOS). Here is a compose.yml:
cassandra-1:
  hostname: cassandra-1
  image: cassandra:latest
  command: /bin/bash -c "sleep 1 && echo ' -- Pausing to let system catch up ... -->' && /docker-entrypoint.sh cassandra -f"
  ports:
    - "9042:9042"
  expose:
    - 7000
    - 7001
    - 7199
    - 9042
    - 9160
cassandra-2:
  hostname: cassandra-2
  image: cassandra:latest
  command: /bin/bash -c "sleep 30 && echo ' -- Pausing to let system catch up ... -->' && /docker-entrypoint.sh cassandra -f"
  environment:
    - CASSANDRA_SEEDS=cassandra-1
  links:
    - cassandra-1
  expose:
    - 7000
    - 7001
    - 7199
    - 9042
    - 9160
I should also note that with revision https://github.com/gocql/gocql/tree/7e9748ccda7fd5135a7db13ba03f09cad0c86bed the issue does not occur.
Excuse me, has the problem been solved? I have the same problem.
I believe this is caused by Docker on macOS specifically, because the Docker container IP address isn't routable (by default) from the macOS host directly. I was able to work around this by setting CASSANDRA_BROADCAST_ADDRESS in the docker compose.yml or in the command that starts the container.
In my case I did:
docker run ... -e CASSANDRA_BROADCAST_ADDRESS=127.0.0.1 -p 9042:9042 ... cassandra
I'm also having this issue running C* 2.2.9 in a Docker container on Linux. Pinning to 7e9748ccda7fd5135a7db13ba03f09cad0c86bed has fixed the issue, though I'd prefer a more permanent solution.
I tried setting all of CASSANDRA_BROADCAST_ADDRESS, CASSANDRA_RPC_ADDRESS, and CASSANDRA_LISTEN_ADDRESS to 127.0.0.1 with no success.
Oddly, I don't see the issue if I explicitly expose the port with docker run ... -p 9042:9042. I only see it when I ask it to choose a random port with docker run ... -P. I'm perplexed by that part.
@APTy @johnweldon @tuyz @megaherz can anyone provide steps to reproduce?
@Zariel In my case, the setup was this:
- container started with -p 9042:9042
- gocql used to connect to the cluster on [host-ip]:9042
Apparently the initial connect worked, but then the node advertised an IP address specific to Docker, which was not visible outside of Docker, and gocql tried to connect to that IP address and (understandably) failed because the Docker-specific IP was not routable from the OSX host.
What cassandra image? With what config? Which version?
I'm sorry @Zariel - I don't have that setup any more. Based on what I can reconstruct, this is what I think was running:
$ docker images | grep cassandra
cassandra 3 535a7b98d04d 4 weeks ago 386MB
docker run -d \
--restart=always \
-e CASSANDRA_BROADCAST_ADDRESS=192.168.199.199 \
-p 7000-7001:7000-7001 \
-p 7199:7199 \
-p 9042:9042 \
-p 9160:9160 \
cassandra:3
Without CASSANDRA_BROADCAST_ADDRESS I got the error where it was trying to dial the Docker internal IP address 172.?.?.? instead of the assigned external address 192.168.199.199.
Once I added the env var it worked.
My client application is configured to connect to cassandra on 192.168.199.199:9042
Same issue here. Going back in commits, I receive the error at commit 77431609f517cb41ee9afdcdd373561c4d935316. With code before that commit, I can connect without issues.
I can only get it to work if I correctly configure Cassandra running inside Docker to advertise its broadcast address as the docker-machine ip value via docker run -e CASSANDRA_BROADCAST_ADDRESS=$(docker-machine ip) -p 9042:9042 library/cassandra, which is what I would expect.
The reason that 7743160 made this no longer work is that Cassandra is telling the driver that it is available at (for me) 172.17.0.2. The ring up to this point looks like
control: 192.168.99.100
192.168.99.100: UP
Then the control connection triggers a refresh of the ring. system.local looks like
listen_address | broadcast_address
----------------+-------------------
172.17.0.2 | 172.17.0.2
The driver then removes 192.168.99.100 from its local ring, due to it not being in system.peers or system.local, and adds 172.17.0.2. At this point the driver checks whether it has any connections in the connection pool, which it does not, so it returns the (admittedly poor) ErrNoConnections error.
In a correctly configured environment this all happens as expected and the driver connects and will work fine.
Relatedly, there is an issue that the driver uses the remote address of the control connection and adds that to the pool instead of doing a lookup in system.local. That is why we end up having the host removed, why it worked before 7743160, and why this did not show up until then.
I'm hesitant to change this behaviour, as the real issue is that Cassandra is not configured correctly. If this were a prod cluster, I would expect the driver to error out because it's not configured properly, instead of having hacks in different places in the driver to work around invalid Cassandra configurations. We should improve that error message though, as it is thoroughly unhelpful, and improve the documentation about using the driver and what assumptions it makes about the cluster it is connecting to.
Note that cqlsh works without this setting; I'm not entirely sure what setup it uses when doing host discovery.
In a correctly configured environment this all happens as expected and the driver connects and will work fine.
How can we configure docker-cassandra so the driver does not remove the broadcast address from its local ring?
It won't, as long as the value of broadcast_address is reachable from the driver. Try docker run -e CASSANDRA_BROADCAST_ADDRESS=$(docker-machine ip) -p 9042:9042 library/cassandra
We're hit by this through Vault, as marked in the issue above.
Broadcast address is set properly in our case, so this is not the root cause of the problem.
Can I suggest a revert of 7e9748c until the investigation is done?
@Zariel - can you please comment on this? Thanks a ton.
@ror6ax can you please do
SELECT listen_address, rpc_address, broadcast_address FROM system.local;
and
SELECT peer, rpc_address, preferred_ip FROM system.peers
and if possible rebuild Vault with the gocql_debug tag and post the output.
Here you go: SELECT listen_address, rpc_address, broadcast_address FROM system.local; gives
listen_address='10.255.11.243', rpc_address='10.255.11.243', broadcast_address='10.255.11.243'
and SELECT peer, rpc_address, preferred_ip FROM system.peers gives
peer='10.255.8.12', rpc_address='10.255.8.12', preferred_ip=None
peer='10.255.7.69', rpc_address='10.255.7.69', preferred_ip=None
@Zariel - does this tell you anything new? We're still not able to make Vault work...
We are unable to use gocql unless https://github.com/gocql/gocql/pull/888/commits/43497d0755ed17a779855435df40474fe21171a7 is reverted. Please advise.
Can you please open another ticket and try to figure out WHY that should be reverted? I want to understand WHY this is causing an issue so that a test can be added and the issue fixed, instead of just a knee-jerk revert.
I'm getting the same issue when I use a proxy to connect to Cassandra. I need to revert to 7e9748c to get it working.
2017/10/06 19:03:11 gocql: Session.handleNodeUp: 127.0.0.1:9042
2017/10/06 19:03:13 unable to dial "192.168.1.151": dial tcp 192.168.1.151:9042: i/o timeout
2017/10/06 19:03:13 gocql: Session.handleNodeDown: 192.168.1.151:9042
2017/10/06 19:03:15 unable to dial "192.168.1.150": dial tcp 192.168.1.150:9042: i/o timeout
2017/10/06 19:03:15 gocql: Session.handleNodeDown: 192.168.1.150:9042
2017/10/06 19:03:15 gocql: Session.handleNodeUp: 192.168.1.151:9042
2017/10/06 19:03:17 unable to dial "192.168.1.151": dial tcp 192.168.1.151:9042: i/o timeout
2017/10/06 19:03:17 gocql: Session.handleNodeDown: 192.168.1.151:9042
2017/10/06 19:03:17 gocql: Session.handleNodeUp: 192.168.1.150:9042
2017/10/06 19:03:19 unable to dial "192.168.1.150": dial tcp 192.168.1.150:9042: i/o timeout
2017/10/06 19:03:19 gocql: Session.handleNodeDown: 192.168.1.150:9042
2017/10/06 19:03:19 cassandra DB Connection Error: no connections were made when creating the session
@Zariel - would you accept a PR with a flag to disable the related functionality in gocql? I suspect that, just like in my case, you don't change a Cassandra prod setup just like that.
You can already disable the initial host lookup and all host events if you like, https://github.com/gocql/gocql/blob/2416cf340d32ee20794e739fa794968858295098/cluster.go#L98
@ror6ax also, I would like to fix the root cause of the issue and have a test that proves it's fixed; otherwise there is no way to know if it gets introduced again. No one has yet debugged this and figured out the root cause. Just saying "revert X" is not helpful, as it only bandages over the issue.
It looks like the port is ignored even when added manually into the code:
package main

import (
	"fmt"

	"github.com/gocql/gocql"
)

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Port = 9043
	_, err := cluster.CreateSession()
	if err != nil {
		panic(err)
	}
	fmt.Println("cassandra init done")
}
The output of the command is:
panic: gocql: unable to create session: unable to discover protocol version: dial tcp 127.0.0.1:9043: getsockopt: connection refused
I don't know what you mean. In the error it dialled 127.0.0.1:9043:
tcp 127.0.0.1:9043: getsockopt: connection refused
What did you expect to happen?
Yes, you're right, this doesn't explain anything, and it's not a gocql issue. The issue itself remains connected to Vault always using the default port, even when a different one is set.
@Zariel I believe that this used to work. It may be a purposeful breaking change, but if not, perhaps both styles could be accepted.
I think that the issue here is twofold. There was a change in behavior which is now less forgiving if the Cassandra configuration is not set properly (i.e. the broadcast address is incorrectly specified) as pointed out in https://github.com/gocql/gocql/issues/946#issuecomment-326807642, which is a completely valid reason for a fix.
However, I believe that the underlying root cause of the continued no connections were made when creating the session errors, even after correctly setting the broadcast address, is that gocql is not respecting the port that is passed as part of the host. The error message just happened to be the same, which made things a bit confusing. I've opened a separate GH issue with greater detail and repro cases on this.
The example provided in https://github.com/gocql/gocql/issues/946#issuecomment-339659163 is for the successful case where the port is provided explicitly (i.e. cluster.Port = 9043), and not for the case where the bug occurs, when the port is passed as part of the host ("127.0.0.1:9043").
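To make the two styles concrete, here is a minimal sketch of how a contact point could be split into host and port, falling back to a default port when none is given. It uses only the Go standard library and does not show how gocql itself parses hosts; splitHostPort is a hypothetical helper, and 9042 as the default is taken from the logs in this thread.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// splitHostPort returns the host and port for a contact point. If the
// contact point has no port (or is a bare IPv6 literal, which
// net.SplitHostPort also rejects), the whole string is treated as the host
// and defaultPort is used. (Hypothetical helper, not part of gocql.)
func splitHostPort(contact string, defaultPort int) (string, int, error) {
	host, portStr, err := net.SplitHostPort(contact)
	if err != nil {
		return contact, defaultPort, nil
	}
	port, err := strconv.Atoi(portStr)
	if err != nil || port < 1 || port > 65535 {
		return "", 0, fmt.Errorf("invalid port %q in contact point %q", portStr, contact)
	}
	return host, port, nil
}

func main() {
	// One contact point without a port, one with the port embedded.
	for _, contact := range []string{"127.0.0.1", "127.0.0.1:9043"} {
		host, port, err := splitHostPort(contact, 9042)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-16s -> host=%s port=%d\n", contact, host, port)
	}
}
```

With handling along these lines, both "127.0.0.1" plus an explicit port setting and "127.0.0.1:9043" would end up dialling the intended port.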
I am having the same issue. I am trying it for the first time but running into no connections were made when creating the session.
I saw some people using docker but where does this Docker come from?
@Emixam23 I believe people are using docker to run the Cassandra cluster, this has nothing to do with gocql. What exactly is the issue you're having?
Actually I did find out: it reports no connections were made when creating the session whenever an error happens, no matter what the error is (as far as I could see).
My issue was that the keyspace didn't exist in my local database.
In my case, I had set NumConns to 0 by mistake.
Hello, I just ran into this as a problem that would sporadically happen.
In my case, no connections were made when creating the session wouldn't always happen. I had checked the listen_address, rpc_address, and broadcast_address. At the time they had been set to the local network IP, localhost, and the network IP (their IPs specifically, not their hostnames; this comes up later). I changed all of these addresses to 127.0.0.1 in the scylla.yaml server config and restarted. cqlsh verified the changes, but my application still had a problem where the simple query I had made wouldn't always run, with the same error message. I found that when I changed my code to connect to the IP address of localhost instead of using localhost as a host for DNS lookup, my problem went away.
/* from */
c := gocql.NewCluster("localhost")
/* to */
c := gocql.NewCluster("127.0.0.1")
My hosts file has an entry for localhost and my local system utilities have no problem finding it. However, I just noticed that my hosts file has both an IPv4 and an IPv6 entry for localhost. This makes me wonder whether gocql, when doing its DNS lookup, sometimes tries to connect to the IPv6 address, where Scylla isn't running, causing the error.
My operating system is Debian Bullseye
As mentioned by cavln in https://github.com/gocql/gocql/issues/946#issuecomment-340070506, I believe it would be more beneficial to provide a clearer error stack for this error.
In my case I changed the config name for *ClusterConfig.NumConns, so it used NumConns=0 by default, which obviously results in a no connections were made error.
A clearer root cause in ClusterConfig would be very helpful; otherwise we need to inspect every related thing: host? port? NumConns?
Can you post the following:
- output of nodetool status
- output of SELECT peer, preferred_ip, rpc_address FROM system.peers;
- cassandra version
@Zariel I'm getting
SELECT peer, preferred_ip, rpc_address FROM system.peers;
peer | preferred_ip | rpc_address
------+--------------+-------------
Pull Request #888 is causing CreateSession to exit with a "no connections were made when creating the session" error. I am at a loss on how to provide more details on this error since it is pretty vague.
cc/ @rkuris