apache / cassandra-gocql-driver

GoCQL Driver for Apache Cassandra®
https://cassandra.apache.org/
Apache License 2.0

Connection fail when Cassandra TLS certificate only contains dnsName without IP #1611

Open leerin-ruby opened 2 years ago

leerin-ruby commented 2 years ago

Please answer these questions before submitting your issue. Thanks!

What version of Cassandra are you using?

Not related. Go only.

What version of Gocql are you using?

v1.0.0

What version of Go are you using?

go version go1.17.5 linux/amd64

What did you do?

I attempt to connect to a cluster using SSL/TLS with EnableHostVerification set to true.

I connect to the cluster using FQDN. This FQDN is referenced in the dnsName field of the certificate.

What did you expect to see?

Connect to the cluster successfully

What did you see instead?

gocql: unable to create session: control: unable to connect to initial hosts: x509: cannot validate certificate for 192.168.3.10 because it doesn't contain any IP SANs

Digging into the code, I found the following.

session.go:

```go
for _, host := range hostMap {
	host := s.ring.addOrUpdate(host)
	// ...
}
```

ring.go:

```go
func (r *ring) addOrUpdate(host *HostInfo) *HostInfo {
	if existingHost, ok := r.addHostIfMissing(host); ok {
		existingHost.update(host)
		host = existingHost
	}
	return host
}
```

host_source.go:

```go
func (h *HostInfo) update(from *HostInfo) {
	if h == from {
		return
	}

	h.mu.Lock()
	defer h.mu.Unlock()

	from.mu.RLock()
	defer from.mu.RUnlock()

	// autogenerated do not update
	if h.peer == nil {
		h.peer = from.peer
	}
	if h.broadcastAddress == nil {
		h.broadcastAddress = from.broadcastAddress
	}
	if h.listenAddress == nil {
		h.listenAddress = from.listenAddress
	}
	if h.rpcAddress == nil {
		h.rpcAddress = from.rpcAddress
	}
	if h.preferredIP == nil {
		h.preferredIP = from.preferredIP
	}
	if h.connectAddress == nil {
		h.connectAddress = from.connectAddress
	}
	if h.port == 0 {
		h.port = from.port
	}
	if h.dataCenter == "" {
		h.dataCenter = from.dataCenter
	}
	if h.rack == "" {
		h.rack = from.rack
	}
	if h.hostId == "" {
		h.hostId = from.hostId
	}
	if h.workload == "" {
		h.workload = from.workload
	}
	if h.dseVersion == "" {
		h.dseVersion = from.dseVersion
	}
	if h.partitioner == "" {
		h.partitioner = from.partitioner
	}
	if h.clusterName == "" {
		h.clusterName = from.clusterName
	}
	if h.version == (cassVersion{}) {
		h.version = from.version
	}
	if h.tokens == nil {
		h.tokens = from.tokens
	}
}
```

When a session is created, even if the host structure passed in contains the hostname, existingHost.update() (called from ring.go) does not copy the hostname into existingHost. As a result, the driver still uses the IP address rather than the hostname when connecting to Cassandra over TLS, and certificate validation fails because the certificate contains no IP SAN.

I don't know whether this is a design choice or a bug, but a certificate may contain only dnsName entries. How should this scenario be handled?
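The failure mode described above can be reproduced without a Cassandra cluster. The following is a minimal sketch using only the Go standard library; the helper `newDNSOnlyCert` and the name `cassandra.example.internal` are made up for illustration and are not part of gocql. It builds a self-signed certificate that has a DNS SAN but no IP SAN, then checks it against both a hostname and an IP address.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newDNSOnlyCert is a hypothetical helper that returns a self-signed
// certificate whose only subject alternative name is dnsName.
func newDNSOnlyCert(dnsName string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: dnsName},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     []string{dnsName}, // IPAddresses is left empty on purpose
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := newDNSOnlyCert("cassandra.example.internal")
	if err != nil {
		panic(err)
	}
	// Verifying against the DNS name succeeds (nil error).
	fmt.Println("hostname:", cert.VerifyHostname("cassandra.example.internal"))
	// Verifying against an IP fails because the certificate has no IP SANs.
	fmt.Println("ip:      ", cert.VerifyHostname("192.168.3.10"))
}
```

The second check returns the same kind of x509 error as in the report, which suggests the driver is verifying against the node's IP rather than the configured hostname.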


martin-sucha commented 2 years ago

Currently the driver expects IP addresses to be provided as initial hosts in the configuration, and it discovers additional hosts to add to the ring using the system.peers table. See https://pkg.go.dev/github.com/gocql/gocql#hdr-Connecting_to_the_cluster for caveats about using DNS names in the initial hosts. The hosts discovered by this process don't have a hostname, as Cassandra does not provide one.

It seems to me there isn't a single strategy that should be used to validate the certificates. For example, you can have one of the following situations:

So we should probably allow supplying a user-defined function to set up the TLS context used for connection verification to a given host, or maybe a dialer interface that would include TLS setup as described in https://github.com/gocql/gocql/pull/1487#issuecomment-715931710 (but that dialer interface seems to be incomplete, as we need a dialer for initial connections too).

leerin-ruby commented 2 years ago

@martin-sucha thank you for the response. Let's wait for the future enhancement, either the user-defined function or the dialer interface.

jameshartig commented 2 years ago

@leerin-ruby the hostname you're passing initially, is that hostname present in all of the nodes' certificates? If so, you can modify the clusterCfg.SslOpts and set ServerName on the embedded *tls.Config to hardcode the expected hostname for all of the nodes.
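The effect of a fixed ServerName can be seen at the crypto/tls level with a small, self-contained sketch. It does not use gocql: it starts a throwaway TLS server on the loopback interface with a DNS-only certificate and dials it by IP, once without and once with ServerName. The helpers `selfSigned`, `startServer` and `dialByIP`, and the name `cassandra.example.internal`, are assumptions made for this illustration.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// selfSigned builds a self-signed server certificate with a single DNS SAN
// and a pool that trusts it.
func selfSigned(dnsName string) (tls.Certificate, *x509.CertPool, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return tls.Certificate{}, nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: dnsName},
		NotBefore:             time.Now().Add(-time.Hour),
		NotAfter:              time.Now().Add(time.Hour),
		DNSNames:              []string{dnsName}, // no IP SANs
		KeyUsage:              x509.KeyUsageDigitalSignature | x509.KeyUsageCertSign,
		ExtKeyUsage:           []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
		BasicConstraintsValid: true,
		IsCA:                  true,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return tls.Certificate{}, nil, err
	}
	leaf, err := x509.ParseCertificate(der)
	if err != nil {
		return tls.Certificate{}, nil, err
	}
	pool := x509.NewCertPool()
	pool.AddCert(leaf)
	return tls.Certificate{Certificate: [][]byte{der}, PrivateKey: key, Leaf: leaf}, pool, nil
}

// startServer accepts TLS connections on a random loopback port.
func startServer(cert tls.Certificate) (net.Listener, error) {
	ln, err := tls.Listen("tcp", "127.0.0.1:0", &tls.Config{Certificates: []tls.Certificate{cert}})
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			go func(c net.Conn) {
				defer c.Close()
				_ = c.(*tls.Conn).Handshake()
			}(c)
		}
	}()
	return ln, nil
}

// dialByIP connects to addr (IP:port) and verifies the peer against
// serverName. With an empty serverName, tls.Dial uses the host part of addr.
func dialByIP(addr, serverName string, roots *x509.CertPool) error {
	conn, err := tls.Dial("tcp", addr, &tls.Config{RootCAs: roots, ServerName: serverName})
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	cert, roots, err := selfSigned("cassandra.example.internal")
	if err != nil {
		panic(err)
	}
	ln, err := startServer(cert)
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	addr := ln.Addr().String()
	fmt.Println("without ServerName:", dialByIP(addr, "", roots))
	fmt.Println("with ServerName:   ", dialByIP(addr, "cassandra.example.internal", roots))
}
```

In gocql terms, this corresponds to setting ServerName on the *tls.Config embedded in clusterCfg.SslOpts, as suggested above; it only helps when every node's certificate contains that same name.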

leerin-ruby commented 2 years ago

@jameshartig, yes, the hostname is in all nodes' certificates (I am running both Cassandra and the application in containers), and ServerName is set in clusterCfg.SslOpts; otherwise the verification would fail even earlier.

martin-sucha commented 2 years ago

Hmm, I wonder why it fails when TLSConfig.ServerName is set. If TLSConfig.ServerName is not empty, gocql shouldn't overwrite it: https://github.com/gocql/gocql/blob/4d42aa3a5f690598a34453ba94f5a379c83f5c94/conn.go#L264

In any case, once https://github.com/gocql/gocql/pull/1629 is merged, it should allow you to customize the setup of the TLS session.

eevans commented 6 months ago

With #1629 merged, should this issue be considered resolved?

leerin-ruby commented 6 months ago

> With #1629 merged, should this issue be considered resolved?

I think so. You can resolve.