apache / cassandra-gocql-driver

GoCQL Driver for Apache Cassandra®
https://cassandra.apache.org/
Apache License 2.0
2.59k stars 622 forks source link

Rewrite integration test setup in Go #1686

Open martin-sucha opened 1 year ago

martin-sucha commented 1 year ago

Currently integration tests use ccm to setup a Cassandra cluster for testing.

It is not easy to simulate things like topology changes, nodes joining, leaving, re-joining with a different IP address, etc.

We could rewrite integration test runner using Docker API directly or using a library like testcontainers-go.

ribaraka commented 6 months ago

I would like to work on this issue.

ribaraka commented 5 months ago

When trying to integrate the testcontainers-go, go get github.com/testcontainers/testcontainers-go, I’m getting errors:

../../go/pkg/mod/golang.org/x/crypto@v0.22.0/curve25519/curve25519_go120.go:9:8: cannot find module providing package crypto/ecdh
../../go/pkg/mod/google.golang.org/protobuf@v1.33.0/internal/editiondefaults/defaults.go:9:8: cannot find module providing package embed
../../go/pkg/mod/github.com/testcontainers/testcontainers-go@v0.31.0/container.go:17:2: found packages archive (archive.go) and main (example_changes.go) in /home/stanislav/go/pkg/mod/github.com/docker/docker@v25.0.5+incompatible/pkg/archive
../../go/pkg/mod/github.com/sirupsen/logrus@v1.9.3/terminal_check_unix.go:6:8: found packages unix (affinity_linux.go) and main (mkasm.go) in /home/stanislav/go/pkg/mod/golang.org/x/sys@v0.19.0/unix
../../go/pkg/mod/github.com/testcontainers/testcontainers-go@v0.31.0/docker.go:14:2: cannot find module providing package io/fs
../../go/pkg/mod/github.com/docker/docker@v25.0.5+incompatible/api/types/network/ipam.go:6:2: cannot find module providing package net/netip

The errors indicate issues with dependencies and modules that are required by the testcontainers-go package. One of the errors, cannot find module providing package crypto/ecdh, indicates that the crypto/ecdh package, which is part of the standard library starting from Go 1.20. The testcontainers-go package requires Go version 1.21 or higher, which is not compatible with the current Go version 1.13.

Is the Go version for the gocql project going to be upgraded? If so, I could proceed with integrating testcontainers-go. If no, I could explore alternative approaches or improve current setup of Cassandra Cluster Managament (cmm)?

I am also open to any suggestions for this situation.

martin-sucha commented 5 months ago

It's fine to bump the minimum required Go version to 1.21, we have the following notice in the README:

Sunsetting Model

In general, the gocql team will focus on supporting the current and previous versions of Go. gocql may still work with older versions of Go, but official support for these versions will have been sunset.

ribaraka commented 4 months ago

I've been working on replacing the Cassandra cluster creation mechanism for running integration tests. I have already set up a basic multi-node Cassandra cluster using Testcontainers-go, and many tests pass successfully except for TestDiscoverViaProxy.

If I understand correctly, this test ensures that the driver:

  1. Discovers the entire ring of Cassandra nodes when initially connecting through a proxy.
  2. Avoids storing the proxy address itself in the connection pool, ensuring that only actual Cassandra node addresses are stored.

However, it appears that the proxy address is being stored in the connection pool instead of the actual Cassandra node address. I suspected there might be an issue with my setup and decided to use the existing CCM script. By default, CCM creates nodes on localhost, and each new cluster starts with a node with the IP 127.0.0.1. Consequently, each cluster ends up with hosts [127.0.0.1, 127.0.0.2, 127.0.0.3]. We then set up a proxy server with the address 127.0.0.1. This makes it hard to notice that the proxy server is being stored in the connection pool instead of the first Cassandra node because of the similar IPs (i.e., 127.0.0.1). I changed the IP addresses of the nodes to start from 127.0.0.3 and added some debugging. Here's my revised test case with debugging:

go
func TestDiscoverViaProxy(t *testing.T) {
    // This (complicated) test tests that when the driver is given an initial host
    // that is infact a proxy it discovers the rest of the ring behind the proxy
    // and does not store the proxies address as a host in its connection pool.
    // See <https://github.com/gocql/gocql/issues/481>
    clusterHosts := getClusterHosts()
    proxy, err := net.Listen("tcp", "localhost:0")

    if err != nil {
        t.Fatalf("unable to create proxy listener: %v", err)
    }
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    var (
        mu         sync.Mutex
        proxyConns []net.Conn
        closed     bool
    )

    go func() {
        cassandraAddr := JoinHostPort(clusterHosts[0], 9042)
        t.Log("cassandraAddr is ", cassandraAddr)

        cassandra := func() (net.Conn, error) {
            return net.Dial("tcp", cassandraAddr)
        }

        proxyFn := func(errs chan error, from, to net.Conn) {
            _, err := io.Copy(to, from)
            if err != nil {
                errs <- err
            }
        }

        // handle dials cassandra and then proxies requests and reponsess. It waits
        // for both the read and write side of the TCP connection to close before
        // returning.
        handle := func(conn net.Conn) error {
            cass, err := cassandra()
            if err != nil {
                return err
            }
            defer cass.Close()

            errs := make(chan error, 2)
            go proxyFn(errs, conn, cass)
            go proxyFn(errs, cass, conn)

            select {
            case <-ctx.Done():
                return ctx.Err()
            case err := <-errs:
                return err
            }
        }

        for {
            // proxy just accepts connections and then proxies them to cassandra,
            // it runs until it is closed.
            conn, err := proxy.Accept()
            if err != nil {
                mu.Lock()
                if !closed {
                    t.Error(err)
                }
                mu.Unlock()
                return
            }

            mu.Lock()
            proxyConns = append(proxyConns, conn)
            mu.Unlock()

            go func(conn net.Conn) {
                defer conn.Close()

                if err := handle(conn); err != nil {
                    mu.Lock()
                    if !closed {
                        t.Error(err)
                    }
                    mu.Unlock()
                }
            }(conn)
        }
    }()

    proxyAddr := proxy.Addr().String()
    t.Log("proxyAddr is ", proxyAddr)

    cluster := createCluster()
    cluster.NumConns = 1
    // initial host is the proxy address
    cluster.Hosts = []string{proxyAddr}

    session := createSessionFromCluster(cluster, t)
    defer session.Close()

    // we shouldnt need this but to be safe
    time.Sleep(1 * time.Second)

    session.pool.mu.RLock()

    t.Log("clusterHosts are ", clusterHosts)

    for _, host := range clusterHosts {
        found := false
        for _, hi := range session.pool.hostConnPools {
            t.Log("host is ", host)
            t.Log("hostConnPool is ", hi.host)
            if hi.host.RPCAddress().String() == host {
                found = true
                break
            }
        }

        if !found {
            t.Errorf("missing host in pool after discovery: %q", host)
        }
    }
    session.pool.mu.RUnlock()

    mu.Lock()
    closed = true
    if err := proxy.Close(); err != nil {
        t.Log(err)
    }

    for _, conn := range proxyConns {
        if err := conn.Close(); err != nil {
            t.Log(err)
        }
    }
    mu.Unlock()
}

Here’s the output:

=== RUN   TestDiscoverViaProxy
    cassandra_test.go:3001: proxyAddr is  127.0.0.1:34435
    cassandra_test.go:2934: cassandraAddr is  127.0.0.31:9042
    cassandra_test.go:3016: clusterHosts are  [127.0.0.31 127.0.0.32 127.0.0.33]
    cassandra_test.go:3021: host is  127.0.0.31
    cassandra_test.go:3022: hostConnPool is  [HostInfo hostname="127.0.0.32" connectAddress="127.0.0.32" peer="127.0.0.32" rpc_address="127.0.0.32" broadcast_address="<nil>" preferred_ip="<nil>" connect_addr="127.0.0.32" connect_addr_source="connect_address" port=9042 data_centre="datacenter1" rack="rack1" host_id="0c59d814-4235-400b-be07-41bf2934590f" version="v4.0.8" state=UP num_tokens=1]
    cassandra_test.go:3021: host is  127.0.0.31
    cassandra_test.go:3022: hostConnPool is  [HostInfo hostname="127.0.0.33" connectAddress="127.0.0.33" peer="127.0.0.33" rpc_address="127.0.0.33" broadcast_address="<nil>" preferred_ip="<nil>" connect_addr="127.0.0.33" connect_addr_source="connect_address" port=9042 data_centre="datacenter1" rack="rack1" host_id="d614ea24-0708-4b8d-998b-51e374d711ed" version="v4.0.8" state=UP num_tokens=1]
    cassandra_test.go:3021: host is  127.0.0.31
    cassandra_test.go:3022: hostConnPool is  [HostInfo hostname="127.0.0.1" connectAddress="127.0.0.1" peer="<nil>" rpc_address="127.0.0.31" broadcast_address="127.0.0.31" preferred_ip="<nil>" connect_addr="127.0.0.1" connect_addr_source="connect_address" port=34435 data_centre="datacenter1" rack="rack1" host_id="02ec5451-6627-488a-a2a2-8c9b54f822e3" version="v4.0.8" state=UP num_tokens=1]
    cassandra_test.go:3030: missing host in pool after discovery: "127.0.0.31"
    cassandra_test.go:3021: host is  127.0.0.32
    cassandra_test.go:3022: hostConnPool is  [HostInfo hostname="127.0.0.33" connectAddress="127.0.0.33" peer="127.0.0.33" rpc_address="127.0.0.33" broadcast_address="<nil>" preferred_ip="<nil>" connect_addr="127.0.0.33" connect_addr_source="connect_address" port=9042 data_centre="datacenter1" rack="rack1" host_id="d614ea24-0708-4b8d-998b-51e374d711ed" version="v4.0.8" state=UP num_tokens=1]
    cassandra_test.go:3021: host is  127.0.0.32
    cassandra_test.go:3022: hostConnPool is  [HostInfo hostname="127.0.0.1" connectAddress="127.0.0.1" peer="<nil>" rpc_address="127.0.0.31" broadcast_address="127.0.0.31" preferred_ip="<nil>" connect_addr="127.0.0.1" connect_addr_source="connect_address" port=34435 data_centre="datacenter1" rack="rack1" host_id="02ec5451-6627-488a-a2a2-8c9b54f822e3" version="v4.0.8" state=UP num_tokens=1]
    cassandra_test.go:3021: host is  127.0.0.32
    cassandra_test.go:3022: hostConnPool is  [HostInfo hostname="127.0.0.32" connectAddress="127.0.0.32" peer="127.0.0.32" rpc_address="127.0.0.32" broadcast_address="<nil>" preferred_ip="<nil>" connect_addr="127.0.0.32" connect_addr_source="connect_address" port=9042 data_centre="datacenter1" rack="rack1" host_id="0c59d814-4235-400b-be07-41bf2934590f" version="v4.0.8" state=UP num_tokens=1]
    cassandra_test.go:3021: host is  127.0.0.33
    cassandra_test.go:3022: hostConnPool is  [HostInfo hostname="127.0.0.33" connectAddress="127.0.0.33" peer="127.0.0.33" rpc_address="127.0.0.33" broadcast_address="<nil>" preferred_ip="<nil>" connect_addr="127.0.0.33" connect_addr_source="connect_address" port=9042 data_centre="datacenter1" rack="rack1" host_id="d614ea24-0708-4b8d-998b-51e374d711ed" version="v4.0.8" state=UP num_tokens=1]
--- FAIL: TestDiscoverViaProxy (4.98s)

The output shows that the proxy address (127.0.0.1:34435) is being stored as a host in the connection pool. The actual Cassandra node (127.0.0.31) is missing from the connection pool after discovery. Logs indicate that the proxy address is listed among the hostConnPools. They also confirm that the test fails because the proxy address is present, and the actual node (127.0.0.31) is missing.

I noticed from the logs that an RPC address of the replaced node is identified as 127.0.0.31, which is the original Cassandra node address. The RPC (Remote Procedure Call) address is the network address used by Cassandra nodes to communicate with clients. It is the address that clients use to send queries, retrieve data, and perform other operations. So I assume that the gocql driver appears to be working correctly by using the RPC address for internal communications. In this case, I believe the test case should compare the RPCAddress instead of the ConnectAddress to verify the discovered hosts (i.e., hi.host.RPCAddress().String() == host instead of ConnectAddress). Could you please confirm if this is correct, as I don't have much experience with this yet.

When I run this test using my cluster setup with testcontainers-go, I encounter an interesting moment: Cassandra version 4.0.8 fails to resolve the correct RPC address, whereas versions 4.1.1 and above operate correctly. So far, I have discovered that in the Cassandra 4.0.x series, the RPC addresses are not being discovered by the gocql driver. This is likely a bug or limitation in the earlier versions of Cassandra, leading to the proxy address being stored in the connection pool with an incorrect RPC address (0.0.0.0). As far as I read, setting the rpc_address to the node's specific address is generally acceptable and often recommended for production environments. It provides more control and avoids potential issues related to binding to all interfaces. Binding to a specific address reduces the risk of exposing the RPC service to unintended networks, enhancing security. It ensures that Cassandra binds to the correct network interface, avoiding issues where multiple interfaces might cause confusion.

By the way, the CCM script sets the RPC address as the node IP. So I manually added the rpc addresses to each cassandra node in my setup implementation as well. Could you please confirm my approach to the TestDiscoverViaProxy test and the resolution strategy I outlined? I would greatly appreciate your insights on these points.

Also, I have added a draft PR with the testcontainers-go runner. It is nearly complete, pending the test case I discribed above, removal of all CCM relations, adding the authentication test case, and perhaps some minor adjustments. Please review it, and let me know how the code looks so far. ^ @martin-sucha @Zariel

ribaraka commented 3 months ago

I have finalized the PR, so please take a look.

While working on it, I encountered an error message:

Error: ../../../go/pkg/mod/github.com/docker/docker@v25.0.5+incompatible/pkg/archive/archive.go:783:24: undefined: strings.CutPrefix
Error: Process completed with exit code 1.

The error, undefined: strings.CutPrefix, indicates that the code is using the strings.CutPrefix function, which was introduced in Go 1.20. Currently, the Go version is set to 1.19 in the GitHub Actions workflow, so this function is not available. To fix this, I have updated the GitHub Actions workflow to ensure that the Go version is set to 1.20 or higher, removing 1.19 from the strategy matrix. Additionally, we should update the project description to reflect that Go 1.19 is no longer supported under the Supported Versions section. ^ @martin-sucha

joao-r-reis commented 1 month ago

Restarting this discussion over here but feel free to comment here if you can't comment on the JIRA